Communications of the ACM

Network and Computing Research Infrastructure: Back to the Future


Network, computer engineering, and computer science research has long depended on a core Internet infrastructure, including the ARPANET and NSFNET, providing concurrent support for application, computational, and network research. Since the mid-1990s, however, as research networks have evolved into production systems, no new infrastructure of this kind has emerged to support that goal. Many current government programs, including the National Science Foundation's testbed and experimental programs, advocate and support the three-tiered model of production, experimental, and research network infrastructure developed and promulgated by a few advanced network infrastructures, including GigaPort, CalREN-2, and the National LambdaRail; for more on these systems, as well as others cited throughout the article, see the table here. Some of them treat research, experimental, and production networks as separate entities. In doing so, they lose sight of the goal of both the Next Generation Internet (NGI) and the High Performance Computing and Communications (HPCC) programs, as well as of the MORPHnet architecture [2]: to provide a combined research infrastructure truly capable of supporting the next step function in application, computing, and network research.

Only through deployment of an integrated research and production infrastructure at network layers 1 through 3 will the various technical communities be able to address large-scale and complex systems research in peer-to-peer systems, Grids, collaboratories, peering, routing, network management, network monitoring, end-to-end QoS, adaptive and ad hoc networks, fault tolerance, high availability, and critical infrastructure.

As the physics community has repeatedly demonstrated, a balance of theory and experimentation is necessary not only to identify but also to validate research hypotheses. The 2001 National Academy of Sciences report Looking Over the Fence [6] likewise identified the need for appropriate research infrastructures.

The ARPANET began with a few sites focusing on basic network protocol research, coupled with a smattering of applications, including the File Transfer Protocol (FTP) and email. As the ARPANET grew and evolved into the NSFNET during the late 1980s, the nature of the research also evolved toward higher-layer network challenges, including next-generation routing devices, as well as scaling, peering, and network monitoring driven by applications requiring collaboration and interconnectivity among research communities.

The early 1990s witnessed the next evolution of the NSFNET and the Internet. Development of the very high performance Backbone Network Service (vBNS) and Network Access Point (NAP) architecture [1] focused on scaling and interconnection, particularly between commercial and research networks. The transparent interconnectivity provided to the service providers and research networks by the prototypical and pioneering peering points (the Metropolitan Area Ethernets, the NAPs, and the Federal Internet Exchanges) ultimately made the Web possible.

The Web then ushered in an entirely new genre of network research focusing on content switching, network-based storage, caching, and other relevant network, system, and application research. This unique set of network research, driven by application requirements, including the Web, was possible only because a large-scale network research infrastructure already existed on which to run the experiment.

By the mid-1990s, network and computing research had moved up the protocol food chain to include middleware [3] and applications. It focused on scaling, as well as network monitoring and management, to support advanced applications and Grids. However, there was practically no infrastructure-based network research at the time. The Grid High Performance Networking subgroup of the Global Grid Forum is now working to identify relevant network research topics for Grid applications.

The U.S. multiagency-sponsored HPCC program of the early 1990s evolved into the NGI program circa 1997. NGI sought to integrate three principal research themes: experimentation with advanced network technologies; a next-generation network fabric; and revolutionary applications. Many successful independent network and computing research projects emerged, but as an integrated research support infrastructure, NGI and HPCC fell short of their goals. Two network research infrastructure concepts were proffered at the time, one by Joseph Touch and Joseph Bannister of the Information Sciences Institute at the University of Southern California and the other by one of the authors (Aiken) et al. while at Argonne National Laboratory, both aiming to provide an infrastructure concurrently supporting production use and network research. The X-Bone is an IP (layer 3) infrastructure supporting network research projects that can be tunneled within IP. The MORPHnet concept promulgated the idea that a multi-policy network infrastructure, treating research and production as two different types of policy, was needed to support concurrent research on network layers 1 through 7, as well as on middleware and applications. As it turned out, however, the 1997 goal of the MORPHnet architecture was too ambitious, as it required layer 1 network capabilities that were too difficult to manage at the time.
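The overlay mechanism behind such infrastructures is easy to illustrate: a packet addressed within the research overlay rides as the payload of an outer packet addressed across the production Internet, so transit routers forward on the outer header alone. The following is a minimal Python sketch of RFC 2003-style IP-in-IP encapsulation, with example addresses; it is our illustration of the tunneling idea, not X-Bone's actual code.

```python
import socket
import struct

def ipv4_header(src: str, dst: str, payload_len: int, proto: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum omitted for brevity)."""
    version_ihl = (4 << 4) | 5           # IPv4, 5 x 32-bit words, no options
    total_len = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_len,       # version/IHL, TOS, total length
        0, 0,                            # identification, flags/fragment offset
        64, proto, 0,                    # TTL, protocol, checksum (0 = fill in later)
        socket.inet_aton(src), socket.inet_aton(dst))

# An "inner" research packet addressed within the overlay...
inner = ipv4_header("10.1.0.1", "10.1.0.2", 0, 17)    # protocol 17 = UDP
# ...wrapped in an "outer" header addressed across the production network.
# Protocol 4 marks the payload as IP-in-IP (RFC 2003).
outer = ipv4_header("192.0.2.1", "198.51.100.7", len(inner), 4) + inner
print(len(outer), "bytes on the wire; transit routers see only the outer header")
```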


The Web is a perfect illustration of the importance of providing an experimental infrastructure concurrently supporting both experimental and production applications.



Multiple Layers

The current concept in network research infrastructure is the three-tiered model, comprising a production network, an experimental network, and a network research infrastructure. The tiers are often defined and managed independently; but to succeed they need to be integrated and to support concurrent research at multiple layers, as defined in the MORPHnet model (see the figure here) and adopted by the National LambdaRail.

The Netherlands' GigaPort project integrates network research, advanced infrastructures, and applications. The California-state-based CalREN-2 network, which defined the three-tiered model, is currently deploying a MORPHnet-like infrastructure that concurrently supports production and network research requirements. The National LambdaRail initiative is a U.S. version of the same concept, with an emphasis on network research. Canada's CA*Net network is an optical network that also supports these requirements. These advanced infrastructures are a result of the recent availability of affordable dark fiber and "lightwaves" that provide a much more manageable way to support the concurrent network research and production infrastructures envisioned in the original MORPHnet architecture.

Meanwhile, many other national, regional, state, and metropolitan research networks have investigated and adopted the same model. Please note that although most related discussions focus on advanced optical and core routing, the major challenges are in providing a true end-to-end advanced infrastructure, including end systems, as well as metropolitan and campus infrastructures, to support computational and applications research.


Applications Drive Research

Advanced applications often drive network research. The lines connecting networks, applications, operating systems, and systems research are blurring, so research in one domain is often germane to and affects the other domains. System and applications research now includes networks and middleware as fundamental components, which are therefore relevant as co-dependent research opportunities. These combinations and interactions increasingly define new and hybrid areas of research.

Middleware's role in advanced applications and infrastructures has evolved over the past few years, as reflected in such Grid systems as Globus, Legion, and Condor. These and other similar activities are now under the umbrella of the Global Grid Forum; as a result, there is a plethora of research opportunities in middleware and its intersection with intelligent, adaptive, ad hoc networks and systems. However, like other high-performance computing and networking efforts, they require a large-scale research network environment for developing and validating new ideas; such ideas include interactions among policy-based network protocols, QoS, autonomic computing, dynamic provisioning, intelligent-network and application-triggered fault tolerance, high availability, and network management and monitoring. Meanwhile, in the areas of security and critical infrastructure, ideas include intrusion-detection systems, firewalls, and defense against denial-of-service attacks.


All these network and systems research areas share the common challenge of scale and complexity and require a large-scale network infrastructure on which to perform and validate their research.


The Web has defined, and continues to define, new areas of network research as a direct result of the existence of the experimental network on which it began and the production networks on which it operates today. In the same vein, new Grid-based applications, including the Network for Earthquake Engineering Simulation (NEES) and the High Energy Physics Large Hadron Collider (HEP LHC) research program, help identify and drive new network and systems research, but only if advanced applications and network research are tightly coupled and supported on the same experimental infrastructure(s), making it possible to investigate complexity and scaling issues.

Many areas of research relate to distributed computing and the virtualization of network-based storage, computing, and collaboration-based environments; example directions include addressing and content routing, the extension of storage-area and system-area network protocols, policy interaction (such as security, provisioning, and QoS), and protocol and peering interaction of access networks, devices, and systems on an application-to-application basis.

The NSF-sponsored Distributed TeraGrid Facility (DTF) is investigating the concept of a distributed machine room floor where four different computing centers perform as one virtual center. Although it encompasses research and development of distributed systems and applications, the project's managers, focusing on application- and system-based research, have decided not to integrate network research at this time. Several of the Extended TeraGrid Facility sites, including the Pittsburgh Supercomputer Center, are pursuing integration of network research and the applications on the National LambdaRail. Specifically, they are investigating the scaling of TCP/IP on an end-to-end basis. Meanwhile, some of the DTF institutions are involved in network and systems research through the OptIPuter project—a virtual metacomputer designed to support such advanced applications as biomedical informatics and distributed large-scale scientific databases [11].

Many other Grid-based applications are capable of supporting advanced research in computing and networks as a way to advance their own research efforts. The NEES program is a Grid-based project focusing on the integration of field-based sensors with a number of major earthquake engineering sites to couple simulation modeling, computational science, and advanced visualization into a collaboration Grid. The HEP LHC project [10] pushes the limits of today's networks not only in terms of bandwidth demand but in the integration of application, middleware, computing, storage, and network research. Its advanced applications run over an IP-over-optics transcontinental infrastructure (including the National LambdaRail and CalREN-2), and the project focuses on methods for LHC applications to better utilize the network through monitoring, dynamic bandwidth and lambda allocation, multi-path load balancing when application demand exceeds the bandwidth available per lightwave, and techniques to scale TCP. Scaling TCP involves research on FAST TCP [8], the eXplicit Control Protocol (XCP) [9], and HighSpeed TCP [7]. Another Grid-based application experiment, emphasizing dynamic bandwidth management, is being undertaken by the U.K. science community as part of its Managed Bandwidth Next-Generation project.
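To give a flavor of this TCP-scaling work, the sketch below approximates the HighSpeed TCP idea [7]: below a threshold window the sender behaves like standard TCP, while above it the additive increase a(w) grows and the multiplicative decrease b(w) shrinks with the current window w, so large windows recover from loss faster. The closed forms here follow the draft's example parameters and are our simplification of its precomputed lookup table, not a production implementation.

```python
import math

# Example parameters from the HighSpeed TCP draft:
# standard TCP below 38 segments; target behavior at 83,000 segments.
LOW_WINDOW, HIGH_WINDOW = 38, 83_000
HIGH_DECREASE = 0.1  # decrease factor b(w) at HIGH_WINDOW

def b(w: float) -> float:
    """Multiplicative decrease factor applied on a loss event."""
    if w <= LOW_WINDOW:
        return 0.5  # standard TCP: halve the window
    frac = (math.log(w) - math.log(LOW_WINDOW)) / \
           (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return 0.5 + frac * (HIGH_DECREASE - 0.5)

def a(w: float) -> float:
    """Additive increase in segments per RTT, derived from b(w) and the
    HighSpeed response function w = 0.12 / p^0.835 (inverted for p)."""
    if w <= LOW_WINDOW:
        return 1.0  # standard TCP: one segment per RTT
    p = 0.078 / (w ** 1.2)  # approximate inverse of the response function
    return (w ** 2) * p * 2.0 * b(w) / (2.0 - b(w))

# On each ACK: cwnd += a(cwnd) / cwnd; on each loss: cwnd *= 1 - b(cwnd).
# The increase grows from 1 segment/RTT at small windows to ~70 near 83,000.
for w in (10, 100, 1_000, 10_000, 83_000):
    print(f"w={w:>6}  a(w)={a(w):6.1f}  b(w)={b(w):.2f}")
```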

Though the Web has clearly influenced many areas of application, computing, and network research since its debut in the early 1990s, much of that influence was not originally envisioned or anticipated by its creators. This is a perfect illustration of the importance of providing an experimental infrastructure concurrently supporting both experimental and production applications, along with network and systems research. Today, we see the same kind of influence on network and system requirements from peer-to-peer applications, most notably Kazaa and Morpheus. From a technical computer science perspective, the peer-to-peer model and relevant architectures provide interesting research opportunities.

The Netherlands' GigaPort project is intended to encourage the same kind of application, system, and network research dynamics as those envisioned by the NGI and HPCC programs. It incorporates an application-research-program component focused on virtual and persistent presence, including mobile and nomadic connectivity, to investigate potential requirement changes on the network (on an end-to-end basis) resulting from deployment of these new applications. For example, the National LambdaRail infrastructure is purposefully designed without an "acceptable use policy" in order to encourage development of new applications in both the commercial and the academic arenas, thus helping identify future network and systems research issues.


Scaling and Complexity

Basic network research on university campuses and research centers employs virtual networks (such as ABone), tunneling (such as PlanetLab), federated networks, network research kits, and simulation and modeling labs. Many types of computing science and engineering research lend themselves to initial development and vetting by using local laboratories, simulations, and modeling; eventually, however, all must be deployed and tested on large-scale experimental networks to validate their algorithms on large-scale systems, as well as in a variety of technologies and business and policy-based domains.

Much network and systems research does not, however, lend itself to virtual approaches; rather, it requires an infrastructure concurrently supporting integrated applications, research infrastructure(s), and network research. Included are network scaling and protocol and layer interaction complexity, network management and monitoring, intrusion-detection and distributed denial-of-service systems, self-diagnosing and self-healing networks and systems, and integrated management of underlying transmission, switching, and routing. Also included are storage and computing resources, adaptive and ad hoc networks, dynamic provisioning of layer 1 through 3 services, control-plane protocols and architectures, and real-time real-data-driven simulation and modeling capabilities, as defined in the DARPA- and Cisco Systems-supported Tools for Smart Networks project at the University of California, Berkeley [5].

That project integrates simulation and modeling using real-time, real-network-status data to better understand the state of the network; this knowledge then helps make intelligent decisions on network provisioning, engineering, and management. It requires a research infrastructure with real data flows and applications for developing and validating the models. In other instances, too, network engineers' understanding of networks and systems depends on their ability to monitor and analyze real traffic flows, handling commercial traffic as well as its research and educational counterpart. This research includes the work of principal investigator kc claffy (who renders her name in lowercase) and her collaborators at the Cooperative Association for Internet Data Analysis, or CAIDA, including the spectroscopy of Domain Name Service update traffic [4] and the study of Border Gateway Protocol (BGP) Autonomous System (AS) path incongruities. The BGP-AS research investigates and analyzes the AS connectivity of the Internet core by comparing AS paths deduced from traceroute measurements with those advertised in BGP tables. Beyond gaining a deeper understanding of Internet routing, the ultimate goal of this work is to identify a replacement for the venerable BGP. Getting there requires access to real network data on a real experimental infrastructure.
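As a hypothetical illustration of such a BGP-AS comparison, the sketch below contrasts an AS path taken from a BGP table with one reconstructed from traceroute (IP hops mapped to their origin ASes) and reports the ASes visible in only one of the two views. The data and helper names are ours, for illustration; this is not CAIDA's actual tooling.

```python
def strip_repeats(path):
    """Collapse consecutive duplicate ASes; traceroute yields one entry
    per IP hop, so the same AS usually appears several times in a row."""
    out = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out

def incongruities(bgp_path, traceroute_path):
    """Return ASes that appear in one view of the path but not the other."""
    bgp = set(strip_repeats(bgp_path))
    trace = set(strip_repeats(traceroute_path))
    return {"bgp_only": bgp - trace, "traceroute_only": trace - bgp}

# Example: traceroute reveals AS 702 in the forwarding path that the
# advertised BGP path omits (e.g., due to aggregation or hidden peering).
print(incongruities([7018, 701, 2914], [7018, 701, 702, 702, 2914]))
# {'bgp_only': set(), 'traceroute_only': {702}}
```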

The Service Architecture for Heterogeneous Access, Resources, and Applications (SAHARA) research project at the University of California, Berkeley, examines new service architectures and business models operating over a heterogeneous set of autonomous networks; it supports research on the use of confederated services, network-aware applications, and service peering. Service peering represents an important evolution of the peering principles established by the NAPs and focuses on systems above the network and transport layers while still being dependent on them.

Finally, it is important to note that even network layer 1 optical research needs to be integrated into any advanced network and application research infrastructure. The OptIPuter is an example; another involves research on optical packet labeling and switching systems. Daniel Blumenthal of the University of California, Santa Barbara, is working on integrating this type of research into the CalREN-2 experimental network. Both CalREN-2 and the National LambdaRail provide important research infrastructures for such efforts. The National LambdaRail has adopted the MORPHnet model and concurrently supports network and systems research at all layers, as outlined in the figure, providing both production and research capabilities at the next higher layer of the network, either concurrently or via time-phasing allocation of the infrastructure.

All these areas share the common challenge of scale and complexity and require a large-scale network infrastructure on which to perform and validate their research. We at Cisco Systems are convinced that future Internet-based distributed systems depend on integrated research; as a result, Cisco has made a substantial investment in the National LambdaRail, as well as in similar experimental networks, to encourage the computing and networking communities to return to their roots to pursue integrated applications, systems, and network research.


Conclusion

Given their interdependence, advanced applications, middleware, Grids, computers, and networks need to be more tightly coupled than they are now. Regrettably, system interdependence and network research in security, reliability, resilience, virtual environments, and network management and monitoring are often not supported due to their complexity and the cost of the infrastructure. In light of the trend toward virtualization and Grid-based systems, the network is clearly the key building block for future functionality and performance, as well as for research. It is therefore imperative that research infrastructures, including CalREN-2, GigaPort, and the National LambdaRail, embodying and extending the MORPHnet concept, be developed and deployed to support not only network research but systems and applications research as well.

The future evolution of computing, networks, middleware, and applications into dependable infrastructures rests on being able to support integrated research on the same infrastructure. Therefore, it is imperative that we go back to the future and again embrace the tenets of the ARPANET and early NSFNET, where both network and application research were undertaken simultaneously on the same infrastructure. This duality rendered these networks the powerful influence for which they are now known. The future of network and computational sciences, along with network-enabled applications, depends on our achieving this goal.


References

1. Aiken, R., Braun, H., and Ford, P. NSF Implementation Plan for an Interagency Interim National Research and Education Network. Rep. No. GAA21174. San Diego Supercomputer Center/General Atomics, May 1992.

2. Aiken, R., Carlson, R., Foster, I., Kuhfuss, T., Stevens, R., and Winkler, L. Architecture of the Multi-Modal Organizational Research and Production Heterogeneous Network (MORPHnet). ANL/ECT/97/1. Argonne National Laboratory, Argonne, IL, Jan. 1997.

3. Aiken, R., Strassner, J., Carpenter, B., Foster, I., Lynch, C., Mambretti, J., Moore, R., and Teitelbaum, B. Network Policy and Services: A Report of the Workshop on Middleware. IETF RFC2768. Internet Engineering Task Force, Feb. 2000.

4. Broido, A., Nemeth, E., and claffy, k. Spectroscopy of DNS update traffic. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (ACM SIGMETRICS) (San Diego, CA, June 10–14, 2003).

5. Chi, E., Fu, M., and Walrand, J. Proactive resource provisioning for Voice over IP. In Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS'03) (Montreal, Canada, July 20–24, 2003).

6. Computer Science and Telecommunications Board Committee on Research Horizons in Networks. Looking Over the Fence at Networks: A Neighbor's View of Network Research. National Academy of Sciences, Washington, D.C., 2001.

7. Floyd, S. HighSpeed TCP for Large Congestion Windows. IETF Draft. Internet Engineering Task Force, July 2003; see draft-ietf-tsvwg-highspeed-00.txt.

8. Jin, C., Wei, D., and Low, S. FAST TCP for High-Speed Long-Distance Networks. IETF Draft. Internet Engineering Task Force, June 2003; see draft-jwl-tcp-fast-01.txt.

9. Katabi, D., Handley, M., and Rohrs, C. Congestion control for high-bandwidth-delay product networks. In Proceedings of ACM SIGCOMM'02 (Pittsburgh, PA, Aug. 19–23). ACM Press, New York, 2002.

10. Newman, H., Ellisman, M., and Orcutt, J. Data-intensive e-science frontier research. Commun. ACM 46, 11 (Nov. 2003), 68–75.

11. Smarr, L., Chien, A., DeFanti, T., Leigh, J., and Papadopoulos, P. The OptIPuter. Commun. ACM 46, 11 (Nov. 2003), 59–66.


Authors

Robert J. Aiken ([email protected]) is director of engineering in Academic Research and Technologies Initiatives at Cisco Systems, Inc. in Sabillasville, MD.

Javad Boroumand ([email protected]) is manager of research programs in Academic Research and Technologies Initiatives at Cisco Systems, Inc. in Fairfax, VA.

Stephen Wolff ([email protected]) is a technical leader and program manager of Academic Research and Technologies Initiatives at Cisco Systems, Inc. in Washington, D.C.


Figures

Figure. National LambdaRail networking research vs. production.


Tables

Table. Access to sources and information.



©2004 ACM  0002-0782/04/0100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc.

