
Communications of the ACM

Blueprint for the future of high-performance networking

Introduction


Figure. Turbulent flow. Adaptive mesh refinement volume rendering using parallel offscreen graphics hardware to perform the rendering remotely and deliver the result back to the client application. (Simulation data: Phil Colella, Lawrence Berkeley National Laboratory Advanced Numerical Algorithms Group; visualization: Cristina Siegerist, LBNL, on her Grid-based VisPortal using the AMR Volume Renderer)

Major technological and cost breakthroughs in networking over the past few years have made it possible to send multiple lambdas down a single length of user-owned optical fiber. (A lambda, in networking parlance, is a fully dedicated wavelength of light in an optical network, capable of bandwidth speeds of 1–10Gbps.) Rather than being a bottleneck, metro and long-haul lambdas at 10Gbps are 100 times faster than the 100Base-T Fast Ethernet local area networks used by PCs in research laboratories. The exponential growth rate in bandwidth capacity over the past 10 years has surpassed even Moore's Law, due, in part, to the use of parallelism in network architectures. Now the parallelism is in multiple lambdas on single-strand optical fibers, creating supernetworks, or networks faster (and someday cheaper) than the computers attached to them.
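To make these ratios concrete, here is a back-of-the-envelope sketch; the 32-wavelength channel count per fiber is an illustrative assumption, not a figure from this section:

    # Back-of-the-envelope comparison of lambda capacity vs. a lab LAN.
    FAST_ETHERNET_BPS = 100e6     # 100Base-T Fast Ethernet: 100 Mbps
    LAMBDA_BPS = 10e9             # one lambda at 10 Gbps
    LAMBDAS_PER_FIBER = 32        # hypothetical DWDM channel count (assumption)

    # One 10Gbps lambda is 100x the Fast Ethernet LAN feeding it.
    print(LAMBDA_BPS / FAST_ETHERNET_BPS)        # 100.0

    # Aggregate capacity of a single fiber carrying parallel lambdas, in Gbps.
    print(LAMBDAS_PER_FIBER * LAMBDA_BPS / 1e9)  # 320.0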

Supernetworks are key architectural elements of the IT infrastructure (cyberinfrastructure) now being installed worldwide for the advancement of scientific research. High-performance computing and communication technologies are enabling computational scientists, or e-scientists, to study and better understand complex systems—physical, geological, biological, environmental, and atmospheric—from the micro to the macro scale, in both time and space. These technologies allow for new levels of persistent collaboration over continental and transoceanic distances, coupled with the ability to process, disseminate, and share information on unprecedented scales, immediately benefiting the scientific community and, ultimately, everyone else as well.

A Grid is a set of networked, middleware-enabled computing resources. A LambdaGrid is a Grid in which the lambda networks themselves are resources that can be scheduled like any other computing, storage, and visualization resource. E-scientists are today exploring how to augment their data-intensive Grid computing—primarily distributed number crunching—with high-throughput, or LambdaGrid, computing. The ability to schedule and quickly provision lambdas can provide (between enabled sites) deterministic end-to-end network performance for collaborative and other time-critical applications. Grid computing, and now LambdaGrid computing, is being designed by global, interdisciplinary teams of e-scientists, application programmers, networking engineers, electrical/computer engineers, and computer scientists in universities, government laboratories, and industrial research environments. Attacking challenging research issues, they are developing innovative solutions for advanced Grid services, including new techniques for network control and traffic engineering, middleware for bandwidth-matching distributed resources, and collaboration and visualization tools for real-time interaction with high-definition imagery.

The articles that follow are not just about high-performance optical networking but represent an early blueprint for how the technology is being integrated into today's evolving cyberinfrastructure for the benefit of e-science. Written by leading figures in their respective fields, they describe the evolving LambdaGrid, data transmission protocols utilizing the bandwidth glut resulting from the telecom fiber-optic building boom of the late 1990s, middleware for data-intensive applications, lambda-centric computer architecture, and application drivers, as well as the technological challenges that must be met to achieve data-intensive scientific research and collaboration. I am privileged to share their pioneering efforts and vision with you.

DeFanti et al. describe TransLight, a global-scale experimental networking initiative that aims to advance cyberinfrastructure for e-scientists through the collaborative development of optical networking tools and techniques and advanced LambdaGrid services, including lambda provisioning and traffic engineering. The countries/consortia/groups that have funded development of these links are allowing a portion of their bandwidth to be scheduled by e-scientists and computer scientists as part of this global experiment. TransLight complements, but does not replace, global production research and education best-effort networks, including Abilene in the U.S. and Géant in Europe. It experiments with deterministic, or known and knowable, massive bandwidth to encourage optimizations of bulk traffic behavior. The long-term goal is discovery of novel uses of lambdas and creation of testbeds for new networking paradigms.

Asking what will happen to the Internet's Transmission Control Protocol (TCP) as these networks continue to grow, Falk et al. explore transport protocols for high-performance networks, weighing today's TCP/IP-based Internet, which relies on congestion control in TCP for stability. While TCP enables a large number of users to share congested networks in a stable manner, e-scientists using lambdas experience performance penalties because their high-capacity links are poorly served by standard TCP. The authors survey several new transport protocols and techniques for tuning TCP and the User Datagram Protocol to enable very large-scale, data-intensive applications to efficiently use optical-fiber networks, comparing them to the typical email- and Web-based inquiries shared over today's Internet.
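One widely used tuning technique in this setting—offered here only as an illustrative sketch, not as any specific protocol the authors survey—is sizing TCP socket buffers to the path's bandwidth-delay product so a single flow can keep a long, fat pipe full. The link figures below (1Gbps capacity, 100ms round-trip time) are assumptions for the example:

    import socket

    # Bandwidth-delay product (BDP): bytes that must be "in flight" to fill
    # the path. The link figures are illustrative assumptions.
    bandwidth_bps = 1e9    # assumed path capacity: 1 Gbps
    rtt_s = 0.100          # assumed round-trip time: 100 ms
    bdp_bytes = int(bandwidth_bps * rtt_s / 8)   # 12,500,000 bytes (~12.5 MB)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request send/receive buffers large enough to cover the BDP;
    # the operating system may cap them below the requested size.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))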

Foster and Grossman examine technical advances that must be made in middleware for scientific communities to be able to share data and associated network and computational resources in a controlled and secure manner. Data Grids are distributed computing platforms providing computational and data resources, resource integration and management, and security and policy support; these resources, which are required to analyze, integrate, and mine even the largest petascale data sets, are being developed so their complexity is hidden from the applications using them. Data Webs are Web-service-based platforms enabling access, exploration, analysis, integration, and mining of remote and distributed data. The research effort within the Data Grid, Data Web, and data-mining communities is most likely to converge in a new generation of technologies that transforms the entire global Internet into a data-exploration and data-integration platform.


The OptIPuter, explored here by Smarr et al., is so named for its use of optical networking, Internet Protocol (IP), computer storage, and processing and visualization technologies. Essentially a research infrastructure, it tightly couples computational resources over parallel optical networks using IP—a virtual parallel computer in which the individual processors are widely distributed clusters. The OptIPuter's backplane is provided by IP delivered over multiple dedicated lambdas (each 1–10Gbps); its mass storage systems are large distributed scientific data repositories, fed by scientific instruments functioning as OptIPuter peripheral devices. The OptIPuter's main contributions to cyberinfrastructure and scientific discovery include software and middleware abstractions to provide data-delivery capabilities intended for the future lambda-rich world—where endpoint-delivered bandwidth routinely exceeds what individual computers can saturate.

Finally, weighing the future of data-intensive e-science research, Newman et al. look at three application groups—High-Energy and Nuclear Physics, EarthScope, and the Biomedical Informatics Research Network—pioneering advances in globally connected, Grid-enabled, data-intensive systems. Each anticipates the data it processes, analyzes, and shares today will increase from the multi-petabyte to the exabyte range over the next 10 to 15 years, while the corresponding network speed requirements on major links increase from the 10Gbps to the Tbps range. The authors describe what works today, what doesn't work, and what's possible only in a future LambdaGrid world.

The ultimate cyberinfrastructure, which may take decades and billions of dollars to develop, will make available petascale computing, exabyte storage, and terabit networks. A petaflops system is two orders of magnitude faster than today's fastest university computers; an exabyte is a billion gigabytes of storage. Terabit networks will eventually transmit data 20 million times faster than today's dialup 56K Internet connection and approximately a million times faster than a cable modem connection. All this requires technologies and methodologies more advanced than are available today, but thanks to these contributors the (r)evolution has begun.
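For readers who want to verify the bandwidth ratios quoted above, a quick check, assuming a 1Tbps terabit link, a 56Kbps dialup line, and a nominal 1Mbps cable modem of the era:

    # Sanity check of the bandwidth ratios quoted above.
    TERABIT_BPS = 1e12   # 1 Tbps
    DIALUP_BPS = 56e3    # 56 Kbps dialup
    CABLE_BPS = 1e6      # ~1 Mbps cable modem (rough assumption)

    print(TERABIT_BPS / DIALUP_BPS)  # ~17.9 million -> roughly "20 million times faster"
    print(TERABIT_BPS / CABLE_BPS)   # 1,000,000 -> "about a million times faster"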


Author

Maxine D. Brown ([email protected]) is associate director of the Electronic Visualization Laboratory at the University of Illinois at Chicago.




©2003 ACM  0002-0782/03/1100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
