
Communications of the ACM

ACM Careers

NSF Leads Federal Efforts in Big Data


[Image: Visualization of Hurricane Ike]

Throughout the 2008 hurricane season, the Texas Advanced Computing Center was an active participant in a NOAA research effort that used up to 40,000 processing cores to develop next-generation hurricane models. This visualization shows Hurricane Ike.

Credit: The University of Texas at Austin; NOAA; University of Pennsylvania; Texas A&M

National Science Foundation Director Subra Suresh has outlined efforts to build on NSF's legacy in supporting the fundamental science and underlying infrastructure enabling the big data revolution. At an event Thursday (March 29) led by the White House Office of Science and Technology Policy in Washington, DC, Suresh joined other federal science agency leaders to discuss cross-agency big data plans and announce new areas of research funding across disciplines in this field.

NSF announced new awards under its Cyberinfrastructure Framework for 21st Century Science and Engineering and Expeditions in Computing programs, as well as awards that expand statistical approaches to address big data. The agency is also seeking proposals under a Big Data solicitation, in collaboration with the National Institutes of Health (NIH), and anticipates opportunities for cross-disciplinary efforts under its Integrative Graduate Education and Research Traineeship program and an Ideas Lab for researchers using large datasets to enhance the effectiveness of teaching and learning.

NSF-funded research in these key areas will develop new methods to derive knowledge from data and construct new infrastructure to manage, curate, and serve data to communities. As part of these efforts, NSF will forge new approaches to associated education and training.

"Data are motivating a profound transformation in the culture and conduct of scientific research in every field of science and engineering," Suresh says. "American scientists must rise to the challenges and seize the opportunities afforded by this new, data-driven revolution. The work we do today will lay the groundwork for new enterprises and fortify the foundations for U.S. competitiveness for decades to come."

NSF released a solicitation, "Core Techniques and Technologies for Advancing Big Data Science & Engineering," or "Big Data," jointly with NIH. This program aims to extract and use knowledge from collections of large data sets in order to accelerate progress in science and engineering research. Specifically, it will fund research to develop and evaluate new algorithms, statistical methods, technologies, and tools for improved data collection and management, data analytics and e-science collaboration environments.

"The Big Data solicitation creates enormous opportunities for extracting knowledge from large-scale data across all disciplines," says Farnam Jahanian, assistant director for NSF's directorate for computer and information science and engineering. "Foundational research advances in data management, analysis and collaboration will change paradigms of research and education, and promise new approaches to addressing national priorities."

Among the awards NSF announced Thursday is a $10 million award under the Expeditions in Computing program to researchers at the University of California, Berkeley. The team will integrate algorithms, machines, and people to turn data into knowledge and insight. The objective is to develop new scalable machine-learning algorithms and data management tools that can handle large-scale and heterogeneous datasets, novel datacenter-friendly programming models, and an improved computational infrastructure.
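To make the flavor of such work concrete, the sketch below shows one common pattern for scaling learning to datasets too large to hold in memory: streaming mini-batch stochastic gradient descent, which visits the data one small chunk at a time. This is an illustrative assumption on our part, not a description of the Berkeley team's actual algorithms; the data stream and model here are hypothetical.

    # Illustrative sketch only: mini-batch SGD, a common strategy for
    # learning from data too large to fit in memory. Not the Berkeley
    # team's actual method.
    import numpy as np

    def minibatch_sgd(batches, n_features, lr=0.01):
        """Fit a linear model y ~ X @ w by streaming over mini-batches,
        so the full dataset never has to be loaded at once."""
        w = np.zeros(n_features)
        for X, y in batches:
            grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
            w -= lr * grad                           # one descent step per batch
        return w

    # Hypothetical usage: simulate a stream of 1,000 small batches drawn
    # from a known linear model, then recover its weights.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    stream = ((X, X @ true_w + 0.01 * rng.standard_normal(64))
              for X in (rng.standard_normal((64, 3)) for _ in range(1000)))
    print(minibatch_sgd(stream, n_features=3))  # approaches [2.0, -1.0, 0.5]

Because each batch is consumed and discarded, memory use stays constant no matter how large the stream grows, which is what makes this family of methods attractive at datacenter scale.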

NSF's Cyberinfrastructure Framework for 21st Century Science and Engineering, or "CIF21," is core to the agency's strategic efforts. CIF21 will foster the development and implementation of a national cyberinfrastructure for researchers in science and engineering to achieve a democratization of data. In the near term, NSF will provide opportunities and platforms for science research projects to develop the appropriate mechanisms, policies, and governance structures to make data available within different research communities. In the longer term, these ground-up efforts will be integrated within a larger-scale national framework for sharing data among disciplines and institutions.

The first round of awards made through an NSF geosciences program called EarthCube, under the CIF21 framework, was also announced. These awards will support the development of community-guided cyberinfrastructure to integrate big data across the geosciences and ultimately change how geosciences research is conducted. Integrating data from disparate locations and sources, in varied structures and formats, whether previously stored or captured in real time, will expedite the delivery of geoscience knowledge.

"EarthCube is a groundbreaking NSF program," says Tim Killeen, assistant director for NSF's geosciences directorate. "It represents a dynamic new way to access, share and use data of all types to accelerate and transform research for understanding our planet. We are asking experts from all sectors — industry, academia, government and non-U.S. institutions — to form collaborations and tell us what research topics they think are most important. Their enthusiastic and energetic response has resulted in a synergy of exhilarating and novel ideas."

NSF also announced a $1.4 million award for a focused research group that brings together statisticians and biologists to develop network models and automatic, scalable algorithms and tools to determine protein structures and biological pathways.

NSF also announced a $2 million award for a research training group in big data, which will support training for undergraduates, graduate students, and postdoctoral fellows in the use of statistical, graphical, and visualization techniques for complex data.

"NSF is developing a bold and comprehensive approach for this new data-centric world, from fundamental mathematical, statistical and computational approaches needed to understand the data, to infrastructure at a national and international level needed to support and serve our communities, to policy enabling rapid dissemination and sharing of knowledge," says Ed Seidel, assistant director for NSF's mathematical and physical sciences directorate. "Together, this will accelerate scientific progress, create new possibilities for education, enhance innovation in society, and be a driver for job creation. Everyone will benefit from these activities."

In addition, anticipated cross-disciplinary efforts at NSF include encouraging data citation to increase opportunities for the use and analysis of data sets; participation in an Ideas Lab to explore ways to use big data to enhance teaching and learning effectiveness; and the use of NSF's Integrative Graduate Education and Research Traineeship mechanism to educate and train researchers in data-enabled science and engineering.

 

[Image: HIPerWall system] The HIPerWall system at the University of California, Irvine, provides high-capacity visualization capabilities for experimental and theoretical researchers. The 50-panel display brings to life terabyte-sized datasets. Credit: Calit2, University of California, San Diego

A full list of NSF data-enabled science and engineering projects follows.

 

Core Techniques and Technologies for Advancing Big Data Science & Engineering (Big Data) is a new joint solicitation between NSF and NIH that aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed, and heterogeneous data sets. Specifically, it will support the development and evaluation of technologies and tools for data collection and management, data analytics, and/or e-science collaborations, laying the foundations for U.S. competitiveness for many decades to come. NSF contact: Suzanne Iacono.

Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) develops, consolidates, coordinates, and leverages a set of advanced cyberinfrastructure programs and efforts across NSF to create meaningful cyberinfrastructure, as well as develop a level of integration and interoperability of data and tools to support science and education. NSF contacts: Alan Blatecky and Mark Suskin.

CIF21 Track for IGERT. NSF has shared with its community plans to establish a new CIF21 track as part of its Integrative Graduate Education and Research Traineeship (IGERT) program. This track aims to educate and support a new generation of researchers able to address fundamental Big Data challenges concerning core techniques and technologies, problems, and cyberinfrastructure across disciplines. NSF contacts: Mark Suskin and Tom Russell.

Data Citation, which provides transparency and increased opportunities for the use and analysis of data sets, was encouraged in a Dear Colleague Letter initiated by NSF's Geosciences directorate, demonstrating NSF's commitment to responsible stewardship and sustainability of data resulting from federally funded research.

Data and Software Preservation for Open Science is a first attempt to establish a formal collaboration of physicists from experiments at the LHC and Fermilab/Tevatron with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure. The intent is to define and execute a compact set of well-defined, entrant-scale activities on which to base a large-scale, long-term program, as well as an index of commonality among various scientific disciplines. NSF contacts: Randal Ruchti, Marv Goldberg, and Saul Gonzalez.

Digging into Data Challenge addresses how big data changes the research landscape for the humanities and social sciences, in which new, computationally-based research methods are needed to search, analyze, and understand massive databases of materials such as digitized books and newspapers, and transactional data from web searches, sensors, and cell phone records. Administered by the National Endowment for the Humanities (NEH), this Challenge is funded by multiple U.S. and international organizations. NEH contact: Brett Bobley.

EarthCube supports the development of community-guided cyberinfrastructure to integrate data into a framework that will expedite the delivery of geoscience knowledge. NSF's just-announced first round of EarthCube awards, made within the CIF21 framework via the EArly Concept Grants for Exploratory Research mechanism, is the first step in laying the foundation to transform the conduct of research in the geosciences. NSF contact: Clifford Jacobs.

Expeditions in Computing has funded a team of researchers at the University of California, Berkeley to deeply integrate algorithms, machines, and people to address big data research challenges. The combination of fundamental innovations in analytics, new systems infrastructure that facilitates scalable resources from cloud and cluster computing and crowdsourcing, and human activity and intelligence will provide solutions to problems not solvable by today's automated data analysis technologies alone. NSF contact: Mitra Basu.

Focused Research Group, stochastic network models. Researchers are developing a unified theoretical framework for principled statistical approaches to network models with scalable algorithms in order to differentiate knowledge in a network from randomness. Collaborators in biology and mathematics will study relationships between words and phrases in a very large newspaper database in order to provide media analysts with automatic and scalable tools. UC Berkeley contact: Peter Bickel. NSF contact: Haiyan Cai.
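As a toy illustration of what "differentiating knowledge in a network from randomness" can mean in practice, the sketch below compares an observed network's block structure against random graphs with the same edge density (an Erdos-Renyi null model). The statistic, null model, and example network are assumptions chosen for illustration, not the research group's actual framework.

    # Illustrative sketch only: a Monte Carlo test of whether a network's
    # group structure exceeds what chance alone would produce. The null
    # model and statistic are hypothetical choices, not the group's method.
    import numpy as np

    rng = np.random.default_rng(1)

    def block_contrast(A, labels):
        """Within-group minus between-group edge density for a labeling."""
        same = labels[:, None] == labels[None, :]
        off = ~np.eye(len(A), dtype=bool)       # ignore self-loops
        return A[same & off].mean() - A[~same].mean()

    def null_distribution(A, labels, n_sims=2000):
        """Contrast statistic under an Erdos-Renyi null of matched density."""
        n = len(A)
        p = A[~np.eye(n, dtype=bool)].mean()    # observed edge density
        stats = []
        for _ in range(n_sims):
            R = (rng.random((n, n)) < p).astype(float)
            R = np.triu(R, 1)
            R = R + R.T                          # symmetric, no self-loops
            stats.append(block_contrast(R, labels))
        return np.array(stats)

    # Hypothetical example: a 2-block network with denser within-block ties.
    n = 60
    labels = np.repeat([0, 1], 30)
    same = labels[:, None] == labels[None, :]
    A = (rng.random((n, n)) < np.where(same, 0.4, 0.1)).astype(float)
    A = np.triu(A, 1)
    A = A + A.T
    obs = block_contrast(A, labels)
    p_val = (null_distribution(A, labels) >= obs).mean()
    print(f"observed contrast {obs:.3f}, Monte Carlo p-value {p_val:.4f}")

A small Monte Carlo p-value indicates the observed grouping is structure, not noise; the research described above aims at principled, scalable versions of this kind of inference for far larger networks.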

Ideas Lab. NSF released a Dear Colleague Letter announcing an Ideas Lab, for which cross-disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments. NSF contact: Doris Carver.

Information Integration and Informatics addresses the challenges and scalability problems involved in moving from traditional scientific research data to very large, heterogeneous data, such as the integration of new data types, models, and representations, as well as issues related to data paths, information life-cycle management, and new platforms. NSF contact: Sylvia Spengler.

The Computational and Data-Enabled Science and Engineering in Mathematical and Statistical Sciences program, created by NSF's Division of Mathematical Sciences and the Office of Cyberinfrastructure, supports what is becoming a distinct discipline encompassing mathematical and statistical foundations and computational algorithms. Proposals in this program are currently being reviewed, and new awards will be made in July 2012. NSF contact: Jia Li.

Some Research Training Groups (RTG) and Mentoring through Critical Transition Points awards relate to big data. The RTG project at UC Davis addresses the challenges associated with the analysis of object data (data that take on many forms, including images, functions, graphs, and trees) in fields such as astronomy, computer science, and neuroscience. Undergraduates will be trained in graphical and visualization techniques for complex data, software packages, and computer simulations to assess the validity of models. The development of student sites with big data applications to climate, image reconstruction, networks, cybersecurity, and cancer is also underway. NSF contact: Nandini Kannan.

The Laser Interferometer Gravitational Wave Observatory (LIGO) detects gravitational waves, a previously unobserved form of radiation, which will open a new window on the universe. Processing the deluge of data collected by LIGO is only possible through the use of large computational facilities across the world and the collective work of more than 870 researchers in 77 institutions. NSF contacts: Pedro Marronetti and Tom Carruthers.

The Open Science Grid enables over 8,000 scientists worldwide to collaborate on discoveries, including the search for the Higgs boson. High-speed networks distribute over 15 petabytes of data each year in real time from the Large Hadron Collider at CERN in Switzerland to more than 100 computing facilities. Partnerships of computer and domain scientists and computing facilities in the U.S. provide the advanced fabric of services for data transfer and analysis, job specification and execution, and security and administration, shared across disciplines including physics, biology, nanotechnology, and astrophysics. NSF contacts: Marv Goldberg and Saul Gonzalez.

The Theoretical and Computational Astrophysics Networks program seeks to maximize the discovery potential of massive astronomical data sets by advancing the fundamental theoretical and computational approaches needed to interpret those data, uniting researchers in collaborative networks that cross institutional and geographical divides, and training future theoretical and computational scientists. NSF contact: Tom Statler. NASA contact: Linda Sparke.


 
