
Communications of the ACM

ACM TechNews

OLCF Expands Data Analytics Capability With Popular Programming Language


The logo for Programming with Big Data in R.

Researchers at the Oak Ridge Leadership Computing Facility can use R data analytics software to manage and analyze enormous datasets generated by supercomputers.

Credit: Oak Ridge National Laboratory

Users at Oak Ridge National Laboratory's Oak Ridge Leadership Computing Facility (OLCF) can use R, the most commonly used data analytics software in academia, to manage and analyze enormous datasets generated by supercomputers.

R is typically used to analyze small datasets on ordinary workstations, but it has been scaled so that researchers can speed up analysis by at least an order of magnitude.

The Programming with Big Data in R project was funded by the U.S. National Science Foundation for use on OLCF's systems. The team wrote the code to conduct deep data analysis from the R language, developed the high-level infrastructure to allow for easier implementation of statistical computations on supercomputers, and optimized the library and data input choices on the thousands of cores in OLCF's Rhea, Eos, and Titan systems.

"The main idea is to use some of the same scalable libraries that are already used by simulation science and supercomputers," says OLCF's George Ostrouchov. "We not only make them easily accessible from R, but we also built infrastructure inside and outside R that makes it easier to implement statistical matrix methods in a highly scalable way."

Using scaled R, the group took a complex analytical problem that typically takes several hours on Apache Spark and analyzed it in less than a minute.

From Oak Ridge National Laboratory

Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA


 
