Supercomputers like Oak Ridge National Laboratory's Titan are advancing science at a frenetic pace and helping researchers make sense of data that could have easily been missed, says Ramakrishnan "Ramki" Kannan.
Kannan, a computer scientist who came to ORNL in March 2016 after earning a doctorate from Georgia Institute of Technology, points to a bygone era when scientists often had to focus on only one instrument at a time. They could log data and not miss a thing. Now, however, experiments can involve several instruments, like the sophisticated ones at the U.S. Department of Energy's Spallation Neutron Source and Center for Nanophase Materials Sciences (CNMS) at ORNL. The sheer amount of data generated can be overwhelming.
Enter Kannan's distributive machine learning tool, which collects and sorts enormous amounts of data in a fraction of the time of other methods. It helps researchers extract the most from Titan and its 18,688 nodes (20 petaflops) of computing power.
"This technology condenses the information into what's significant, enabling us to better understand very large high-dimensional data," says Kannan, who notes that computers can provide multiple perspectives that humans cannot. "Scientists can extract the data they want to see, but it sometimes helps them to have even more data than they originally intended."
This innovation was made possible through a project funded by ORNL's Laboratory Directed Research and Development program. Kannan designed what he describes as off-the-shelf data analysis algorithms, with some important modifications that let them handle large amounts of scientific and Internet data quickly and efficiently. The native of India notes that his approach provides scientists with the fastest and best program available, and the information could help guide policy-makers.
To accomplish his goal, Kannan minimized the amount of traffic among computers, pooled multiple small messages into larger ones (similar to volume discounts and economies of scale), and sequenced operations to avoid unnecessary communication.
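The "volume discount" of pooling messages can be illustrated with a simple latency-plus-bandwidth cost model. This is only a sketch under assumed numbers: the model, the function names, and the default latency and per-byte costs are illustrative choices, not figures from Kannan's work or from Titan.

```python
def transfer_time(num_messages, total_bytes,
                  latency_s=1e-6, seconds_per_byte=1e-9):
    """Estimated time to send data under an assumed cost model:
    every message pays a fixed latency, every byte pays a transfer cost."""
    return num_messages * latency_s + total_bytes * seconds_per_byte


def separate_vs_pooled(num_small_messages, bytes_each, **cost_params):
    """Compare sending many small messages with sending one pooled message
    that carries the same total number of bytes."""
    total_bytes = num_small_messages * bytes_each
    separate = transfer_time(num_small_messages, total_bytes, **cost_params)
    pooled = transfer_time(1, total_bytes, **cost_params)
    return separate, pooled


if __name__ == "__main__":
    separate, pooled = separate_vs_pooled(1000, 8)
    print(f"separate: {separate:.3e} s, pooled: {pooled:.3e} s")
```

Under this model the bytes cost the same either way, and the saving comes entirely from paying the fixed per-message latency once instead of many times.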
In the laboratory, his technique can capture even molecular movements in stunning detail, eliminate background noise, and identify precisely when a significant event occurred. In other application areas, the technique is useful for analyzing video of highways and intersections, which could aid in the design of better roads and help reduce congestion. It can also help researchers better understand trending social topics in near real time at different geographical levels, from rural to urban.
Some of Kannan's latest work is detailed in "MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization." This math-intensive research, performed with Grey Ballard of Wake Forest University and Haesun Park of Georgia Tech, explores efficient parallel algorithms for nonnegative matrix factorization of large data sets.
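For readers unfamiliar with the nonnegative matrix factorization named in the paper's title, the idea is to approximate a nonnegative data matrix A by the product of two smaller nonnegative matrices, W and H, updating one while the other is held fixed. The following is a minimal single-machine sketch using the classic multiplicative update rule, one well-known alternating-updating scheme. It is not the MPI-FAUN implementation, and the function names and parameters are illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(A, rank, iterations=200, seed=0, eps=1e-10):
    """Approximate a nonnegative matrix A (m x n) as W @ H, with
    W (m x rank) and H (rank x n) both nonnegative, by alternating
    multiplicative updates. Serial illustration only."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iterations):
        # Update H while W is held fixed.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # Update W while H is held fixed.
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def relative_error(A, W, H):
    """Frobenius-norm error of the approximation, relative to the norm of A."""
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((20, 3)) @ rng.random((3, 15))  # synthetic nonnegative data
    W, H = nmf_multiplicative(A, rank=3)
    print("relative error:", relative_error(A, W, H))
```

Because every update multiplies nonnegative entries by nonnegative ratios, W and H stay nonnegative throughout, which is what makes the resulting factors interpretable as additive parts of the data.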
Before joining ORNL's Computational Data Analytics Group, Kannan worked in research and product development for IBM, where he earned several honors. These include Master Inventor; the Authorship Award for sustained contributions to IBM's intellectual property through publications, conference papers, patents, and articles; and the IBM Fifth Plateau Award. His passion for collaborative research, however, led him to ORNL, where he has the chance to work with scientists in pursuit of discoveries that will benefit humanity.
"Every day provides me with a new set of challenges and the excitement of making contributions across a wide range of disciplines from materials to the environment and human health," Kannan says. "It's inspiring to think about where we're headed with deep learning and knowledge discovery."
Kannan is especially looking forward to working with the Summit supercomputer, scheduled to be commissioned at ORNL in 2018. Summit will feature more than five times the computational performance of Titan.
The CNMS is an Office of Science User Facility, as is the Oak Ridge Leadership Computing Facility, home to Titan.