The "data deluge" in scientific research stems in large part from the growing sophistication of experimental instrumentation and optimizing tools — often using machine- and deep-learning methods — to analyze increasingly large data sets. But what is equally important for improving scientific productivity is the optimization of data collection — aka "data taking" — methods.
Toward this end, Marcus Noack, a postdoctoral scholar at Lawrence Berkeley National Laboratory in the Center for Advanced Mathematics for Energy Research Applications (CAMERA), and James Sethian, director of CAMERA and professor of mathematics at UC Berkeley, have been working with beamline scientists at Brookhaven National Laboratory to develop and test SMART (Surrogate Model Autonomous Experiment), a mathematical method that enables autonomous experimental decision making without human interaction. SMART and its application in experiments at Brookhaven's National Synchrotron Light Source II (NSLS-II) are described in "A Kriging-Based Approach to Autonomous Experimentation with Applications to X-Ray Scattering," published in Scientific Reports.
"Modern scientific instruments are acquiring data at ever-increasing rates, leading to an exponential increase in the size of data sets," says Noack, lead author on the paper. "Taking full advantage of these acquisition rates requires corresponding advancements in the speed and efficiency not just of data analytics but also experimental control."
The goal of many experiments is to gain knowledge about the material being studied, and scientists have a well-tested way to do this: they take a sample of the material and measure how it reacts to changes in its environment. User facilities such as Brookhaven's NSLS-II and the Center for Functional Nanomaterials (CFN) offer access to high-end materials characterization tools. The associated experiments are often lengthy, complicated procedures, and measurement time is precious. A research team might have only a few days to measure their materials, so they need to make the most of each step in each measurement.
"A standard approach for users at light sources like the NSLS-II is to manually and exhaustively scan through a sample," Noack says. "But if you assume the data set is 3D or higher dimensional, at some point this exercise becomes intractable. So what is needed is something that can automatically tell me where I should take my next measurement."
Noack joined Berkeley Lab two years ago to bring mathematics into the design and optimization of experiments, with the ultimate goal of enabling autonomous experiments. The result is SMART, a Python-based algorithm that automatically selects measurements from an experiment and exploits Gaussian process regression (aka Kriging) to construct a surrogate model and an error function based on the available experimental data. Mathematical function optimization is then used to explore the error function to find the maximum error and suggest the location for the next measurement. The result is a mathematically rigorous and compact approach to systematically perform optimally efficient experiments.
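Since SMART is Python-based, the core decision step Noack describes can be sketched in a few lines. The sketch below is not the actual SMART code: it uses scikit-learn's GaussianProcessRegressor as a stand-in for SMART's Kriging machinery, and a random candidate search in place of the paper's mathematical function optimization; the function name, parameters, and candidate grid are illustrative assumptions.

```python
# Hedged sketch of the Kriging decision step: fit a Gaussian process
# surrogate to the measurements taken so far, then return the location
# where the model's predictive uncertainty (the "error function") peaks.
# Stand-in implementation using scikit-learn, not the actual SMART code.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def suggest_next_point(X_measured, y_measured, bounds, n_candidates=10_000):
    """Return the most uncertain point in the search box and its uncertainty."""
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5),
        alpha=1e-4,        # small jitter to tolerate measurement noise
        normalize_y=True,
    )
    gp.fit(X_measured, y_measured)

    # Random candidates over the rectangular domain; SMART proper uses
    # function optimization to locate the error function's maximum.
    rng = np.random.default_rng(0)
    candidates = rng.uniform(bounds[:, 0], bounds[:, 1],
                             size=(n_candidates, bounds.shape[0]))

    # The predictive standard deviation plays the role of the error function.
    _, sigma = gp.predict(candidates, return_std=True)
    best = int(np.argmax(sigma))
    return candidates[best], sigma[best]
```

Here `bounds` is a (dimensions x 2) array of lower and upper limits for the search space; maximizing the predictive standard deviation is one common stand-in for the error function the paper describes.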
"People have been doing intelligent data collection for a long time, but for beamline scientists this is the first application of the most sophisticated generation of Gaussian processes," says Sethian, a co-author on the Scientific Reports paper. "By exploiting Gaussian processes, approximation theory, and optimization, Marcus has designed a framework to bring autonomous optimized modeling and AI to beamline science."
In practice, before starting an experiment, the scientists provide SMART with a set of goals they want to get out of the experiment. The raw data is sent to automated analysis software, usually available at beamlines, and then handed to the SMART decision-making algorithm. To determine the next measurement, the algorithm creates a surrogate model of the data, which is comparable to an educated guess at how the material will behave in the next possible steps, and calculates the uncertainty — basically, how confident it is in its guess — for each possible next step. Based on this, it then selects the most uncertain option to measure next. The trick is that by picking the most uncertain step, the algorithm maximizes the knowledge it gains from each new measurement. It also determines when to end the experiment: the moment when any additional measurement would yield no further new knowledge about the material.
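Chained into a loop, the measure-model-decide cycle just described might look like the following sketch, which reuses the suggest_next_point helper from above. The measure callable, standing in for the beamline plus its automated analysis pipeline, and the fixed uncertainty threshold used as a stopping rule are both assumptions; the article describes the stopping condition only qualitatively.

```python
# Illustrative autonomous-experiment loop built on the sketch above.
# `measure` is a hypothetical stand-in for the beamline and its automated
# analysis pipeline; `sigma_stop` is an assumed stopping threshold.
import numpy as np

def run_autonomous_experiment(measure, bounds, n_init=5, sigma_stop=0.01,
                              max_steps=200):
    """Take measurements until the surrogate is confident everywhere."""
    rng = np.random.default_rng(1)

    # Seed the surrogate model with a few random measurements.
    X = rng.uniform(bounds[:, 0], bounds[:, 1],
                    size=(n_init, bounds.shape[0]))
    y = np.array([measure(x) for x in X])

    for _ in range(max_steps):
        x_next, sigma_max = suggest_next_point(X, y, bounds)
        if sigma_max < sigma_stop:
            break  # further measurements would add essentially no knowledge
        X = np.vstack([X, x_next])
        y = np.append(y, measure(x_next))
    return X, y
```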
"The basic idea is, given a bunch of experiments, how can you automatically pick the next best one?" asks Sethian. "Marcus has built a world which builds an approximate surrogate model on the basis of your previous experiments and suggests the best or most appropriate experiment to try next."
"The final goal is not only to take data faster but also to improve the quality of the data we collect," says Kevin Yager, group leader at the Center for Functional Nanomaterials and co-author of the Scientific Reports paper. "I think of it as experimentalists switching from micromanaging their experiment to managing at a higher level. Instead of having to decide where to measure next on the sample, the scientists can instead think about the big picture, which is ultimately what we as scientists are trying to do."
In experiments run at NSLS-II, the collaborators used SMART to demonstrate autonomous X-ray scattering experiments. The first experimental setup was NSLS-II's Complex Materials Scattering beamline, which offers ultrabright X-rays to study the nanostructure of different materials. For their first fully autonomous experiment, the team imaged the thickness of a droplet of nanoparticles using a technique called small-angle X-ray scattering. After this initial success, they reached out to other users and proposed having them test SMART on their scientific problems. Since then they have measured a number of samples, Yager says.
"This is an exciting part of this collaboration," says Masafumi Fukuto, co-author of the study and scientist at NSLS-II. "We all provided an essential piece for it: the CAMERA team worked on the decision-making algorithm, Kevin from CFN worked on the real-time data analysis, and we at NSLS-II provided the automation for the measurements."
While the code has proven stable and effective, Noack is continuing to improve SMART to make it more powerful, faster on larger numbers of measurements, and less computationally expensive. In the meantime, scientists at other beamlines are expressing interest in using SMART, and new experiments are already scheduled at Brookhaven and at Berkeley Lab's Advanced Light Source.
SMART isn't intended just for beamline experiments, however. "SMART is implemented in a way that has nothing to do with a beamline," Noack says. "If you want to explore a space where your data lives, all you need to know is how big that space is, that it has a beginning and an end in every dimension, and you can press SMART and it will give you everything. As long as you put numbers on it and a number on how much you like it, generally any experiment will work with SMART."
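Nothing in the sketches above is beamline-specific, which is exactly Noack's point: the loop needs only a bounded box in every dimension and a single number back from each "measurement." A purely synthetic toy in four dimensions, reusing the hypothetical helpers defined earlier, illustrates this:

```python
# Toy 4D "experiment": the measurement is just a noisy synthetic function,
# showing that the loop needs only box bounds and a scalar readout.
import numpy as np

def toy_measurement(x):
    return np.sin(x).sum() + 0.01 * np.random.default_rng().standard_normal()

bounds_4d = np.array([[0.0, 5.0]] * 4)  # a beginning and an end in every dimension
X, y = run_autonomous_experiment(toy_measurement, bounds_4d)
print(f"mapped the 4D space with {len(y)} measurements")
```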
Other authors on the paper are Gregory S. Doerk, staff scientist at the Center for Functional Nanomaterials, and Ruipeng Li, scientist at NSLS-II.
The National Synchrotron Light Source II and the Center for Functional Nanomaterials are DOE Office of Science User Facilities.