Competing in a fictitious high-stakes scenario, a group of scientists at the U.S. Department of Energy's Lawrence Berkeley National Laboratory bested two dozen other teams in a months-long, data-driven scavenger hunt for simulated radioactive materials in a virtual urban environment.
The goal of this hackathon-style event was twofold: to improve detection methods that could be applied to actual threats involving nuclear materials, and to create a platform for vetting those methods virtually. The computer-programming challenge ran from Jan. 22 to May 14 and featured 66 researchers on 25 teams based at DOE national laboratories and other U.S. government research labs. The teams received data simulating radiation sources scanned by a vehicle-mounted detector system traveling along urban streets.
Teams were scored on how well their algorithms pinpointed the time windows in which simulated radioactive hotspots, the kind likely associated with potentially hazardous nuclear materials, were detected, and on how well they ruled out common, benign sources of radiation. Government agencies have previously conducted many field tests and comparison studies on how to find radioactive materials in different settings, but this was the first online competition.
Tenzing Joshi, an applied nuclear physicist in Berkeley Lab's Nuclear Science Division, led the winning team in this Urban Radiological Search Competition created by the DOE's National Nuclear Security Administration. His teammates included Mark Bandstra, a senior scientific engineering associate, and UC Berkeley graduate student Kyle Bilton.
Berkeley Lab served as the host institution for the competition, which was also administered by Los Alamos National Laboratory and Oak Ridge National Laboratory. A group of researchers in Berkeley Lab's Computational Research, Nuclear Science, and Information Technology divisions teamed with other researchers in the Nuclear Science Division to develop the competition platform.
Berkeley Lab organizers leveraged the Lab's capabilities in support of big data science to develop and roll out the competition platform, and they hope it can be used in other data analytics challenges. The Lab's Applied Nuclear Physics program has been developing a variety of mobile and portable detector systems to quickly identify radiation sources.
"We had a pretty healthy lead on the public scoreboard, but it turned out to be incredibly close," Joshi says. The Berkeley Lab team had submitted about 204 entries over the course of the competition, and their final submission—which they sent in the final 20 minutes of the competition—put them ahead of the second-place team by just 1.3 points.
"It was down to the wire," says Brian Quiter, an applied nuclear physicist in Berkeley Lab's Nuclear Science Division who managed the competition and led the development of the platform with Shreyas Cholia, who heads up a software systems group in Berkeley Lab's Computational Research Division.
Statistical science teams from Los Alamos and Lawrence Livermore national labs placed second and third in the competition, respectively.
The competition data was divided into public and private sets, and competitors did not know which data fell into which category. Rankings on the public scoreboard were updated instantly based on the public portion of the data, while submissions were scored against the private data set only at the completion of the competition.
Separating the data into public and private sets, and limiting each team to 1,000 algorithm submissions, was designed to prevent teams from "gaming" the competition. With no cap on submissions, for example, a team might gain an advantage by submitting a huge volume of slightly varied algorithms until one of them, by chance, earned a top ranking. No team neared the limit, and by the end of the competition the teams had sent a total of 1,024 submissions.
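To make the split concrete, here is a minimal Python sketch of how a public/private leaderboard of this kind could work; the run counts, split fraction, and function names are illustrative assumptions, not the competition's actual code.

import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical run IDs; in the real competition these were simulated detector runs.
run_ids = np.arange(1000)

# Randomly assign each run to the public or private set; competitors never see this split.
is_public = rng.random(run_ids.size) < 0.3

def score_submission(predictions, truth, public_only=True):
    """Fraction of correct predictions on one half of the hidden split.

    The public score updates the live leaderboard; the private score is
    computed the same way but revealed only when the competition ends."""
    mask = is_public if public_only else ~is_public
    return (predictions[mask] == truth[mask]).mean()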
"The teams were graded on false positives—whether they reported environmental 'background' radiation as human-made sources, for example—and also on the likelihood of detecting the sought-after radiation sources, the precision in time at which a particular source was reported, and on whether they could specifically identify a particular type of radioactive source based on a list of six possible sources, from weapons-grade plutonium to materials used for nuclear medicine," Quiter says.
To further complicate the challenge, participants received no GPS or other location-based information about, for example, the layout of buildings, and they had very limited information about the speed of the vehicle and the length of each path traveled. The virtual streetscape used in the challenge was loosely based on real streets, and the travel time of the detector along each path in the simulated data sets varied from 45 seconds to just over 12 minutes.
"We were told that the speeds were variable, from 1 to 13 meters per second," Joshi says. "We were also told that the radiation sources could be shielded in certain situations, which changes the energy spectrum that you can measure. How you handled that in the algorithm was an important part of doing well."
Joshi says that members of his team had already been working on similar algorithms prior to the challenge. "We have been building this capability up over the last year," he says. "We have collected a lot of data from radiological measurements in urban areas and have been investigating the relationships between these measurements and the radiological compositions of our surroundings."
While teams were not limited to using a single algorithm, Joshi says the team ultimately developed one algorithm, which they named Berkeley Anomaly Detection-Factorized Matrices, or BAD-FM, to handle most of the data, with some refinements to handle particular phenomena such as the varying speeds of the vehicle within the simulations.
The model was informed by data for naturally occurring radiation—which can vary based on local geography and building materials, for example, and can fluctuate over time even at the same location—as well as data from test sources of human-made nuclear materials.
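The team's code is not published here, but the "Factorized Matrices" in the algorithm's name points to the general family of matrix-factorization background models. The following is a speculative sketch of that broad idea, using off-the-shelf non-negative matrix factorization in place of whatever the team actually built: learn a few spectral components from background data, then flag time windows whose spectra the background model reconstructs poorly.

import numpy as np
from sklearn.decomposition import NMF

# X_bg: (time windows x energy bins) matrix of background spectra.
# Poisson noise stands in for real background measurements here.
X_bg = np.random.poisson(5.0, size=(500, 128)).astype(float)

model = NMF(n_components=4, init="nndsvda", max_iter=500)
W = model.fit_transform(X_bg)  # per-window weights on the components
H = model.components_          # learned background spectral shapes

def anomaly_score(spectrum):
    """Reconstruction error of a new spectrum under the background model.

    A high residual suggests spectral features, such as source peaks,
    that the background components cannot explain."""
    w = model.transform(spectrum.reshape(1, -1))
    return np.linalg.norm(spectrum - w @ H)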
Joshi says that the computing power of his desktop computer was adequate for initial prototyping of the algorithm; for the more advanced work, the team used a single node of a Lab supercomputer. They used visualization tools, which color-coded the simulated radiation sources based on the detector's energy readings, to help interpret and analyze the data.
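One common way to render such data is a waterfall plot of detector counts binned in time and energy, in which a radiation source appears as a colored band against the background; this matplotlib sketch uses made-up Poisson counts and assumed axis ranges purely for illustration.

import numpy as np
import matplotlib.pyplot as plt

# Made-up counts: 128 energy bins by 300 one-second time bins.
counts = np.random.poisson(3.0, size=(128, 300))

plt.imshow(counts, aspect="auto", origin="lower", cmap="viridis",
           extent=[0, 300, 0, 3000])  # 0-300 s, 0-3000 keV (assumed ranges)
plt.xlabel("Time (s)")
plt.ylabel("Energy (keV)")
plt.colorbar(label="Counts")
plt.title("Simulated detector waterfall (illustrative)")
plt.show()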
The top three teams in the competition will be recognized at a July 11 meeting and will receive follow-up funding, Quiter says. A public competition will also likely follow.
"The next big plan is to repeat this challenge on an established open platform where we can ideally tap into a large community of data scientists," Quiter says.
The search competition was supported by the Office of Defense Nuclear Nonproliferation Research and Development, which is part of the U.S. Department of Energy's National Nuclear Security Administration.