While deep learning grabs the headlines for physical simulations of the brain, particle physics is also yielding to simulation. Most notably, CERN's (Conseil Européen pour la Recherche Nucléaire) enormously expensive (~$10 billion) Large Hadron Collider (LHC) has quietly been rivaled by supercomputer simulations costing just ~$100 million.
The key to super-accurate physical simulations, according to Jack Dongarra, director of the Innovative Computing Laboratory at the University of Tennessee, Knoxville, is combining coarse- and fine-grained computing cores in the same supercomputer. In particular, the fastest supercomputer in the U.S., the Titan supercomputer built by Cray at Oak Ridge National Laboratory, was recently upgraded by pairing its coarse-grained AMD central processing units (CPUs) with fine-grained Nvidia graphics processing units (GPUs), setting a milestone: a sub-1% error level in simulating results from the LHC, rivaling the collider's accuracy at roughly a hundredth of its cost.
"Think of it in comparison to the automobile industry. Computer models are now accurate and computer systems are fast enough to create the 3-D (three-dimensional) models for new cars instead of clay models, and even to perform the crash tests that used to waste resources," said Dongarra. "Now physicists, mathematicians, and computer scientists can together create simulations that advance computational science beyond the need for physical experiments.”
For instance, since the U.N.'s 1996 adoption of the Comprehensive Nuclear-Test-Ban Treaty, which prohibits nuclear explosions for both civilian and military purposes in all environments, supercomputers have been simulating the explosive power of the warheads in the aging U.S. nuclear arsenal.
The next target for supercomputers is the physical simulation of particle accelerators, massive devices that use electromagnetic fields to propel charged particles to nearly the speed of light and contain them in well-defined beams. The enormous expense of such devices, and to a lesser extent the controversial possibility of their creating a black hole that consumes the Earth, has made them the next domain expected to yield to supercomputer simulation.
To be sure, much innovation and optimization remain to be done to match current multiprocessing software to the architecture of supercomputers enhanced with fine-grained accelerators harboring thousands, or even millions, of tiny cores. Nevertheless, the proof of concept has been demonstrated, in both hardware and software, on the Titan and Sequoia supercomputers, and groups around the world are rushing to follow up.
By adding GPUs to Titan, a group of researchers explained at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16) in Salt Lake City, Utah, in November, the supercomputer can accelerate lattice quantum chromodynamics (LQCD) simulations enough to accurately model results obtained from the LHC, as well as from other particle accelerators around the world.
In their peer-reviewed paper, "Accelerating lattice QCD multigrid on GPUs using fine-grained parallelization," the researchers wrote, "This grand challenge application is extremely computationally demanding, often consuming 10%-20% of public supercomputing cycles around the world. GPUs are very well suited to LQCD, since these computations feature a lot of trivial data parallelism, as well as having highly regular memory accesses, which lead to high bandwidth utilization…the end result is an algorithmic acceleration of the linear solver by potentially upwards of 10X."
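That "trivial data parallelism" can be pictured with a minimal sketch: one GPU thread per lattice site, each combining its own value with those of its nearest neighbors in a regular, predictable pattern. The toy CUDA kernel below is purely illustrative; the one-dimensional lattice, the hop_update kernel, and the coupling constant kappa are invented for this example and are far simpler than the four-dimensional, matrix-valued fields a real LQCD code handles.

    #include <cstdio>
    #include <cuda_runtime.h>

    const int nsites = 16;                      // toy 1-D lattice size

    __global__ void hop_update(const float *in, float *out, int n, float kappa)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        int left  = (i - 1 + n) % n;            // periodic boundary conditions
        int right = (i + 1) % n;
        // Each thread does the same small amount of work on neighboring memory
        // locations -- the regular access pattern that keeps bandwidth use high.
        out[i] = in[i] + kappa * (in[left] + in[right]);
    }

    int main()
    {
        float h_in[nsites], h_out[nsites];
        for (int i = 0; i < nsites; ++i) h_in[i] = (float)i;

        float *d_in, *d_out;
        cudaMalloc(&d_in,  nsites * sizeof(float));
        cudaMalloc(&d_out, nsites * sizeof(float));
        cudaMemcpy(d_in, h_in, nsites * sizeof(float), cudaMemcpyHostToDevice);

        hop_update<<<1, nsites>>>(d_in, d_out, nsites, 0.1f);  // one thread per site
        cudaMemcpy(h_out, d_out, nsites * sizeof(float), cudaMemcpyDeviceToHost);

        printf("site 0 after one hop update: %f\n", h_out[0]);
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Because thread i reads elements i-1, i, and i+1, adjacent threads touch adjacent memory, which is the kind of regular access pattern the researchers credit for high bandwidth utilization.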
In addition, they wrote, "It is only recently that both machines and algorithms have advanced enough to bring LQCD predictions to the sub-1% error level; e.g., LQCD calculations are now finally capable of making high-precision predictions for comparison against large-scale accelerator facilities."
The key to their success, according to the researchers, was combining coarse-grained networks of 18,688 AMD Opteron CPUs (with 16 cores each) with 18,688 fine-grained Nvidia Tesla K20X GPUs (with 2,688 cores each). Together, the dual granularities allowed the team's LQCD simulations to operate on two scales simultaneously.
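One way to picture that division of labor is a hybrid node on which a coarse-grained CPU core runs the outer loop of an iterative solver while fine-grained GPU threads sweep every lattice site in parallel on each pass, as in the hedged sketch below. The relax_sites kernel, the fixed sweep count, and the buffer ping-pong are invented for illustration; they are not Titan's or QUDA's actual code.

    #include <cstdio>
    #include <utility>
    #include <cuda_runtime.h>

    // One cheap relaxation update per lattice site, performed by one GPU thread.
    __global__ void relax_sites(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i <= 0 || i >= n - 1) return;          // leave the boundary sites alone
        out[i] = 0.5f * (in[i - 1] + in[i + 1]);
    }

    int main()
    {
        const int n = 1 << 16, sweeps = 100;
        float *d_a, *d_b;
        cudaMalloc(&d_a, n * sizeof(float));
        cudaMalloc(&d_b, n * sizeof(float));
        cudaMemset(d_a, 0, n * sizeof(float));
        cudaMemset(d_b, 0, n * sizeof(float));

        int threads = 256, blocks = (n + threads - 1) / threads;
        for (int s = 0; s < sweeps; ++s) {                 // coarse-grained control flow on the CPU
            relax_sites<<<blocks, threads>>>(d_a, d_b, n); // fine-grained per-site work on the GPU
            std::swap(d_a, d_b);                           // ping-pong the buffers between sweeps
        }
        cudaDeviceSynchronize();
        printf("completed %d relaxation sweeps over %d sites\n", sweeps, n);

        cudaFree(d_a);
        cudaFree(d_b);
        return 0;
    }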
The dual-granularity advantage in multigrid (MG) lattice field simulations, including QCD, has been demonstrated before on supercomputers that added Intel Xeon Phi 70-core fine-grained processors, but "to the best of our knowledge there have been no publications concerning the efficient deployment of these MG algorithms for LQCD on GPUs," the researchers wrote. "MG is a dramatic example of the huge algorithmic potential of incorporating multiple scales in software infrastructure. Algorithmic gains of one to two orders of magnitude [10X-to-100X] are too compelling and must be accommodated."
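The multigrid idea behind those gains can be sketched with a deliberately simple, host-side toy: smooth the error on the fine grid, transfer the remaining residual to a coarser grid where it is cheap to reduce, then interpolate the correction back and smooth again. The two-level example below solves a one-dimensional Poisson equation rather than lattice QCD, and every detail (grid sizes, sweep counts, transfer operators) is chosen for clarity, not taken from the paper.

    #include <cstdio>
    #include <vector>

    // Weighted-Jacobi smoothing sweeps for -u'' = f on a grid of spacing h
    // with zero boundary values.
    static void smooth(std::vector<double> &u, const std::vector<double> &f,
                       double h, int sweeps)
    {
        const double w = 2.0 / 3.0;                        // standard damping for 1-D Jacobi
        for (int s = 0; s < sweeps; ++s) {
            std::vector<double> v = u;
            for (size_t i = 1; i + 1 < u.size(); ++i)
                u[i] = (1.0 - w) * v[i] + w * 0.5 * (v[i - 1] + v[i + 1] + h * h * f[i]);
        }
    }

    int main()
    {
        const size_t nf = 17, nc = 9;                      // fine and coarse grid sizes
        const double hf = 1.0 / (nf - 1), hc = 2.0 * hf;
        std::vector<double> u(nf, 0.0), f(nf, 1.0);        // zero initial guess, constant source

        for (int cycle = 0; cycle < 5; ++cycle) {
            smooth(u, f, hf, 2);                           // pre-smooth on the fine grid

            std::vector<double> r(nf, 0.0);                // fine-grid residual
            for (size_t i = 1; i + 1 < nf; ++i)
                r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (hf * hf);

            std::vector<double> rc(nc, 0.0), ec(nc, 0.0);
            for (size_t j = 1; j + 1 < nc; ++j)            // full-weighting restriction
                rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1];

            smooth(ec, rc, hc, 50);                        // cheap approximate coarse-grid solve

            for (size_t j = 0; j + 1 < nc; ++j) {          // linear-interpolation prolongation
                u[2 * j]     += ec[j];
                u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1]);
            }
            smooth(u, f, hf, 2);                           // post-smooth
        }
        printf("approximate solution at the grid midpoint: %f\n", u[nf / 2]);
        return 0;
    }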
The biggest problem to overcome in realizing these speed-ups is non-local memory references (many-core processors work best when memory references stay local). These and other multi-scale optimizations for the most efficient run times are being incorporated into QUDA (QCD on CUDA), an open-source library built on Nvidia's Compute Unified Device Architecture (CUDA) framework for GPU programming.
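Why non-local references hurt can be seen in the contrast between coalesced and strided access: when consecutive threads read consecutive addresses, the hardware serves a whole warp with a few wide memory transactions, while a strided pattern scatters the same reads across many transactions. The two kernels below are hypothetical illustrations of that contrast, not QUDA's actual data reordering.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void copy_coalesced(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];                 // thread i touches element i: local and regular
    }

    __global__ void copy_strided(const float *in, float *out, int n, int stride)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            size_t idx = ((size_t)i * stride) % n; // neighboring threads touch far-apart elements
            out[i] = in[idx];
        }
    }

    int main()
    {
        const int n = 1 << 20;
        float *d_in, *d_out;
        cudaMalloc(&d_in,  n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMemset(d_in, 0, n * sizeof(float));

        int threads = 256, blocks = (n + threads - 1) / threads;
        copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);      // fast: coalesced reads
        copy_strided<<<blocks, threads>>>(d_in, d_out, n, 4097);  // slow: scattered reads
        cudaDeviceSynchronize();
        printf("ran both access patterns over %d elements\n", n);

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }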
Already, the researchers wrote, "precision results in lattice QCD are competing in accuracy with the best experimental results," which "has established LQCD as a necessary partner with experiment in the fundamental search for new physics beyond the Standard Model at the Large Hadron Collider and at nuclear and particle physics laboratories around the world."
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.