Detecting nuclear proliferation
Nuclear weapon simulations show performance in molecular detail
U.S. researchers are perfecting simulations that show a nuclear weapon’s performance in precise molecular detail, tools that are becoming critical for national defense because international treaties forbid the detonation of nuclear test weapons.
The simulations must be operated on supercomputers containing thousands of processors, but doing so has posed reliability and accuracy problems, said Saurabh Bagchi, an associate professor in Purdue University’s School of Electrical and Computer Engineering.
A Purdue University release reports that researchers at Purdue and high-performance computing experts at the National Nuclear Security Administration’s (NNSA) Lawrence Livermore National Laboratory have now solved several problems hindering the use of the ultra-precise simulations. NNSA is the quasi-independent agency within the U.S. Department of Energy that oversees U.S. nuclear security activities.
The simulations, which are needed to certify nuclear weapons more efficiently, may require 100,000 machines, a level of complexity that is essential to accurately show molecular-scale reactions taking place over milliseconds, or thousandths of a second. The same types of simulations could also be used in areas such as climate modeling and the study of dynamic changes in a protein’s shape.
Such highly complex jobs must be split into many processes that execute in parallel on separate machines in large computer clusters, Bagchi said.
“Due to natural faults in the execution environment there is a high likelihood that some processing element will have an error during the application’s execution, resulting in corrupted memory or failed communication between machines,” Bagchi said. “There are bottlenecks in terms of communication and computation.”
These errors compound for as long as the simulation keeps running before the glitch is detected, and they may cause simulations to stall or crash altogether.
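The release does not describe the technique the Purdue and Livermore researchers actually used. Purely as a generic illustration of why undetected memory corruption is dangerous and how it can be caught early, the sketch below assumes the simulation state is held as raw byte buffers split across hypothetical worker processes, and that a checksum (CRC32, via Python's standard `zlib.crc32`) can be recorded for each chunk at a known-good point. The helper names `split_into_chunks`, `checksums`, `find_corrupted` and `flip_bit` are hypothetical, invented for this example.

```python
import zlib


def split_into_chunks(data: bytes, n_chunks: int) -> list[bytearray]:
    """Split a buffer into roughly equal chunks, one per (hypothetical) worker."""
    size = -(-len(data) // n_chunks)  # ceiling division so no bytes are dropped
    return [bytearray(data[i:i + size]) for i in range(0, len(data), size)]


def checksums(chunks: list[bytearray]) -> list[int]:
    """Record a CRC32 for each chunk while the data is known to be good."""
    return [zlib.crc32(chunk) for chunk in chunks]


def find_corrupted(chunks: list[bytearray], expected: list[int]) -> list[int]:
    """Return the indices of chunks whose current CRC32 no longer matches."""
    return [
        i
        for i, (chunk, crc) in enumerate(zip(chunks, expected))
        if zlib.crc32(chunk) != crc
    ]


def flip_bit(chunk: bytearray, byte_index: int, bit: int) -> None:
    """Simulate a silent hardware fault by flipping one bit in place."""
    chunk[byte_index] ^= 1 << bit


if __name__ == "__main__":
    state = bytes(range(256)) * 4          # 1,024 bytes of stand-in simulation state
    chunks = split_into_chunks(state, 8)   # eight chunks of 128 bytes each
    reference = checksums(chunks)

    flip_bit(chunks[3], 10, 5)             # corrupt one bit in chunk 3
    print(find_corrupted(chunks, reference))  # [3]
```

Without the checksum comparison, the flipped bit would simply be carried forward into later steps, which is the compounding effect described above. A checksum of this kind only guards data at rest; it would not catch a processor that computes a wrong value in the first place, so real fault-detection schemes typically combine several techniques.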
“We are particularly concerned with errors that corrupt data silently, possibly generating incorrect results with no indication that the error has occurred,” said Bronis R. de Supinski, co-leader of the ASC Application Development Environment Performance Team at Lawrence Livermore.
“Errors that significantly reduce system performance are also a major concern since the systems on which the simulations run are very expensive.”
Advanced Simulation and Computing (ASC) is the computational arm of NNSA’s Stockpile Stewardship Program, which ensures the safety, security and reliability of the U.S. nuclear deterrent without underground testing.
New findings will be detailed in a paper to be presented during the Annual IEEE/IFIP International Conference on Dependable Systems and Networks from 25 to 28 June in Boston. Recent research findings were detailed in two papers last year, one