Performance Optimizations of the Tensor Contraction Engine in NWChem
David Ozog
Committee: Allen Malony (chair), Hank Childs, Boyana Norris
Directed Research Project (Jun 2014)

To understand the most fundamental properties of chemical reactions, it is necessary to simulate the electronic structure of molecular systems from first principles. These simulations are among the most compute- and memory-intensive of all scientific applications, so the parallel performance of quantum chemistry software frameworks is of utmost importance. This talk considers two large-scale optimizations of the NWChem computational chemistry package. The first optimization applies hybrid static/dynamic load balancing techniques to distributed tensor contraction operations in the coupled cluster method. The second optimization incorporates a new execution model that automatically overlaps communication with computation when processing a pool of remote work items. This is accomplished by transforming the typical get/compute/put work-processing model into our proposed "WorkQ" model. In this execution system, some number of on-node "producer" processes primarily perform communication while the remaining "consumer" processes perform computation, yet processes can switch roles dynamically for the sake of performance. The system is built on a highly tunable node-wise FIFO message queue protocol. Finally, this talk considers how the two optimizations affect each other and what they imply for the future development of computational quantum chemistry codes.
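The producer/consumer split described above can be illustrated with a minimal sketch. It is not the NWChem implementation: it uses Python threads and an in-process `queue.Queue` in place of on-node processes and a node-wise message queue, and the names `fetch_operands`, `contract`, and `run_workq` are hypothetical stand-ins for the remote get and the local tensor contraction. Dynamic role switching is omitted for brevity.

```python
import queue
import threading


def fetch_operands(task_id):
    # Hypothetical stand-in for the "get" step (fetching remote tensor tiles).
    return [float(task_id), 2.0]


def contract(operands):
    # Hypothetical stand-in for the local contraction kernel ("compute" step).
    return operands[0] * operands[1]


def run_workq(num_tasks, num_producers=1, num_consumers=3, capacity=4):
    """Process num_tasks work items with separate producer and consumer threads."""
    tasks = queue.Queue()
    for t in range(num_tasks):
        tasks.put(t)

    workq = queue.Queue(maxsize=capacity)  # bounded FIFO shared on the "node"
    results = {}
    results_lock = threading.Lock()
    done = object()  # sentinel telling a consumer to exit

    def producer():
        # Producers only communicate: take a task, fetch its data, enqueue it.
        while True:
            try:
                t = tasks.get_nowait()
            except queue.Empty:
                return
            workq.put((t, fetch_operands(t)))

    def consumer():
        # Consumers only compute: dequeue prefetched data and contract it.
        while True:
            item = workq.get()
            if item is done:
                return
            t, operands = item
            value = contract(operands)
            with results_lock:  # stand-in for the "put" step
                results[t] = value

    producers = [threading.Thread(target=producer) for _ in range(num_producers)]
    consumers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for th in producers + consumers:
        th.start()
    for th in producers:
        th.join()
    for _ in consumers:
        workq.put(done)  # one sentinel per consumer, after all real work
    for th in consumers:
        th.join()
    return results
```

In this sketch the queue capacity and the producer-to-consumer ratio are the tunable parameters: a consumer can work on one item while a producer is already fetching the next, which is the overlap of communication and computation that the get/compute/put model lacks.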