Show Entry in Conferences and Workshops

You may return to the Main Menu when done.

Showing 1 entry


[ipdps16]
Ozog, D., Kamil, A., Zheng, Y., Hargrove, P., Hammond, J. R., Malony, A., ... & Yelick, K. (2016, May). A hartree-fock application using upc++ and the new darray library. In Parallel and Distributed Processing Symposium, 2016 IEEE International (pp. 453-462). IEEE.

Keywords: -Hartree-Fock, self-consistent field (SCF), quantum chemistry, PGAS, UPC/UPC++, Global Arrays, performance analysis, load balancing, work stealing, attentiveness

The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into manyelectron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that, while it does not have all the features in NWChem, represents crucial algorithmic advances that reduce communication and improve load balance by doing an up-front static partitioning of tasks, followed by work stealing whenever necessary. To enable innovations in algorithms and exploit next generation exascale systems, it is crucial to support quantum chemistry codes using expressive and convenient programming models and runtime systems that are also efficient and scalable. This paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF such as active messages, a rich calculus for array operations, hardwaresupported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for the kinds of randomaccess array updates and linear algebra operations on blockdistributed arrays with irregular data ownership. We analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArrays library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.

URLs:

Modified:
Created: Fri Jan 20 14:42:27 2017


Current Collection: Conferences and Workshops
[ Menu | List | Show | About ]

Return to the ParaDucks Research Group Publications page.