A Scalable Observation System for Introspection and In Situ Analytics
Chad Wood, Sudhanshu Sane, Daniel Ellsworth, Alfredo Gimenez, Kevin Huck, Todd Gamblin, Allen Malony
Committee: Allen Malony (chair), Joseph Sventek, Hank Childs
Directed Research Project (Sep 2016)
Keywords: hpc; exascale; in situ; performance; monitoring; introspection; monalytics; scientific workflow; sos; sosflow

SOS is a new model for the online, in situ characterization and analysis of complex high-performance computing applications. SOS employs a data framework with distributed information management and structured query and access capabilities. Its primary design objectives are flexibility, scalability, and programmability. SOS provides a complete framework that can be configured with and used directly by an application, enabling detailed workflow analysis of scientific applications. This paper describes the SOS model and the experiments used to validate and explore the performance characteristics of its implementation, SOSflow. Experimental results demonstrate that SOS is capable of observation, introspection, feedback, and control of complex high-performance applications, and that it has desirable scaling properties.