Efficiently Analyzing Large-Scale Molecular Simulation Data with Accuracy Guarantees
|Author:||Yicheng Tu, University of South Florida|
|Date:||June 07, 2012|
Molecular simulations (MS) have become an integral part of molecular and structural biology by providing model descriptions for biochemical and biophysical processes at nanoscopic scales. Despite the tremendous development in computational hardware and software for MS, very little attention has been paid to the efficient storage, querying, and analysis of the large amounts of data generated by such simulations. In this talk, I will give a brief overview of the design and implementation of a Database-Centric Molecular Simulation (DCMS) framework under development at the University of South Florida. DCMS is built by extending the functionality of modern database management systems (DBMS) with novel indexing structures, data placement strategies, and algorithms for processing MS-specific data analytics. I will then focus on our work on the efficient computation of an important group of such analytics - the m-body correlation functions (m-BCFs). The m-BCFs contain essential information about the physical features of the simulation system and are thus the building blocks of many high-level analytics such as free energy and temperature. Brute-force computation of an m-BCF requires $O(N^m)$ time, where $N$ is the size of the simulation system. In the past few years, we have developed a series of algorithms based on spatial tree structures. While the first (exact) algorithm in this series has lower time complexity than the naive solution, the running time of an approximate algorithm is completely independent of the system size. I will also introduce a few other ideas that make our solution more practical, to the extent that real-time processing is possible for lower-order BCFs.
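To make the $O(N^m)$ cost concrete, the following is a minimal sketch of the brute-force baseline for the lowest-order case, $m = 2$. It assumes, for illustration only, that the 2-body analytic is a histogram of all pairwise particle distances; the function name `pair_distance_histogram` and the parameters `bucket_width` and `num_buckets` are hypothetical and are not taken from DCMS. It shows the naive method that the tree-based algorithms improve on, not those algorithms themselves.

```python
import math
from itertools import combinations


def pair_distance_histogram(points, bucket_width, num_buckets):
    """Brute-force histogram of all pairwise distances.

    Illustrative assumption: the 2-body analytic is a distance histogram.
    Every unordered pair is visited once, so the cost is O(N^2) for N points.
    """
    counts = [0] * num_buckets
    for p, q in combinations(points, 2):   # N * (N - 1) / 2 pairs
        d = math.dist(p, q)                # Euclidean distance (Python 3.8+)
        idx = int(d // bucket_width)       # bucket that contains d
        if idx < num_buckets:              # distances past the last bucket are dropped
            counts[idx] += 1
    return counts


# Three particles with pairwise distances 1, 2, and sqrt(5)
particles = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
histogram = pair_distance_histogram(particles, bucket_width=1.0, num_buckets=3)
```

For the three particles above, `histogram` is `[0, 1, 2]`. Extending the same idea to an $m$-body function means iterating over all $m$-tuples of particles, which is where the $O(N^m)$ bound comes from.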
Yi-Cheng Tu received a bachelor's degree in horticulture from Beijing Agricultural University, China, and the MS and PhD degrees in computer science from Purdue University in 2003 and 2007, respectively. He is currently an assistant professor in the Department of Computer Science and Engineering at the University of South Florida, Tampa, Florida, USA. His current research addresses energy-efficient database systems, scientific data management, and high-performance computing. He has also worked on data stream management systems, self-tuning databases, peer-to-peer systems, and multimedia databases. He is a member of IEEE, ACM/SIGMOD, and ASEE.