Colloquium Details

Practical Application Scheduling on Large-Scale Platforms

Author: Henri Casanova, University of California, San Diego
Date: April 07, 2005
Time: 3:30
Location: 220 Deschutes
Host: Allen Malony

Abstract

In the last five years, software technology has been developed that makes it possible to establish computing platforms that aggregate storage, network, and compute resources across multiple institutions. Such "grid" platforms aim to support users and applications at unprecedented levels of performance, scale, and availability. While these promises have already been demonstrated in some instances, many research questions must be answered before this technology sees widespread use. Beyond the challenges involved in developing the necessary grid software infrastructure, users of these platforms face two challenges daily: application scheduling and application deployment. In this presentation we highlight scheduling strategies and a production software environment, APST, that we have developed to address both of these challenges. Recently, our experience with APST has led us to investigate the scheduling question for divisible loads, which are popular models for applications in many domains. Key limitations of previously proposed divisible load scheduling algorithms prevent their use in practice, and we present a series of new algorithms that remove these limitations. Ultimately, our work leads to the first multi-round divisible load scheduling algorithm applicable to heterogeneous platforms that exhibit communication and computation latencies as well as performance variability. This algorithm is implemented as part of APST, and we conclude the presentation by discussing its use for real-world applications on real-world platforms.

Biography:

Henri Casanova is an adjunct Assistant Professor of Computer Science and Engineering at the University of California, San Diego, an Associate Research Scientist at the San Diego Supercomputer Center, and the founder and director of the Grid Research And Innovation Laboratory (GRAIL). His research interests are in the area of parallel and distributed computing, with a focus on modeling, simulation, and scheduling. He obtained his B.S. from the Ecole Nationale Supérieure d'Electronique, d'Electrotechnique, d'Informatique et d'Hydraulique de Toulouse, France in 1993, his M.S. from the Université Paul Sabatier, Toulouse, France in 1994, and his Ph.D. from the University of Tennessee, Knoxville in 1998.