Machine learning approaches for understanding the genetic basis of complex traits
Author: Su-In Lee, University of Washington
Date: October 25, 2012
Humans differ in many "phenotypes", such as weight, hair color and, more importantly, disease susceptibility. These phenotypes are largely determined by each individual's genotype, encoded in the roughly 3.2 billion bases of his or her genome sequence. Deciphering the genome by finding which sequence variations cause a given phenotype would therefore have great impact. The recent advent of high-throughput genotyping methods has made it possible to measure an individual's sequence variation on a genome-wide scale. Classical approaches have focused on identifying which sequence variations are associated with a particular phenotype. However, the cellular mechanisms through which sequence variations give rise to a phenotype are complex, which makes it difficult to infer such causal relationships directly from association alone.
In this talk, I will present statistical machine learning approaches that address these challenges by explicitly modeling the cellular mechanisms induced by sequence variations. For example, one approach takes genome-wide expression measurements as input and aims to generate finer-grained hypotheses such as "sequence variation S induces cellular process M, which leads to changes in phenotype P". Furthermore, we have developed a general machine learning technique, called the "meta-prior algorithm", that learns the regulatory potential of each sequence variation from its intrinsic characteristics. This helps identify the true causal variation among the many sequence variations in the same chromosomal region. Our approaches have led to novel insights into sequence variations, and some of the resulting hypotheses have been validated through biological experiments. Many of our machine learning techniques apply to a wide range of problems beyond genetics; as an example, I will present the meta-prior algorithm in the context of movie rating prediction using the Netflix data set.
Su-In Lee is an Assistant Professor of Computer Science & Engineering and Genome Sciences at the University of Washington, Seattle. Her group is broadly interested in developing advanced machine learning algorithms to solve important problems in genetics and molecular biology. The goals of her current research projects can be summarized as follows: (1) building probabilistic models that represent various levels of gene regulation; (2) inferring causal pathways from genetic and environmental influences to complex phenotypic traits such as diseases; and (3) developing computational frameworks for personalized medicine.
She completed her PhD in January 2009 under the supervision of Professor Daphne Koller at Stanford University. Su-In graduated summa cum laude with a B.Sc. in Electrical Engineering and Computer Science from the Korea Advanced Institute of Science and Technology (KAIST).