Colloquium Details

Faculty Search Colloquium: Federated Search of Text Search Engines

Author: Luo Si, Language Technologies Institute, Carnegie Mellon University
Date: February 23, 2006
Time: 8:45 (please note special morning time)
Location: 220 Deschutes
Host: Dejing Dou

Abstract

Conventional search engines such as Google or Yahoo! provide access to Web information that can be acquired easily by crawling Web links. However, much valuable information is accessible only through source-specific search interfaces. Federated search reaches this hidden Web content through a single interface that connects to multiple source-specific search engines.

My dissertation research addresses the three main research problems within federated search: resource representation, resource selection, and results merging. New algorithms have been proposed for estimating information source sizes, estimating distributions of relevant documents across information sources for resource selection, and merging document rankings returned by selected sources. Furthermore, a unified utility maximization framework is proposed to combine these solutions into effective systems for different federated search applications. Empirical studies in a wide range of research environments, and with a real-world prototype system under different operating conditions, have demonstrated the effectiveness of the research. This new research, supported by a stronger theoretical foundation, better empirical results, and more realistic simulation of real-world applications, substantially improves the state of the art in federated search.

Biography

Luo Si is a Ph.D. candidate at the Language Technologies Institute, a department in Carnegie Mellon's School of Computer Science. He received his M.S. and B.S. degrees in Computer Science from Tsinghua University. His research spans information retrieval, machine learning, text mining, speech and multimedia processing, and data mining. His recent research focuses on federated search (distributed information retrieval), probabilistic models for collaborative filtering, and text/data mining for bioinformatics. He has published more than 35 conference, journal, and workshop papers.