Colloquium Details
Faculty Search Colloquium: Federated Search of Text Search Engines
Author: | Luo Si, Language Technologies Institute, Carnegie Mellon University |
---|---|
Date: | February 23, 2006 |
Time: | 8:45 a.m. (please note the special morning time) |
Location: | 220 Deschutes |
Host: | Dejing Dou |
Abstract
Conventional search engines such as Google or Yahoo! provide access to Web information that can be acquired easily by crawling Web links. However, much valuable information is accessible only through source-specific search interfaces. Federated search reaches this hidden Web content through a single interface that connects to multiple source-specific search engines.
My dissertation research addresses the three main research problems within federated search: resource representation, resource selection, and results merging. New algorithms have been proposed for estimating information source sizes, estimating distributions of relevant documents across information sources for resource selection, and merging document rankings returned by selected sources. Furthermore, a unified utility maximization framework is proposed to combine these solutions into effective systems for different federated search applications. Empirical studies in a wide range of research environments, and with a real-world prototype system under different operating conditions, have demonstrated the effectiveness of the research. This new research, supported by a stronger theoretical foundation, better empirical results, and more realistic simulation of real-world applications, substantially improves the state of the art in federated search.
Biography
Luo Si is a Ph.D. candidate at the Language Technologies Institute, a department in Carnegie Mellon's School of Computer Science. He received his M.S. and B.S. degrees in Computer Science from Tsinghua University. His research spans information retrieval, machine learning, text mining, speech and multimedia processing, and data mining. His recent research focuses on federated search (distributed information retrieval), probabilistic models for collaborative filtering, and text and data mining for bioinformatics. He has published more than 35 conference, journal, and workshop papers.