Colloquium Details

Mining, mapping, modeling and crawling the Web

Author: Filippo Menczer, University of Iowa
Date: March 06, 2003
Time: 15:30
Location: 220 Deschutes

Abstract

Can we model the scale-free distribution of Web links under realistic assumptions about the behavior of page authors? Can a Web crawler efficiently locate an unknown relevant page? These questions are receiving much attention because of their potential impact on understanding the structure of the Web and on building better search engines. This talk will discuss the semantic maps obtained by analyzing the connection between similarity functions based on text, link, and semantic cues across a massive number of page pairs. These maps uncover some striking relationships. For example, link probability displays a phase transition between a region where it is not determined by content and one where it decays with textual distance according to a power law. This relationship suggests a novel Web growth model that is shown to accurately predict the distribution of page degree, based on textual content and assuming only local knowledge of degree for existing pages. A similar phase transition is found between link probability and semantic distance, and both results indicate that efficient paths can be discovered by Web crawling algorithms based on textual and/or categorical cues. I will conclude by surveying a number of applications of these findings to the evaluation and design of more efficient, effective, and scalable search engines and crawlers.

Biography

Filippo Menczer is an Assistant Professor in the Department of Management Sciences at the University of Iowa, and a faculty member of the graduate program in Applied Math and Computational Sciences. After receiving a Laurea in Physics from the University of Rome in 1991, he was affiliated with the Italian National Research Council. In 1998 he received a dual Ph.D. in Computer Science and Cognitive Science from the University of California at San Diego. Dr. Menczer has been the recipient of Fulbright, Rotary Foundation, and NATO fellowships, and is a fellow-at-large of the Santa Fe Institute. Dr. Menczer and his Adaptive Agents Research Group pursue interdisciplinary research interests in Web, text, and data mining, Web intelligence, distributed information systems, adaptive agents, e-commerce, evolutionary computation, machine learning, complex systems, artificial life, and agent-based computational economics. This research is supported by a CAREER Award from the National Science Foundation.