Committee: Reza Rejaie (chair), Allen Malony, Jun Li
Directed Research Project(Jun 2009)
Keywords: sampling; graphs; online social networks
During recent years, Online Social Networks (OSNs) have evolved in many ways and attracted millions of users. The dramatic increase in the popularity of OSNs has encouraged network researchers to examine their connectivity structure. The majority of empirical studies for characterizing OSN connectivity graphs have analyzed snapshots of the system taken in different times. These snapshots are collected by measurements that crawl OSN connectivity graphs. However, OSN owners are often unwilling to expose their user information due to privacy concerns. On the other hand, because of large population and dynamics of OSNs, the task of crawling may result in an incomplete or distorted snapshot of the system.
These challenges have clearly heightened the urgency for developing efficient and accurate graph sampling techniques. Although a couple of techniques, such as Metropolized Random Walk (MRW) and Respondent-Driven Sampling (RDS), have been proposed for P2P systems, little is known about their accuracy over connectivity graph of OSNs. In this paper, we focus on MRW and RDS sampling techniques and thoroughly investigate their performance in sampling OSN systems. Our main findings can be summarized as follows:
- both techniques are sensitive to graph structure, but RDS exhibits better performance;
- heterogeneous degree distribution and high number of unbalanced edges in OSN graphs are the main factors for poor performance of MRW over such graphs;
- RDS is unable to properly sample low degree nodes which are hard to reach; and
- OSN graphs composes of a dense core in the middle and a large number of partitions hanging from the core. High internal and low external connectivity of this core make it almost infeasible for sampling techniques to explore all regions of OSN graphs.