Distributed Memory Processing of Very Large Graphs
Sara Riazi
Committee: Boyana Norris (chair), Dejing Dou, Stephen Fickas, Peter Ralph
Masters Thesis(May 2024)
Keywords:

Big graphs such as social networks or the internet network, biological networks, knowledge graphs appear in many domains. However, processing these graphs rely on the accessibility of high-performance frameworks which are able to handle these large graphs. One aspect of this accessibility is the usability of the frameworks for a broad community of researches who do not have sufficient expertise to work with these frameworks. To address this issue, we introduce GraphFlow framework, a workflow-based framework that provides several graph mining components. GraphFlow benefits from data-parallel Apache Spark and its GraphX library, as the back-end, so it processes very large graphs. GraphFlow also supports the construction of experiment pipelines that involve running several components.

Integrated into our GraphFlow framework, we also introduce a novel vertex- centric network embedding algorithm, which can learn low-dimensional vectors for vertices of very large graphs. Our network embedding algorithm can scale to graphs with billions of edges, while previous algorithms do not scale to the graphs of this scale.

GraphFlow also supports dynamic graphs using graph snapshots and batch updates. We provide SSSPIncJoint, a novel algorithm for computing single-source shortest paths (SSSP) for dynamic graphs. SSSPIncJoint is significantly more efficient than running SSSP for each snapshot of a dynamic graph.