Large Scale Graph Mining: Visualization, Exploration, and Analysis

Welcome to the page dedicated to the tutorial "Large Scale Graph Mining: Visualization, Exploration, and Analysis", to be presented at TheWebConf 2021! Here you will find all the information you need to attend the tutorial. The tutorial is organized by Benjamin Ricaud, Nicolas Aspert, and Volodymyr Miz.


Abstract

What happens inside social networks impacts our everyday life and is of great interest to researchers, data journalists, and the general public. These networks, as well as other large online networks of pages or knowledge graphs, contain a rich but overwhelming amount of information. Because of their size and the limited API access, extracting and analyzing information from these huge networks is challenging. In this hands-on tutorial, we propose an introduction to the data mining of large networks and the analysis of activity inside them.
The tutorial consists of two parts. The first is an overview of key concepts in (large) graph analysis, an introduction to the main exploration tools in Python and to visualization using Gephi, as well as a short introduction to machine learning on graphs. It covers a basic set of important tools to start exploring large graphs. During the second part, participants will form teams and focus on a particular large real-world graph, proposed either by the organizers or by the participants themselves. The exploration will be guided, alternating between short presentations of techniques for exploring large networks, including the use of APIs, and interactions between the organizers and the teams.

Learning objectives

  • Understand the main concepts for exploring large graphs (first part).
  • Know solutions to cope with a large amount of graph data (filtering neighbors randomly or making use of node and edge attributes).
  • Know the advantages, drawbacks, and particularities of graph structures (hubs, small world).
  • Know how to handle a dynamic graph.
  • Pick up tips for visualizing a network (Gephi).
  • Get familiar with the sampling and exploration of a large online graph (e.g. a social network) using an API.
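As an illustration of the "filtering neighbors randomly" idea, here is a minimal sketch of a snowball (breadth-first) sample that keeps only a few randomly chosen neighbors per visited node. It is not taken from the tutorial material: the function name `snowball_sample`, its parameters, and the Barabási-Albert toy graph are assumptions chosen for the example.

```python
import random

import networkx as nx


def snowball_sample(graph, seed_node, max_neighbors=3, max_nodes=50, rng=None):
    """Breadth-first sample keeping at most `max_neighbors` random
    neighbors per visited node and at most `max_nodes` nodes overall.
    (Hypothetical helper written for this example.)"""
    rng = rng or random.Random(0)
    visited = {seed_node}
    frontier = [seed_node]
    edges = []
    while frontier and len(visited) < max_nodes:
        next_frontier = []
        for node in frontier:
            neighbors = list(graph.neighbors(node))
            # Random filtering: hubs would otherwise flood the sample.
            kept = rng.sample(neighbors, min(max_neighbors, len(neighbors)))
            for nb in kept:
                if nb not in visited:
                    if len(visited) >= max_nodes:
                        continue  # node budget exhausted, skip new nodes
                    visited.add(nb)
                    next_frontier.append(nb)
                edges.append((node, nb))
        frontier = next_frontier
    sample = nx.Graph()
    sample.add_nodes_from(visited)
    sample.add_edges_from(edges)
    return sample


# Toy graph with hubs, standing in for a large online network (assumption).
full_graph = nx.barabasi_albert_graph(1000, 3, seed=42)
sample = snowball_sample(full_graph, seed_node=0)
print(sample.number_of_nodes(), "nodes,", sample.number_of_edges(), "edges")
```

With a real API, the call to `graph.neighbors(node)` would be replaced by a request for the neighbors of `node`, which is why a cap on the number of neighbors and nodes also acts as a cap on the number of requests.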

Program

Program Part I:

  • Introduction, setting up the environment, general presentation
  • Graph exploration and data mining in graphs. 1) When you have access to the full graph: graph and node properties (small world, hubs, centrality, PageRank, degree distribution), with experiments using the Python module NetworkX. 2) When the full graph is out of reach: reducing it to a subgraph through graph sampling, with experiments on small toy graph models using the Python library Little Ball of Fur https://github.com/benedekrozemberczki/littleballoffur (random walks, snowball sampling, Forest Fire, and the more advanced Spikyball).
  • Graph visualization with Gephi. Layouts, visualizing node properties with color and size. Communities, centrality, PageRank. Limits of visualization.
  • Conclusion and debriefing of Part I. Challenges, problems, data bottlenecks in large graphs and how to overcome them.
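To give a flavor of the Part I experiments, the sketch below computes a few of the properties listed above with NetworkX and exports the graph to a GEXF file, a format that Gephi can open. The Watts-Strogatz toy graph, the attribute names, and the output file name are assumptions made for this example, not the tutorial's actual data or code.

```python
import os
import tempfile

import networkx as nx

# Toy small-world graph standing in for a real dataset (assumption).
G = nx.connected_watts_strogatz_graph(n=200, k=6, p=0.1, seed=1)

# Node-level properties: degree and betweenness centrality.
degree = dict(G.degree())
betweenness = nx.betweenness_centrality(G)

# Graph-level properties often used to characterize small-world graphs.
avg_clustering = nx.average_clustering(G)
avg_path_length = nx.average_shortest_path_length(G)

# Degree distribution: entry i is the number of nodes with degree i.
degree_hist = nx.degree_histogram(G)

# Candidate hubs: the five nodes with the highest degree.
hubs = sorted(degree, key=degree.get, reverse=True)[:5]

# Store the node properties as attributes so they can be mapped to
# node color or size once the file is opened in Gephi.
nx.set_node_attributes(G, degree, "degree")
nx.set_node_attributes(G, betweenness, "betweenness")

out_path = os.path.join(tempfile.mkdtemp(), "toy_graph.gexf")
nx.write_gexf(G, out_path)

print("average clustering:", round(avg_clustering, 3))
print("average shortest path length:", round(avg_path_length, 3))
print("top-degree nodes:", hubs)
print("GEXF file written to:", out_path)
```

PageRank can be added in the same way with `nx.pagerank(G)`; note that recent NetworkX versions rely on SciPy for this function.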

Program Part II:

  • Some machine learning on graphs: community detection, label propagation, combining the graph with data on nodes, keywords/tags in communities (TF-IDF of texts in network communities). How to apply these to large graphs: relation with graph sampling from Part I.
  • Exploring online data via an API where access is limited. Examples: Wikipedia and social networks (Reddit Pushshift API or Twitter). Equivalence with graph sampling.
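The combination of community detection and keyword extraction mentioned above can be sketched as follows: communities are found with NetworkX's label propagation, then each community is treated as one document and its words are ranked by a simple TF-IDF score. The toy graph, the node texts, and the helper `top_keywords` are assumptions invented for this example, and the TF-IDF variant used here (term frequency times log of inverse document frequency) is only one of several common definitions.

```python
import math
from collections import Counter

import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# Toy graph: two 5-node cliques joined by a single edge (assumption).
G = nx.barbell_graph(5, 0)

# Hypothetical short texts attached to nodes, one topic per clique.
texts = {
    n: "football match goal news" if n < 5 else "election vote poll news"
    for n in G
}

# Community detection by label propagation.
communities = [set(c) for c in label_propagation_communities(G)]

# One bag of words per community.
docs = [Counter(w for n in c for w in texts[n].split()) for c in communities]
n_docs = len(docs)

# Document frequency: in how many communities each word appears.
doc_freq = Counter(w for d in docs for w in d)


def top_keywords(doc, k=2):
    """Return the k words of `doc` with the highest TF-IDF score.
    (Hypothetical helper written for this example.)"""
    total = sum(doc.values())
    scores = {
        w: (count / total) * math.log(n_docs / doc_freq[w])
        for w, count in doc.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


keywords = [top_keywords(d) for d in docs]
for community, words in zip(communities, keywords):
    print(sorted(community), "->", words)
```

Words shared by every community (here "news") get a score of zero, which is the effect TF-IDF is used for: it pushes forward the words that distinguish one community from the others. On a large graph, the same steps would be applied to a sampled subgraph rather than to the full network.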

Prerequisites

The audience should be familiar with coding in Python. Basic knowledge of Git, database queries, and APIs is desirable but not mandatory, as is basic knowledge of graphs (nodes, edges, attributes, and properties). We strongly encourage attending TheWebConf tutorial "learning from graph" before ours.