Twitter Community Detection: Infomap vs. Leiden

Team: Jevon Lipsey, Zahara Love, Marcos Arnold

Course: Data Structures & Algorithms Final Project

Overview

This project implements and compares two advanced community detection algorithms—Infomap and Leiden—from scratch in Python. The goal was to analyze complex social network structures within the Stanford SNAP Twitter dataset, compare the algorithmic approaches (flow compression vs. modularity optimization), and identify political echo chambers.

Key Features

From-Scratch Infomap: A Python implementation of the Infomap algorithm based on the Map Equation and information flow compression.
From-Scratch Leiden: A Python implementation of the Leiden algorithm based on modularity optimization, including the local movement, refinement, and aggregation phases.
Performance Benchmarking: Both from-scratch algorithms are benchmarked against their official, optimized library counterparts.
Visualization: Uses NetworkX and Matplotlib to visualize the detected community clusters and identify high-degree "influencer" nodes.

Algorithms Implemented

Infomap (My Contribution): Implemented in scratch_infomap.py. This method views community detection as a data compression problem. It uses a random walker model to find the network partition that provides the most compressed description (shortest codelength) of information flow.
Leiden (Team Contribution): Implemented in scratch_leiden.py. This algorithm is a refinement of the Louvain method that optimizes a "modularity" score. It guarantees that communities are well-connected and prevents the poorly-connected clusters that Louvain can sometimes produce.

Dataset

Source: Stanford SNAP Twitter Dataset
Size: 81,306 nodes and 1,768,149 edges.
Analysis: Due to the O(n^2) complexity of our from-scratch builds, we ran our final analysis on 200-node subgraphs centered on political news accounts (CNN, The New Yorker, Fox News) to test for echo chambers.

Key Findings

Our from-scratch ScratchInfomap implementation successfully identified 3 distinct communities in the 200-node political subgraph and achieved a modularity score of 0.5482 (rounded to 0.55). This is a strong result, as a modularity score greater than 0.3 is generally considered to demonstrate significant community structure. Some really interesting communities can be detected if the graph subset is modified!

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
presentation.pdf		presentation.pdf
scratch_infomap.py		scratch_infomap.py
scratch_leiden.py		scratch_leiden.py
twitter_communities.png		twitter_communities.png
twitter_graph.tar.gz		twitter_graph.tar.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter Community Detection: Infomap vs. Leiden

Overview

Key Features

Algorithms Implemented

Dataset

Key Findings

About

Uh oh!

Releases

Packages

Languages

jevonlipsey/twitter_cluster

Folders and files

Latest commit

History

Repository files navigation

Twitter Community Detection: Infomap vs. Leiden

Overview

Key Features

Algorithms Implemented

Dataset

Key Findings

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages