Added Stochastic Variability in Community Detection Algorithms #820
base: main
Changes from 4 commits
1c5e1ad
a3ac9c1
5247f7e
dc52eb3
97b7192
b6b07e9
@@ -0,0 +1,93 @@
"""
.. _tutorials-stochastic-variability:

=========================================================
Stochastic Variability in Community Detection Algorithms
=========================================================

This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures (NMI, VI, RI) on both random and structured graphs.

"""
# %%
# Import Libraries
Review comment: Please do not capitalize words without good reason.
import igraph as ig
import numpy as np
Review comment: Do not import libraries you don't use.
import matplotlib.pyplot as plt
import itertools

# %%
# First, we generate a graph.
# Generates a random Erdos-Renyi graph (no clear community structure)
def generate_random_graph(n, p):
    return ig.Graph.Erdos_Renyi(n=n, p=p)
Review comment: Do not omit diacritics. It is Erdős-Rényi. For clarity, do indicate that it is an Erdős-Rényi […] Do we really need to define new functions to generate these graphs? This function just wraps […]

# %%
# Generates a clustered graph with clear communities using the Stochastic Block Model (SBM)
def generate_clustered_graph(n, clusters, intra_p, inter_p):
    block_sizes = [n // clusters] * clusters
    prob_matrix = [[intra_p if i == j else inter_p for j in range(clusters)] for i in range(clusters)]
    return ig.Graph.SBM(sum(block_sizes), prob_matrix, block_sizes)
Review comment: Could we simplify the code while also making it more illustrative by using empirical network data here? You could try the karate club network, the Les Miserables network (already available in the same directory) or perhaps the famous Jazz musicians network. See which one gives a nicer result. For the random graph, let's use one that has the same vertex count and density as the empirical one. Measure the density and pass it as the […]
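One way to read this suggestion, sketched in plain Python so that it runs without igraph: measure the density of the empirical network from its vertex and edge counts, then draw a G(n, p) random graph whose edge probability equals that density. The helper names `density` and `gnp_edges` are hypothetical, and using the density as the edge probability is an assumption about the truncated comment above. The counts used (34 vertices, 78 edges) are those of Zachary's karate club network.

```python
import itertools
import random


def density(n_vertices, n_edges):
    """Density of a simple undirected graph: realized edges / possible edges."""
    return 2 * n_edges / (n_vertices * (n_vertices - 1))


def gnp_edges(n_vertices, p, seed=None):
    """Edge list of a G(n, p) graph: each vertex pair is linked independently
    with probability p. Hypothetical helper, not an igraph function."""
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in itertools.combinations(range(n_vertices), 2)
        if rng.random() < p
    ]


# Zachary's karate club network has 34 vertices and 78 edges.
n, m = 34, 78
p = density(n, m)  # assumption: the density is used as the edge probability

random_edges = gnp_edges(n, p, seed=0)
print(f"empirical density: {p:.3f}")
print(f"random graph density: {density(n, len(random_edges)):.3f}")
```

The random graph matches the empirical density only in expectation, so individual draws will deviate a little. In igraph the same idea would presumably combine `Graph.density()` with `Graph.Erdos_Renyi(n=..., p=...)`, which removes the need for the wrapper functions in the diff.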

# %%
# Computes pairwise similarity (NMI, VI, RI) between partitions
def compute_pairwise_similarity(partitions, method):
    """Computes pairwise similarity measure between partitions."""
    scores = []
    for p1, p2 in itertools.combinations(partitions, 2):
        scores.append(ig.compare_communities(p1, p2, method=method))
    return scores

# %%
# Stochastic Community Detection
# Runs Louvain's method iteratively to generate partitions
Review comment: This is called the Louvain method, not Louvain's method. Can you include a short explanation of why the result is different on each run? This is often a point of confusion for empirical researchers who are inexperienced in data analysis. This is a modularity maximization method. Since the exact maximization of modularity is NP-hard, the Louvain method uses a greedy heuristic, processing vertices in a random order.
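The effect of the random vertex order can be illustrated with a deliberately simplified sketch. It implements only a greedy local-moving step (each vertex joins the neighbouring community that most increases modularity), without the aggregation phase of the full Louvain method, and it is not igraph's implementation. The function names and the ring-shaped toy graph are hypothetical choices made for illustration; the graph is assumed to have at least one edge.

```python
import random
from collections import defaultdict


def modularity(adj, membership, m):
    """Modularity of a partition of an undirected, unweighted graph.

    adj maps each vertex to the set of its neighbours; m is the edge count.
    """
    internal = defaultdict(int)    # twice the number of intra-community edges
    degree_sum = defaultdict(int)  # total degree per community
    for v, nbrs in adj.items():
        degree_sum[membership[v]] += len(nbrs)
        for u in nbrs:
            if membership[u] == membership[v]:
                internal[membership[v]] += 1
    return sum(
        internal[c] / (2 * m) - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


def local_moving(adj, seed):
    """Greedy local moving with vertices visited in a seed-dependent random order."""
    rng = random.Random(seed)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    membership = {v: v for v in adj}  # start with every vertex on its own
    improved = True
    while improved:
        improved = False
        order = list(adj)
        rng.shuffle(order)  # the only source of randomness
        for v in order:
            current = membership[v]
            best_c, best_q = current, modularity(adj, membership, m)
            for c in {membership[u] for u in adj[v]}:
                membership[v] = c  # tentatively move v and score the result
                q = modularity(adj, membership, m)
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            membership[v] = best_c
            if best_c != current:
                improved = True
    return membership


# Toy graph: a ring of 8 vertices, which has no strongly preferred partition.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
for seed in range(5):
    result = local_moving(ring, seed)
    print(seed, [result[v] for v in sorted(ring)])
```

Because each move is accepted greedily, the order in which vertices are visited decides which local optimum is reached; different seeds may therefore produce different partitions, while a fixed seed always reproduces the same one.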
# Computes similarity metrics:
def run_experiment(graph, iterations=50):
    """Runs the stochastic method multiple times and collects community partitions."""
    partitions = [graph.community_multilevel().membership for _ in range(iterations)]
    nmi_scores = compute_pairwise_similarity(partitions, method="nmi")
    vi_scores = compute_pairwise_similarity(partitions, method="vi")
    ri_scores = compute_pairwise_similarity(partitions, method="rand")
    return nmi_scores, vi_scores, ri_scores

# %%
# Parameters
n_nodes = 100
p_random = 0.05
clusters = 4
p_intra = 0.3  # High intra-cluster connection probability
p_inter = 0.01  # Low inter-cluster connection probability

# %%
# Generate graphs
random_graph = generate_random_graph(n_nodes, p_random)
clustered_graph = generate_clustered_graph(n_nodes, clusters, p_intra, p_inter)

# %%
# Run experiments
nmi_random, vi_random, ri_random = run_experiment(random_graph)
nmi_clustered, vi_clustered, ri_clustered = run_experiment(clustered_graph)

# %%
# Lets, plot the histograms
fig, axes = plt.subplots(3, 2, figsize=(12, 10))
measures = [(nmi_random, nmi_clustered, "NMI"), (vi_random, vi_clustered, "VI"), (ri_random, ri_clustered, "RI")]
colors = ["red", "blue", "green"]

for i, (random_scores, clustered_scores, measure) in enumerate(measures):
    axes[i][0].hist(random_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
    axes[i][0].set_title(f"Histogram of {measure} - Random Graph")
    axes[i][0].set_xlabel(f"{measure} Score")
    axes[i][0].set_ylabel("Frequency")

    axes[i][1].hist(clustered_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
    axes[i][1].set_title(f"Histogram of {measure} - Clustered Graph")
    axes[i][1].set_xlabel(f"{measure} Score")
Review comment: Could you please plot the probability density instead of counts? While it doesn't make a difference here, it is generally good practice, and it becomes relevant when comparing datasets of different sizes.
Review comment: Also, please adjust the NMI and RI histograms to span the range […]
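What "probability density instead of counts" means can be sketched without matplotlib: each bin count is divided by the number of samples times the bin width, so the bar areas sum to 1 regardless of sample size. The function name `density_histogram` is hypothetical, and the default range of 0 to 1 is an assumption based on the mathematical range of the normalized mutual information and the Rand index, since the comment above is truncated.

```python
def density_histogram(scores, bins=20, value_range=(0.0, 1.0)):
    """Return per-bin probability densities over a fixed range.

    Values outside value_range are ignored; the right edge is inclusive.
    """
    lo, hi = value_range
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in scores:
        if lo <= s <= hi:
            # Clamp so that s == hi falls into the last bin.
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [c / (total * width) for c in counts]


small_sample = [0.1, 0.1, 0.9, 1.0]
large_sample = small_sample * 10  # same shape, ten times the size

# Densities are identical even though the raw counts differ by a factor of ten.
print(density_histogram(small_sample, bins=4))
print(density_histogram(large_sample, bins=4))
```

With matplotlib, the corresponding change to the `hist` calls in the diff would be to pass `density=True` and, for the bounded measures, `range=(0, 1)`.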

plt.tight_layout()
plt.show()

# %%
# The results are plotted as histograms for random vs. clustered graphs, highlighting differences in detected community structures.
#The key reason for the inconsistency in random graphs and higher consistency in structured graphs is due to community structure strength:
#Random Graphs: Lack clear communities, leading to unstable partitions. Stochastic algorithms detect different structures across runs, resulting in low NMI, high VI, and inconsistent RI.
#Structured Graphs: Have well-defined communities, so detected partitions are more stable across multiple runs, leading to high NMI, low VI, and stable RI.
Review comment: Could you please be explicit about the range and interpretation of the three measures? NMI and RI are in […]
Review comment: Please spell out the names of similarity measures. If you like, you can add the abbreviations in parentheses.