
Added Stochastic Variability in Community Detection Algorithms #820


Draft · wants to merge 6 commits into base: main

Changes from 4 commits
93 changes: 93 additions & 0 deletions doc/examples_sphinx-gallery/stochastic_variability.py
@@ -0,0 +1,93 @@
"""
.. _tutorials-stochastic-variability:

=========================================================
Stochastic Variability in Community Detection Algorithms
=========================================================

This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures (NMI, VI, RI) on both random and structured graphs.
Member:

Please spell out the names of similarity measures. If you like, you can add the abbreviations in parentheses.


"""
# %%
# Import Libraries
Member:

Please do not capitalize words without good reason.

Suggested change
# Import Libraries
# Import libraries:

import igraph as ig
import numpy as np
Member:

Do not import libraries you don't use.

import matplotlib.pyplot as plt
import itertools

# %%
# First, we generate a graph.
# Generates a random Erdos-Renyi graph (no clear community structure)
def generate_random_graph(n, p):
return ig.Graph.Erdos_Renyi(n=n, p=p)
Member:

Do not omit diacritics. It is Erdős-Rényi.

For clarity, do indicate that it is an Erdős-Rényi $G(n,p)$ graph (i.e. not $G(n,m)$).

Do we really need to define new functions to generate these graphs? This function just wraps Graph.Erdos_Renyi.


# %%
# Generates a clustered graph with clear communities using the Stochastic Block Model (SBM)
def generate_clustered_graph(n, clusters, intra_p, inter_p):
block_sizes = [n // clusters] * clusters
prob_matrix = [[intra_p if i == j else inter_p for j in range(clusters)] for i in range(clusters)]
return ig.Graph.SBM(sum(block_sizes), prob_matrix, block_sizes)
Member:

Could we simplify the code while also making it more illustrative, and use empirical network data here?

You could try the karate club network, the Les Misérables network (already available in the same directory) or perhaps the famous Jazz musicians network. See which one gives a nicer result.

For the random graph, let's use one that has the same vertex count and density as the empirical one. Measure the density and pass it as the $p$ parameter of the $G(n,p)$ model. Alternatively, measure the edge count and pass it as the $m$ parameter of the $G(n,m)$ model.


# %%
# Computes pairwise similarity (NMI, VI, RI) between partitions
def compute_pairwise_similarity(partitions, method):
"""Computes pairwise similarity measure between partitions."""
scores = []
for p1, p2 in itertools.combinations(partitions, 2):
scores.append(ig.compare_communities(p1, p2, method=method))
return scores

# %%
# Stochastic Community Detection
# Runs Louvain's method iteratively to generate partitions
Member:

This is called the Louvain method, not Louvain's method.

Can you include a short explanation of why the result is different on each run? This is often a point of confusion for empirical researchers who are inexperienced in data analysis.

This is a modularity maximization method. Since the exact maximization of modularity is NP-hard, the Louvain method uses a greedy heuristic, processing vertices in a random order.

# Computes similarity metrics:
def run_experiment(graph, iterations=50):
"""Runs the stochastic method multiple times and collects community partitions."""
partitions = [graph.community_multilevel().membership for _ in range(iterations)]
nmi_scores = compute_pairwise_similarity(partitions, method="nmi")
vi_scores = compute_pairwise_similarity(partitions, method="vi")
ri_scores = compute_pairwise_similarity(partitions, method="rand")
return nmi_scores, vi_scores, ri_scores

# %%
# Parameters
n_nodes = 100
p_random = 0.05
clusters = 4
p_intra = 0.3 # High intra-cluster connection probability
p_inter = 0.01 # Low inter-cluster connection probability

# %%
# Generate graphs
random_graph = generate_random_graph(n_nodes, p_random)
clustered_graph = generate_clustered_graph(n_nodes, clusters, p_intra, p_inter)

# %%
# Run experiments
nmi_random, vi_random, ri_random = run_experiment(random_graph)
nmi_clustered, vi_clustered, ri_clustered = run_experiment(clustered_graph)

# %%
# Let's plot the histograms
fig, axes = plt.subplots(3, 2, figsize=(12, 10))
measures = [(nmi_random, nmi_clustered, "NMI"), (vi_random, vi_clustered, "VI"), (ri_random, ri_clustered, "RI")]
colors = ["red", "blue", "green"]

for i, (random_scores, clustered_scores, measure) in enumerate(measures):
axes[i][0].hist(random_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
axes[i][0].set_title(f"Histogram of {measure} - Random Graph")
axes[i][0].set_xlabel(f"{measure} Score")
axes[i][0].set_ylabel("Frequency")

axes[i][1].hist(clustered_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black")
axes[i][1].set_title(f"Histogram of {measure} - Clustered Graph")
axes[i][1].set_xlabel(f"{measure} Score")
Member:

Could you please plot the probability density instead of counts? While it doesn't make a difference here, it is generally good practice, and it becomes relevant when comparing datasets of different sizes.

Member:

Also, please adjust the NMI and RI histograms to span the range $[0,1]$, and adjust the VI histogram to have a lower bound of 0.
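Both plotting adjustments could be sketched like this, with hypothetical score lists standing in for the experiment output:

```python
import matplotlib.pyplot as plt

# Hypothetical scores; in the example these come from run_experiment()
nmi_scores = [0.92, 0.88, 0.95, 0.90, 0.93]
vi_scores = [0.35, 0.42, 0.30, 0.38, 0.33]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# density=True plots a probability density instead of raw counts;
# range=(0, 1) pins the NMI axis to the measure's full range.
counts, edges, _ = ax1.hist(nmi_scores, bins=20, range=(0, 1), density=True)
ax1.set_xlabel("NMI score")
ax1.set_ylabel("Probability density")

# VI has no fixed upper bound, so only the lower bound is pinned at 0.
ax2.hist(vi_scores, bins=20, range=(0, max(vi_scores)), density=True)
ax2.set_xlabel("VI score")

plt.tight_layout()
```

With `density=True`, the bar heights integrate to 1 over the plotted range, so histograms from experiments with different numbers of runs remain directly comparable.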


plt.tight_layout()
plt.show()

# %%
# The results are plotted as histograms for random vs. clustered graphs, highlighting differences in detected community structures.
# The inconsistency on random graphs and the higher consistency on structured graphs come down to community structure strength:
# Random graphs: lack clear communities, leading to unstable partitions. Stochastic algorithms detect different structures across runs, resulting in low NMI, high VI, and inconsistent RI.
# Structured graphs: have well-defined communities, so detected partitions are more stable across runs, leading to high NMI, low VI, and stable RI.
Member:

Could you please be explicit about the range and interpretation of the three measures? NMI and RI are in $[0,1]$, and larger values indicate higher similarity. VI is a distance metric, thus lower values indicate higher similarity.

87 changes: 45 additions & 42 deletions doc/source/sg_execution_times.rst
@@ -6,7 +6,7 @@

Computation times
=================
**00:10.013** total execution time for 25 files **from all galleries**:
**01:51.199** total execution time for 26 files **from all galleries**:

.. container::

@@ -33,77 +33,80 @@ Computation times
- Time
- Mem (MB)
* - :ref:`sphx_glr_tutorials_visualize_cliques.py` (``../examples_sphinx-gallery/visualize_cliques.py``)
- 00:02.970
- 00:39.554
- 0.0
* - :ref:`sphx_glr_tutorials_visual_style.py` (``../examples_sphinx-gallery/visual_style.py``)
- 00:11.628
- 0.0
* - :ref:`sphx_glr_tutorials_ring_animation.py` (``../examples_sphinx-gallery/ring_animation.py``)
- 00:01.287
- 00:09.870
- 0.0
* - :ref:`sphx_glr_tutorials_cluster_contraction.py` (``../examples_sphinx-gallery/cluster_contraction.py``)
- 00:00.759
* - :ref:`sphx_glr_tutorials_delaunay-triangulation.py` (``../examples_sphinx-gallery/delaunay-triangulation.py``)
- 00:09.261
- 0.0
* - :ref:`sphx_glr_tutorials_betweenness.py` (``../examples_sphinx-gallery/betweenness.py``)
- 00:00.735
- 0.0
* - :ref:`sphx_glr_tutorials_visual_style.py` (``../examples_sphinx-gallery/visual_style.py``)
- 00:00.711
- 0.0
* - :ref:`sphx_glr_tutorials_delaunay-triangulation.py` (``../examples_sphinx-gallery/delaunay-triangulation.py``)
- 00:00.504
- 00:06.259
- 0.0
* - :ref:`sphx_glr_tutorials_configuration.py` (``../examples_sphinx-gallery/configuration.py``)
- 00:00.416
- 00:05.379
- 0.0
* - :ref:`sphx_glr_tutorials_online_user_actions.py` (``../examples_sphinx-gallery/online_user_actions.py``)
- 00:00.332
* - :ref:`sphx_glr_tutorials_cluster_contraction.py` (``../examples_sphinx-gallery/cluster_contraction.py``)
- 00:04.307
- 0.0
* - :ref:`sphx_glr_tutorials_erdos_renyi.py` (``../examples_sphinx-gallery/erdos_renyi.py``)
- 00:00.313
- 00:03.508
- 0.0
* - :ref:`sphx_glr_tutorials_connected_components.py` (``../examples_sphinx-gallery/connected_components.py``)
- 00:00.216
* - :ref:`sphx_glr_tutorials_bridges.py` (``../examples_sphinx-gallery/bridges.py``)
- 00:02.530
- 0.0
* - :ref:`sphx_glr_tutorials_complement.py` (``../examples_sphinx-gallery/complement.py``)
- 00:00.201
- 0.0
* - :ref:`sphx_glr_tutorials_generate_dag.py` (``../examples_sphinx-gallery/generate_dag.py``)
- 00:00.194
- 00:02.393
- 0.0
* - :ref:`sphx_glr_tutorials_visualize_communities.py` (``../examples_sphinx-gallery/visualize_communities.py``)
- 00:00.176
- 00:02.157
- 0.0
* - :ref:`sphx_glr_tutorials_bridges.py` (``../examples_sphinx-gallery/bridges.py``)
- 00:00.169
* - :ref:`sphx_glr_tutorials_stochastic_variability.py` (``../examples_sphinx-gallery/stochastic_variability.py``)
- 00:01.960
- 0.0
* - :ref:`sphx_glr_tutorials_spanning_trees.py` (``../examples_sphinx-gallery/spanning_trees.py``)
- 00:00.161
* - :ref:`sphx_glr_tutorials_online_user_actions.py` (``../examples_sphinx-gallery/online_user_actions.py``)
- 00:01.750
- 0.0
* - :ref:`sphx_glr_tutorials_isomorphism.py` (``../examples_sphinx-gallery/isomorphism.py``)
- 00:00.153
* - :ref:`sphx_glr_tutorials_connected_components.py` (``../examples_sphinx-gallery/connected_components.py``)
- 00:01.728
- 0.0
* - :ref:`sphx_glr_tutorials_quickstart.py` (``../examples_sphinx-gallery/quickstart.py``)
- 00:00.142
* - :ref:`sphx_glr_tutorials_isomorphism.py` (``../examples_sphinx-gallery/isomorphism.py``)
- 00:01.376
- 0.0
* - :ref:`sphx_glr_tutorials_minimum_spanning_trees.py` (``../examples_sphinx-gallery/minimum_spanning_trees.py``)
- 00:00.137
- 00:01.135
- 0.0
* - :ref:`sphx_glr_tutorials_spanning_trees.py` (``../examples_sphinx-gallery/spanning_trees.py``)
- 00:01.120
- 0.0
* - :ref:`sphx_glr_tutorials_generate_dag.py` (``../examples_sphinx-gallery/generate_dag.py``)
- 00:00.939
- 0.0
* - :ref:`sphx_glr_tutorials_quickstart.py` (``../examples_sphinx-gallery/quickstart.py``)
- 00:00.902
- 0.0
* - :ref:`sphx_glr_tutorials_simplify.py` (``../examples_sphinx-gallery/simplify.py``)
- 00:00.079
- 00:00.840
- 0.0
* - :ref:`sphx_glr_tutorials_bipartite_matching_maxflow.py` (``../examples_sphinx-gallery/bipartite_matching_maxflow.py``)
- 00:00.073
- 00:00.674
- 0.0
* - :ref:`sphx_glr_tutorials_articulation_points.py` (``../examples_sphinx-gallery/articulation_points.py``)
- 00:00.067
* - :ref:`sphx_glr_tutorials_shortest_path_visualisation.py` (``../examples_sphinx-gallery/shortest_path_visualisation.py``)
- 00:00.609
- 0.0
* - :ref:`sphx_glr_tutorials_topological_sort.py` (``../examples_sphinx-gallery/topological_sort.py``)
- 00:00.058
* - :ref:`sphx_glr_tutorials_articulation_points.py` (``../examples_sphinx-gallery/articulation_points.py``)
- 00:00.396
- 0.0
* - :ref:`sphx_glr_tutorials_bipartite_matching.py` (``../examples_sphinx-gallery/bipartite_matching.py``)
- 00:00.058
- 00:00.370
- 0.0
* - :ref:`sphx_glr_tutorials_shortest_path_visualisation.py` (``../examples_sphinx-gallery/shortest_path_visualisation.py``)
- 00:00.052
* - :ref:`sphx_glr_tutorials_topological_sort.py` (``../examples_sphinx-gallery/topological_sort.py``)
- 00:00.319
- 0.0
* - :ref:`sphx_glr_tutorials_maxflow.py` (``../examples_sphinx-gallery/maxflow.py``)
- 00:00.052
- 00:00.234
- 0.0