Add notebook for visualizing projections with PyVis

adamnsch · adamnsch · commit a696278c8def · 2024-10-15T16:19:08.000+02:00
diff --git a/README.md b/README.md
@@ -85,6 +85,7 @@ Full end-to-end examples in Jupyter ready-to-run notebooks can be found in the [
 * [Load data to a projected graph via graph construction](examples/load-data-via-graph-construction.ipynb)
 * [Heterogeneous Node Classification with HashGNN and Autotuning](https://github.com/neo4j/graph-data-science-client/tree/main/examples/heterogeneous-node-classification-with-hashgnn.ipynb)
 * [Perform inference using pre-trained KGE models](examples/kge-predict-transe-pyg-train.ipynb)
+* [Visualize GDS Projections with PyVis](examples/visualize-with-pyvis.ipynb)
 
 
 ## Documentation
diff --git a/doc/modules/ROOT/pages/tutorials/visualize-with-pyvis.adoc b/doc/modules/ROOT/pages/tutorials/visualize-with-pyvis.adoc
@@ -0,0 +1,201 @@
+// DO NOT EDIT - AsciiDoc file generated automatically
+
+= GDS Projection Visualization with PyVis
+
+
+https://colab.research.google.com/github/neo4j/graph-data-science-client/blob/main/examples/visualize-with-pyvis.ipynb[image:https://colab.research.google.com/assets/colab-badge.svg[Open
+In Colab]]
+
+
+This Jupyter notebook is hosted
+https://github.com/neo4j/graph-data-science-client/blob/main/examples/visualize-with-pyvis.ipynb[here]
+in the Neo4j Graph Data Science Client GitHub repository.
+
+The notebook exemplifies how to visualize a graph projection in the GDS
+Graph Catalog using the `graphdatascience`
+(https://neo4j.com/docs/graph-data-science-client/current/[docs]) and
+`pyvis` (https://pyvis.readthedocs.io/en/latest/index.html[docs])
+libraries.
+
+== Prerequisites
+
+Running this notebook requires a Neo4j server with GDS installed. We
+recommend using Neo4j Desktop with GDS, or AuraDS.
+
+It also requires the Python libraries `graphdatascience` and
+`pyvis`:
+
+[source, python, role=no-test]
+----
+%pip install graphdatascience pyvis
+----
+
+== Setup
+
+We start by importing our dependencies and setting up our GDS client
+connection to the database.
+
+[source, python, role=no-test]
+----
+from graphdatascience import GraphDataScience
+import os
+from pyvis.network import Network
+----
+
+[source, python, role=no-test]
+----
+# Get Neo4j DB URI, credentials and name from environment if applicable
+NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
+NEO4J_AUTH = None
+NEO4J_DB = os.environ.get("NEO4J_DB", "neo4j")
+if os.environ.get("NEO4J_USER") and os.environ.get("NEO4J_PASSWORD"):
+    NEO4J_AUTH = (
+        os.environ.get("NEO4J_USER"),
+        os.environ.get("NEO4J_PASSWORD"),
+    )
+gds = GraphDataScience(NEO4J_URI, auth=NEO4J_AUTH, database=NEO4J_DB)
+----
+
+== Sampling Cora
+
+Next we use the
+https://neo4j.com/docs/graph-data-science-client/current/common-datasets/#_cora[built-in
+Cora loader] to get the data into GDS. The nodes in the Cora dataset
+represent academic papers, and the relationships connecting them
+are citations.
+
+We will then sample a smaller representative subgraph from it that is
+more suitable for visualization.
+
+[source, python, role=no-test]
+----
+G = gds.graph.load_cora()
+----
+
+Let’s make sure we constructed the correct graph.
+
+[source, python, role=no-test]
+----
+print(f"Metadata for our loaded Cora graph `G`: {G}")
+print(f"Node labels present in `G`: {G.node_labels()}")
+----
+
+It looks correct! Now let’s go ahead and sample the graph.
+
+We use the random walk with restarts sampling algorithm to get a smaller
+graph that structurally represents the full graph. In this example we
+will use the algorithm’s default parameters, but check out
+https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/sampling/rwr/[the
+algorithm’s docs] to see how you can, for example, specify the size of the
+subgraph and choose the start node around which the subgraph will be
+sampled.
+
+[source, python, role=no-test]
+----
+G_sample, _ = gds.alpha.graph.sample.rwr("cora_sample", G, randomSeed=42, concurrency=1)
+----
+
+With the default sampling ratio of 0.15 and 2708 nodes in Cora, we should
+get roughly 0.15 * 2708 ≈ 406 nodes. Let’s also check the relationship count.
+
+[source, python, role=no-test]
+----
+print(f"Number of nodes in our sample: {G_sample.node_count()}")
+print(f"Number of relationships in our sample: {G_sample.relationship_count()}")
+----
+
+Let’s also compute
+https://neo4j.com/docs/graph-data-science/current/algorithms/page-rank/[PageRank]
+on our sample graph, in order to get an importance score that we call
+``rank'' for each node. It will be interesting for context when we
+visualize the graph.
+
+[source, python, role=no-test]
+----
+gds.pageRank.mutate(G_sample, mutateProperty="rank")
+----
+
+== Exporting the sampled Cora graph
+
+We can now export the topology and node properties of our sampled graph
+that we want to visualize.
+
+Let’s start by fetching the relationships.
+
+[source, python, role=no-test]
+----
+sample_topology_df = gds.beta.graph.relationships.stream(G_sample)
+display(sample_topology_df)
+----
+
+We get the right number of rows, one for each expected relationship. So
+that looks good.
+
+Next we should fetch the node properties we are interested in. Each node
+has a ``subject'' property, an integer 0,…,6 that indicates which of
+seven academic subjects the paper represented by the node belongs to.
+We will also fetch the PageRank property ``rank'' that we computed
+above.
+
+[source, python, role=no-test]
+----
+sample_node_properties_df = gds.graph.nodeProperties.stream(
+    G_sample,
+    ["subject", "rank"],
+    separate_property_columns=True,
+)
+display(sample_node_properties_df)
+----
+
+Now that we have all the data we want to visualize, we can create a
+network with PyVis. We color each node according to its ``subject'', and
+size it according to its ``rank''.
+
+[source, python, role=no-test]
+----
+net = Network(notebook=True,
+              cdn_resources="remote",
+              bgcolor="#222222",
+              font_color="white",
+              height="750px",  # Modify according to your screen size
+              width="100%",
+)
+
+# Seven suitable light colors, one for each "subject"
+subject_to_color = ["#80cce9", "#fbd266", "#a9eebc", "#e53145", "#d2a6e2", "#f3f3f3", "#ff91af"]
+
+# Add all the nodes
+for _, node in sample_node_properties_df.iterrows():
+    net.add_node(int(node["nodeId"]), color=subject_to_color[int(node["subject"])], value=node["rank"])
+
+# Add all the relationships
+net.add_edges(zip(sample_topology_df["sourceNodeId"], sample_topology_df["targetNodeId"]))
+
+net.show("cora-sample.html")
+----
+
+Unsurprisingly, we can see that papers largely seem clustered by academic
+subject. We also note that some nodes appear larger in size, indicating
+that they have a higher centrality score according to PageRank.
+
+We can scroll over the graphic to zoom in/out, and ``click and drag''
+the background to navigate to different parts of the network. If we
+click on a node, it will be highlighted along with the relationships
+connected to it. And if we ``click and drag'' a node, we can move it.
+
+Additionally, one could enable more sophisticated navigational features
+for searching and filtering by providing `select_menu = True` and
+`filter_menu = True` respectively to the PyVis `Network` constructor
+above. Check out the
+https://pyvis.readthedocs.io/en/latest/index.html[PyVis documentation]
+for details.
+
+== Cleanup
+
+We remove the Cora graphs from the GDS graph catalog to free up memory.
+
+[source, python, role=no-test]
+----
+_ = G_sample.drop()
+_ = G.drop()
+----
diff --git a/examples/visualize-with-pyvis.ipynb b/examples/visualize-with-pyvis.ipynb