Commit a696278

Add notebook for visualizing projections with PyVis
1 parent 83e5d02 commit a696278

3 files changed

+535
-0
lines changed

README.md

+1
@@ -85,6 +85,7 @@ Full end-to-end examples in Jupyter ready-to-run notebooks can be found in the [
8585
* [Load data to a projected graph via graph construction](examples/load-data-via-graph-construction.ipynb)
8686
* [Heterogeneous Node Classification with HashGNN and Autotuning](https://github.com/neo4j/graph-data-science-client/tree/main/examples/heterogeneous-node-classification-with-hashgnn.ipynb)
8787
* [Perform inference using pre-trained KGE models](examples/kge-predict-transe-pyg-train.ipynb)
88+
* [Visualize GDS Projections with PyVis](examples/visualize-with-pyvis.ipynb)
8889

8990

9091
## Documentation
@@ -0,0 +1,201 @@
1+
// DO NOT EDIT - AsciiDoc file generated automatically
2+
3+
= GDS Projection Visualization with PyVis
4+
5+
6+
https://colab.research.google.com/github/neo4j/graph-data-science-client/blob/main/examples/visualize-with-pyvis.ipynb[image:https://colab.research.google.com/assets/colab-badge.svg[Open
7+
In Colab]]
8+
9+
10+
This Jupyter notebook is hosted
11+
https://github.com/neo4j/graph-data-science-client/blob/main/examples/visualize-with-pyvis.ipynb[here]
12+
in the Neo4j Graph Data Science Client GitHub repository.
13+
14+
The notebook shows how to visualize a graph projection in the GDS
15+
Graph Catalog using the `graphdatascience`
16+
(https://neo4j.com/docs/graph-data-science-client/current/[docs]) and
17+
`pyvis` (https://pyvis.readthedocs.io/en/latest/index.html[docs])
18+
libraries.
19+
20+
== Prerequisites
21+
22+
Running this notebook requires a Neo4j server with GDS installed. We
23+
recommend using Neo4j Desktop with GDS, or AuraDS.
24+
25+
The notebook also requires the Python libraries `graphdatascience` and
26+
`pyvis`:
27+
28+
[source, python, role=no-test]
29+
----
30+
%pip install graphdatascience pyvis
31+
----
32+
33+
== Setup
34+
35+
We start by importing our dependencies and setting up our GDS client
36+
connection to the database.
37+
38+
[source, python, role=no-test]
39+
----
40+
from graphdatascience import GraphDataScience
41+
import os
42+
from pyvis.network import Network
43+
----
44+
45+
[source, python, role=no-test]
46+
----
47+
# Get Neo4j DB URI, credentials and name from environment if applicable
48+
NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
49+
NEO4J_AUTH = None
50+
NEO4J_DB = os.environ.get("NEO4J_DB", "neo4j")
51+
if os.environ.get("NEO4J_USER") and os.environ.get("NEO4J_PASSWORD"):
52+
    NEO4J_AUTH = (
53+
        os.environ.get("NEO4J_USER"),
54+
        os.environ.get("NEO4J_PASSWORD"),
55+
    )
56+
gds = GraphDataScience(NEO4J_URI, auth=NEO4J_AUTH, database=NEO4J_DB)
57+
----
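The credential handling above can also be read as a small pure function, which makes the fallback to unauthenticated access easier to see. The following is only a sketch: the helper name `neo4j_auth_from_env` is hypothetical and not part of the notebook or of `graphdatascience`.

```python
import os


def neo4j_auth_from_env(env=os.environ):
    """Return a (user, password) tuple, or None if either value is missing.

    Hypothetical helper: mirrors the `if` statement in the setup cell.
    """
    user = env.get("NEO4J_USER")
    password = env.get("NEO4J_PASSWORD")
    if user and password:
        return (user, password)
    return None


# Passing a plain dict instead of os.environ keeps the example self-contained.
no_auth = neo4j_auth_from_env({})
with_auth = neo4j_auth_from_env({"NEO4J_USER": "neo4j", "NEO4J_PASSWORD": "secret"})
print(no_auth, with_auth)
```

The result could then be passed as the `auth` argument of `GraphDataScience`, exactly as `NEO4J_AUTH` is in the cell above.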
58+
59+
== Sampling Cora
60+
61+
Next we use the
62+
https://neo4j.com/docs/graph-data-science-client/current/common-datasets/#_cora[built-in
63+
Cora loader] to get the data into GDS. The nodes in the Cora dataset
64+
represent academic papers, and the relationships connecting them
65+
are citations.
66+
67+
We will then sample a smaller representative subgraph from it that is
68+
more suitable for visualization.
69+
70+
[source, python, role=no-test]
71+
----
72+
G = gds.graph.load_cora()
73+
----
74+
75+
Let’s make sure we constructed the correct graph.
76+
77+
[source, python, role=no-test]
78+
----
79+
print(f"Metadata for our loaded Cora graph `G`: {G}")
80+
print(f"Node labels present in `G`: {G.node_labels()}")
81+
----
82+
83+
It looks correct! Now let’s go ahead and sample the graph.
84+
85+
We use the random walk with restarts sampling algorithm to get a smaller
86+
graph that structurally represents the full graph. In this example we
87+
will use the algorithm’s default parameters, but check out
88+
https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/sampling/rwr/[the
89+
algorithm’s docs] to see how you can for example specify the size of the
90+
subgraph, and choose the start node around which the subgraph will be
91+
sampled.
92+
93+
[source, python, role=no-test]
94+
----
95+
G_sample, _ = gds.alpha.graph.sample.rwr("cora_sample", G, randomSeed=42, concurrency=1)
96+
----
97+
98+
With the default sampling ratio of 0.15, we should have around 0.15 * 2708 ~ 406 nodes in our sample.
99+
Let’s also see how many relationships we got.
100+
101+
[source, python, role=no-test]
102+
----
103+
print(f"Number of nodes in our sample: {G_sample.node_count()}")
104+
print(f"Number of relationships in our sample: {G_sample.relationship_count()}")
105+
----
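The estimate of about 406 nodes is plain arithmetic and can be checked in isolation. This sketch assumes that 0.15 is the sampler's default sampling ratio and that the full Cora graph has 2708 nodes, as the text above states; the variable names are illustrative only.

```python
total_nodes = 2708      # node count of the full Cora graph, as stated above
sampling_ratio = 0.15   # assumed default sampling ratio of the RWR sampler

# The sampler is randomized, so the actual count may differ slightly.
expected_nodes = round(total_nodes * sampling_ratio)
print(f"Expected number of sampled nodes: about {expected_nodes}")
```

Comparing this number with `G_sample.node_count()` is a quick sanity check that the sampling ran with the parameters we intended.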
106+
107+
Let’s also compute
108+
https://neo4j.com/docs/graph-data-science/current/algorithms/page-rank/[PageRank]
109+
on our sample graph, in order to get an importance score that we call
110+
``rank'' for each node. It will be interesting for context when we
111+
visualize the graph.
112+
113+
[source, python, role=no-test]
114+
----
115+
gds.pageRank.mutate(G_sample, mutateProperty="rank")
116+
----
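To build some intuition for what the ``rank'' score measures, here is a minimal pure-Python sketch of a PageRank-style iteration on a toy citation graph. It is a textbook-style illustration under simplifying assumptions (fixed number of iterations, no special handling of nodes without outgoing relationships), not the GDS implementation, and the function name and toy data are made up for this example.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Iterate score(v) = (1 - d) + d * sum(score(u) / out_degree(u)) over u -> v."""
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_scores = {node: 1.0 - damping for node in nodes}
        for source, target in edges:
            new_scores[target] += damping * scores[source] / out_degree[source]
        scores = new_scores
    return scores


# Toy graph: papers 1, 2 and 3 cite paper 0, and paper 3 also cites paper 1.
toy_edges = [(1, 0), (2, 0), (3, 0), (3, 1)]
toy_scores = pagerank(toy_edges)
most_important = max(toy_scores, key=toy_scores.get)
print(toy_scores, most_important)
```

Paper 0 ends up with the highest score because it collects the most citations, which is the same effect that makes heavily cited papers appear larger in the visualization.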
117+
118+
== Exporting the sampled Cora graph
119+
120+
We can now export the topology and node properties of our sampled graph
121+
that we want to visualize.
122+
123+
Let’s start by fetching the relationships.
124+
125+
[source, python, role=no-test]
126+
----
127+
sample_topology_df = gds.beta.graph.relationships.stream(G_sample)
128+
display(sample_topology_df)
129+
----
130+
131+
We get the right number of rows, one for each expected relationship. So
132+
that looks good.
133+
134+
Next we should fetch the node properties we are interested in. Each node
135+
will have a ``subject'' property which will be an integer 0,…,6 that
136+
indicates which of seven academic subjects the paper represented by the
137+
node belongs to. We will also fetch the PageRank property ``rank'' that
138+
we computed above.
139+
140+
[source, python, role=no-test]
141+
----
142+
sample_node_properties_df = gds.graph.nodeProperties.stream(
143+
    G_sample,
144+
    ["subject", "rank"],
145+
    separate_property_columns=True,
146+
)
147+
display(sample_node_properties_df)
148+
----
149+
150+
Now that we have all the data we want to visualize, we can create a
151+
network with PyVis. We color each node according to its ``subject'', and
152+
size it according to its ``rank''.
153+
154+
[source, python, role=no-test]
155+
----
156+
net = Network(notebook = True,
157+
              cdn_resources="remote",
158+
              bgcolor = "#222222",
159+
              font_color = "white",
160+
              height = "750px", # Modify according to your screen size
161+
              width = "100%",
162+
)
163+
164+
# Seven suitable light colors, one for each "subject"
165+
subject_to_color = ["#80cce9", "#fbd266", "#a9eebc", "#e53145", "#d2a6e2", "#f3f3f3", "#ff91af"]
166+
167+
# Add all the nodes
168+
for _, node in sample_node_properties_df.iterrows():
169+
    net.add_node(int(node["nodeId"]), color=subject_to_color[int(node["subject"])], value=node["rank"])
170+
171+
# Add all the relationships
172+
net.add_edges(zip(sample_topology_df["sourceNodeId"], sample_topology_df["targetNodeId"]))
173+
174+
net.show("cora-sample.html")
175+
----
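The mapping from node properties to visual attributes can be looked at separately from PyVis. The sketch below uses hypothetical toy records in place of `sample_node_properties_df`, and the helper name `node_style` is made up for this example. The explicit `int(...)` conversions are there because iterating over DataFrame rows with mixed column types typically yields floats, while node ids and list indices need to be integers.

```python
# Palette copied from the cell above: one color per "subject" value 0..6.
SUBJECT_TO_COLOR = ["#80cce9", "#fbd266", "#a9eebc", "#e53145", "#d2a6e2", "#f3f3f3", "#ff91af"]


def node_style(record, palette=SUBJECT_TO_COLOR):
    """Map one node record to keyword arguments suitable for `net.add_node`."""
    subject = int(record["subject"])
    if not 0 <= subject < len(palette):
        raise ValueError(f"Unexpected subject value: {subject}")
    return {
        "n_id": int(record["nodeId"]),   # node ids must be plain integers
        "color": palette[subject],       # color encodes the academic subject
        "value": float(record["rank"]),  # value drives the relative node size
    }


# Hypothetical rows standing in for `sample_node_properties_df`.
toy_records = [
    {"nodeId": 31336.0, "subject": 0.0, "rank": 0.42},
    {"nodeId": 1061127.0, "subject": 6.0, "rank": 1.87},
]
styles = [node_style(record) for record in toy_records]
print(styles)
```

With PyVis available, each dictionary could be passed on as `net.add_node(**style)`, which is equivalent to the loop in the cell above.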
176+
177+
Unsurprisingly, we can see that papers largely seem clustered by academic
178+
subject. We also note that some nodes appear larger in size, indicating
179+
that they have a higher centrality score according to PageRank.
180+
181+
We can scroll over the graphic to zoom in/out, and ``click and drag''
182+
the background to navigate to different parts of the network. If we
183+
click on a node, it will be highlighted along with the relationships
184+
connected to it. And if we ``click and drag'' a node, we can move it.
185+
186+
Additionally, one could enable more sophisticated navigational features
187+
for searching and filtering by providing `select_menu = True` and
188+
`filter_menu = True` respectively to the PyVis `Network` constructor
189+
above. Check out the
190+
https://pyvis.readthedocs.io/en/latest/index.html[PyVis documentation]
191+
for this.
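As a sketch of what that could look like, the constructor options used earlier can be collected in a dictionary and extended with the two menu flags. The dictionary names are illustrative, and the final call is left as a comment so that the snippet runs without PyVis installed.

```python
# Options copied from the `Network(...)` call earlier in this notebook.
base_options = {
    "notebook": True,
    "cdn_resources": "remote",
    "bgcolor": "#222222",
    "font_color": "white",
    "height": "750px",
    "width": "100%",
}

# Enable the search (select) and filter menus on top of the base options.
menu_options = {**base_options, "select_menu": True, "filter_menu": True}
print(menu_options)

# With PyVis installed, the network would then be created as:
# net = Network(**menu_options)
```

Nodes and relationships are added to this network in the same way as before.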
192+
193+
== Cleanup
194+
195+
We remove the Cora graphs from the GDS graph catalog to free up memory.
196+
197+
[source, python, role=no-test]
198+
----
199+
_ = G_sample.drop()
200+
_ = G.drop()
201+
----
