This repository was archived by the owner on Nov 18, 2023. It is now read-only.

Commit 26303e1

Grakn 1.5 migration (#58)
## What is the goal of this PR?

- Update the documentation for KGCN
- Migrate to use Grakn commit 20750ca0a46b4bc252ad81edccdfd8d8b7c46caa and Python grakn-client commit 5459d5d88a30631c5ebdac3a9b0d5ea6f184c8ae

## What are the changes implemented in this PR?

- KGCN README improvements, corrections, fixes including updated diagrams
- CI updates to use Grakn distributions hosted on GCP for unit, integration and end-to-end tests
1 parent 02015f5 commit 26303e1

21 files changed: +83 -76 lines

.circleci/config.yml

Lines changed: 4 additions & 4 deletions
@@ -19,10 +19,10 @@ jobs:
       - run: sudo apt-get update
       - run: pyenv install 3.6.3
       - run: pyenv global 3.6.3
-      - run: wget https://github.com/graknlabs/grakn/releases/download/v1.4.3/grakn-core-1.4.3.zip
-      - run: unzip grakn-core-1.4.3.zip
-      - run: nohup grakn-core-1.4.3/grakn server start
-      - run: grakn-core-1.4.3/graql console -k test_schema -f kglib/kgcn/test_data/schema.gql
+      - run: wget https://storage.googleapis.com/kglib/grakn-core-all-20750ca0a46b4bc252ad81edccdfd8d8b7c46caa.zip
+      - run: unzip grakn-core-all-20750ca0a46b4bc252ad81edccdfd8d8b7c46caa.zip
+      - run: nohup grakn-core-all/grakn server start
+      - run: cd grakn-core-all && ./grakn console -k test_schema -f ../kglib/kgcn/test_data/schema.gql
       - run: bazel test //kglib/... --test_output=streamed --force_python PY3 --python_path $(which python)

   test-deploy-pip:

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.1
+0.1a3

WORKSPACE

Lines changed: 1 addition & 1 deletion
@@ -42,6 +42,6 @@ load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

 http_file(
     name = "animaltrade_dist",
-    urls = ["https://github.com/graknlabs/kglib/releases/download/v0.1a1/grakn-animaltrade.zip", # TODO How to update to the latest relase each time?
+    urls = ["https://storage.googleapis.com/kglib/grakn-core-animaltrade-20750ca0a46b4bc252ad81edccdfd8d8b7c46caa.zip", # TODO How to update to the latest relase each time?
     ]
 )

examples/BUILD

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ py_library(
         requirement('grakn-kglib'),

         # Grakn deps
-        requirement('grakn'),
+        requirement('grakn-client'),
         requirement('grpcio'),

         # TensorFlow deps

examples/kgcn/animal_trade/prediction_schema.gql

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ prediction-score sub attribute, datatype double;

 traded-item has endangerment-level;

-value-prediction sub relationship,
+value-prediction sub relation,
     has prediction-score,
     relates predicted-value,
     relates predicting-kgcn-model;
@@ -49,7 +49,7 @@ match $t1 isa traded-item, has endangerment-level $el1 via $r1; $el1 1; $vp1(pre

 define

-suspicious-activity-detection sub relationship,
+suspicious-activity-detection sub relation,
     relates suspicious-activity,
     relates cause-of-suspicion;

examples/kgcn/animal_trade/schema.gql

Lines changed: 6 additions & 6 deletions
@@ -61,7 +61,7 @@ define
     has unit-of-measurement,
     plays quantification-measurement;

-exchange sub relationship,
+exchange sub relation,
     relates receiving-country,
     relates providing-country,
     relates exchanged-item,
@@ -80,11 +80,11 @@ define
     relates imported-item as exchanged-item,
     plays corresponding-import;

-import-export-correspondence sub relationship,
+import-export-correspondence sub relation,
     relates corresponding-import,
     relates corresponding-export;

-quantification sub relationship,
+quantification sub relation,
     relates quantified-subject,
     relates quantification-measurement;

@@ -111,7 +111,7 @@ define
     plays originated-species,
     plays sub-taxon;

-hierarchy sub relationship,
+hierarchy sub relation,
     relates superior,
     relates subordinate;

@@ -131,11 +131,11 @@ define
     relates containing-continent as container,
     relates contained-country as containee;

-species-origination sub relationship,
+species-origination sub relation,
     relates originating-country,
     relates originated-species;

-taxon-membership sub relationship,
+taxon-membership sub relation,
     relates member-item,
     relates taxonomic-group;

examples/kgcn/animal_trade/test/end_to_end_test.py

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ def test_end_to_end(self):
                  'external/animaltrade_dist/file/downloaded-unzipped'])

         # Start Grakn
-        sub.run(['external/animaltrade_dist/file/downloaded-unzipped/grakn-animaltrade/grakn', 'server', 'start'])
+        sub.run(['external/animaltrade_dist/file/downloaded-unzipped/grakn-core-animaltrade-1.5.0/grakn', 'server', 'start'])

         modes = (TRAIN, EVAL)

kglib/BUILD

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ py_library(
     srcs = glob(['__init__.py', 'kgcn/**/*.py']),
     deps = [
         # Grakn deps
-        requirement('grakn'),
+        requirement('grakn-client'),
         requirement('grpcio'),

         # TensorFlow deps

kglib/kgcn/README.md

Lines changed: 20 additions & 13 deletions
@@ -13,15 +13,15 @@ A KGCN can be used to create vector representations, *embeddings*, of any labell

 Often, data doesn't fit well into a tabular format. There are many benefits to storing complex and interrelated data in a knowledge graph, not least that the context of each datapoint can be stored in full.

-However, many existing machine learning techniques rely upon an *input vector for each example*. This can make it difficult to directly apply many conventional machine learning techniques over a knowledge graph.
+However, many existing machine learning techniques rely upon the existence of an *input vector for each example*. Creating such a vector to represent a node in a knowledge graph is non-trivial.

-In order to make use of the wealth of existing ideas, tools and pipelines in machine learning, we need a method of building a vector to describe a datapoint in a knowledge graph. In this way we can leverage contextual information from a knowledge graph for machine learning.
+In order to make use of the wealth of existing ideas, tools and pipelines in machine learning, we need a method of building these vectors. In this way we can leverage contextual information from a knowledge graph for machine learning.

-This is what a KGCN can achieve. Given an example datapoint taken from a knowledge graph, it can examine the nodes in the vicinity of an example, its *context*. Based on this context it can determine a vector representation, an *embedding*, for that example.
+This is what a KGCN can achieve. Given an example node in a knowledge graph, it can examine the nodes in the vicinity of that example, its *context*. Based on this context it can determine a vector representation, an *embedding*, for that example.

 **There are two broad learning tasks a KGCN is suitable for:**

-**1. Supervised learning from a knowledge graph for prediction e.g. multi-class classification (currently implemented), regression, link prediction**
+**1. Supervised learning from a knowledge graph for prediction e.g. multi-class classification (implemented), regression, link prediction**
 **2. Unsupervised creation of Knowledge Graph Embeddings, e.g. for clustering and node comparison tasks**

 ![KGCN Process](readme_images/KGCN_process.png)
@@ -46,7 +46,8 @@ In order to build a *useful* representation, a KGCN needs to perform some learni
 The following is a template of what must be defined in order to instantiate a KGCN, optimised for a downstream learning task of multi-class classification:

 ```python
-import kglib.kgcn.embed.model as model
+import kglib.kgcn.core.model as model
+import kglib.kgcn.learn.classify as classify
 import tensorflow as tf
 import grakn

@@ -65,10 +66,16 @@ kgcn = model.KGCN(neighbour_sample_sizes,
                   batch_size)

 optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
-classifier = learn.classify.SupervisedKGCNClassifier(kgcn, optimizer, num_classes, log_dir,
-                                                     max_training_steps=max_training_steps)

-training_feed_dict = classifier.get_feed_dict(session, training_things, labels=training_labels)
+classifier = classify.SupervisedKGCNClassifier(kgcn,
+                                               optimizer,
+                                               num_classes,
+                                               log_dir,
+                                               max_training_steps=max_training_steps)
+
+training_feed_dict = classifier.get_feed_dict(session,
+                                              training_things,
+                                              labels=training_labels)

 classifier.train(training_feed_dict)

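As a rough, self-contained illustration of the downstream step such a classifier performs, namely multi-class classification over fixed-length embedding vectors, the following minimal TensorFlow 1.x sketch trains a single dense softmax layer on random stand-in embeddings. It is not kglib's implementation; the layer, hyperparameters and data are placeholders chosen only to keep the snippet runnable, and the optimizer matches the `tf.train.GradientDescentOptimizer` used in the template above.

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for KGCN output: 32 embedding vectors of dimension 16, 3 classes.
num_examples, embedding_dim, num_classes = 32, 16, 3
embeddings_np = np.random.randn(num_examples, embedding_dim).astype(np.float32)
labels_np = np.random.randint(0, num_classes, size=num_examples).astype(np.int64)

# A single dense layer with a softmax cross-entropy loss: the simplest possible
# multi-class classifier over fixed-length embeddings.
embeddings = tf.placeholder(tf.float32, shape=(None, embedding_dim))
labels = tf.placeholder(tf.int64, shape=(None,))
logits = tf.layers.dense(embeddings, num_classes)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        _, loss_value = sess.run([train_op, loss],
                                 feed_dict={embeddings: embeddings_np,
                                            labels: labels_np})
    print('final training loss:', loss_value)
```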
@@ -80,17 +87,17 @@ There is also a [full example](https://github.com/graknlabs/kglib/tree/master/ex

 ## Methodology

-The ideology behind this project is described [here](https://blog.grakn.ai/knowledge-graph-convolutional-networks-machine-learning-over-reasoned-knowledge-9eb5ce5e0f68), and a [video of the presentation](https://youtu.be/Jx_Twc75ka0?t=368). The principles of the implementation are based on [GraphSAGE](http://snap.stanford.edu/graphsage/), from the Stanford SNAP group, heavily adapted to work over a knowledge graph. Instead of working on a typical property graph, a KGCN learns from the context of a *typed hypergraph*, **Grakn**. Additionally, it learns from facts deduced by Grakn's *automated logical reasoner*. From this point onwards some understanding of [Grakn's docs](http://dev.grakn.ai) is assumed.
+The ideology behind this project is described [here](https://blog.grakn.ai/knowledge-graph-convolutional-networks-machine-learning-over-reasoned-knowledge-9eb5ce5e0f68), and a [video of the presentation](https://youtu.be/Jx_Twc75ka0?t=368). The principles of the implementation are based on [GraphSAGE](http://snap.stanford.edu/graphsage/), from the Stanford SNAP group, heavily adapted to work over a knowledge graph. Instead of working on a typical property graph, a KGCN learns from contextual data stored in a *typed hypergraph*, **Grakn**. Additionally, it learns from facts deduced by Grakn's *automated logical reasoner*. From this point onwards some understanding of [Grakn's docs](http://dev.grakn.ai) is assumed.

 Now we introduce the key components and how they interact.

 ### KGCN

-A KGCN is responsible for deriving embeddings for a set of Things (and thereby directly learn to classify them). We start by querying Grakn to find a set of labelled examples. Following that, we gather data about the context of each example Thing. We do this by considering their *k-hop* neighbours.
+A KGCN is responsible for deriving embeddings for a set of Things (and thereby directly learn to classify them). We start by querying Grakn to find a set of labelled examples. Following that, we gather data about the context of each example Thing. We do this by considering their neighbours, and their neighbours' neighbours, recursively, up to K hops away.

-![methodology](readme_images/methodology.png)We retrieve the data concerning this neighbourhood from Grakn (diagram above). This information includes the *type hierarchy*, *roles*, and *attribute* values of each neighbouring Thing encountered, and any inferred neighbours (represented above by dotted lines).
+![methodology](readme_images/methodology.png)We retrieve the data concerning this neighbourhood from Grakn (diagram above). This information includes the *type hierarchy*, *roles*, and *attribute value* of each neighbouring Thing encountered, and any inferred neighbours (represented above by dotted lines). This data is compiled into arrays to be ingested by a neural network.

-Via operations Aggregate and Combine, a single vector representation is built for a Thing. This process can be chained recursively over k-hops of neighbouring Things. This builds a representation for a Thing of interest that contains information extracted from a wide context.
+Via operations Aggregate and Combine, a single vector representation is built for a Thing. This process can be chained recursively over *K* hops of neighbouring Things. This builds a representation for a Thing of interest that contains information extracted from a wide context.

 ![chaining](readme_images/chaining.png)

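To make the Aggregate and Combine chaining concrete, the following is a minimal sketch of the recursion in plain Python and numpy. It is a conceptual illustration of the GraphSAGE-style scheme described above, not kglib's implementation: the mean aggregator, the tanh combination, the toy graph and the per-hop weight matrices are all illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate(neighbour_vectors):
    """Aggregate: reduce a set of neighbour vectors to one summary vector (here, a mean)."""
    return np.mean(neighbour_vectors, axis=0)

def combine(self_vector, neighbour_summary, weights):
    """Combine: merge a node's own vector with its neighbourhood summary."""
    return np.tanh(weights @ np.concatenate([self_vector, neighbour_summary]))

def embed(node, features, neighbours, weights_per_hop, k):
    """Recursively build a representation from a node's K-hop context."""
    if k == 0:
        return features[node]
    neighbour_reprs = [embed(n, features, neighbours, weights_per_hop, k - 1)
                       for n in neighbours[node]]
    summary = aggregate(neighbour_reprs)
    self_repr = embed(node, features, neighbours, weights_per_hop, k - 1)
    return combine(self_repr, summary, weights_per_hop[k - 1])

# Tiny toy graph: 4 nodes with 3-dimensional raw features.
features = {n: rng.normal(size=3) for n in range(4)}
neighbours = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
# One weight matrix per hop; output dimension kept at 3 so hops can chain.
weights_per_hop = [rng.normal(size=(3, 6)) for _ in range(2)]

print(embed(0, features, neighbours, weights_per_hop, k=2))
```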
@@ -104,7 +111,7 @@ In order to feed a TensorFlow neural network, we need regular array structures o

 - Id
 - Type
-- Meta-Type (either Entity or Relationship or Attribute)
+- Meta-Type (either Entity or Relation or Attribute)
 - Data-type (if it's an attribute)
 - Value (if it's an attribute)
 - The Role that connects the example to that neighbour

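As a hypothetical illustration of the per-neighbour information listed above, gathered for a single example traded-item before it is encoded and packed into arrays, one might collect records like the following. The ids, field names and values are invented for the example and are not kglib's internal representation; the types and roles are borrowed from the animal-trade schema files changed in this commit.

```python
# Hypothetical per-neighbour records for one example traded-item, before encoding.
neighbour_records = [
    {'id': 'V4096', 'type': 'exchange', 'meta_type': 'relation',
     'data_type': None, 'value': None, 'role': 'exchanged-item'},
    {'id': 'V8200', 'type': 'value-prediction', 'meta_type': 'relation',
     'data_type': None, 'value': None, 'role': 'predicted-value'},
    {'id': 'V512', 'type': 'endangerment-level', 'meta_type': 'attribute',
     'data_type': 'long', 'value': 2, 'role': 'has'},  # data-type, value and role name are placeholders
]

# Each field is then encoded (for instance one-hot type indicators, embedded
# strings, normalised numeric values) and stacked into dense arrays for the
# neural network, one slice per hop of the neighbourhood.
```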
kglib/kgcn/core/ingest/encode/encode.py

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ def __init__(self, schema_tx):
             "https://tfhub.dev/google/nnlm-en-dim128-with-normalization/1", 128)

         data_types = list(neighbour.DATA_TYPE_NAMES)
-        data_types.insert(0, NO_DATA_TYPE)  # For the case where an entity or relationship is encountered
+        data_types.insert(0, NO_DATA_TYPE)  # For the case where an entity or relation is encountered
         data_types_traversal = {data_type: data_types for data_type in data_types}

         # Later a hierarchy could be added to data_type meaning. e.g. long and double are both numeric
