CHANGELOG.md (+2)
@@ -6,6 +6,8 @@
 - Added support for multi-vector collection in Qdrant driver.
 - Added a `Pipeline.stream` method to stream pipeline progress.
+- Added a new semantic match resolver to the KG Builder for entity resolution based on spaCy embeddings and cosine similarities, so that nodes with similar textual properties are merged.
+- Added a new fuzzy match resolver to the KG Builder for entity resolution based on RapidFuzz fuzzy string matching.
docs/source/index.rst (+4 -1)
@@ -99,7 +99,10 @@ List of extra dependencies:
 - **qdrant**: store vectors in Qdrant
 - **experimental**: experimental features mainly from the Knowledge Graph creation pipelines.
     - Warning: this requires `pygraphviz`. Installation instructions can be found `here <https://pygraphviz.github.io/documentation/stable/install.html>`_.
-
+- nlp:
+    - **spaCy**: load spaCy trained models for NLP pipelines, used by the `SpaCySemanticMatchResolver` component from the Knowledge Graph creation pipelines.
+- fuzzy-matching:
+    - **rapidfuzz**: apply fuzzy matching using string similarity, used by the `FuzzyMatchResolver` component from the Knowledge Graph creation pipelines.
docs/source/user_guide_kg_builder.rst (+16 -5)
@@ -1028,22 +1028,33 @@ without making assumptions about entity similarity. The Entity Resolver
 is responsible for refining the created knowledge graph by merging entity
 nodes that represent the same real-world object.

-In practice, this package implements a simple resolver that merges nodes
-with the same label and identical "name" property.
+In practice, this package implements three resolvers:
+
+- a simple resolver that merges nodes with the same label and identical "name" property;
+- two similarity-based resolvers that merge nodes with the same label and a similar set of textual properties (by default, the "name" property):
+
+    - a semantic match resolver, which is based on spaCy embeddings and the cosine similarity of embedding vectors. This resolver is suited to higher-quality KG resolution using static embeddings.
+    - a fuzzy match resolver, which uses RapidFuzz for fast fuzzy string matching based on the Levenshtein distance. This resolver offers faster ingestion by relying on string similarity measures, at the potential cost of resolution precision.

 .. warning::

-    The `SinglePropertyExactMatchResolver` **replaces** the nodes created by the KG writer.
+    - The `SinglePropertyExactMatchResolver`, `SpaCySemanticMatchResolver`, and `FuzzyMatchResolver` **replace** the nodes created by the KG writer.
+
+    - Check the :ref:`installation` section to make sure you have the required dependencies installed when using `SpaCySemanticMatchResolver` and `FuzzyMatchResolver`.


-It can be used like this:
+The resolvers can be used like this:

 .. code:: python

     from neo4j_graphrag.experimental.components.resolver import (
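For readers reviewing this change, the two similarity measures named in the user guide text (cosine similarity of embedding vectors and Levenshtein-based string similarity) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code inside `SpaCySemanticMatchResolver` or `FuzzyMatchResolver`: the function names, the normalisation by the longer string's length, and the example threshold are all hypothetical, and the real components rely on spaCy and RapidFuzz rather than on these helpers.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def levenshtein_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


def normalized_similarity(s: str, t: str) -> float:
    """Map the edit distance to [0, 1]; normalising by the longer string is an assumption."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(s, t) / longest


def should_merge(name_a: str, name_b: str, threshold: float = 0.8) -> bool:
    """Hypothetical merge rule: names at or above the threshold count as the same entity."""
    return normalized_similarity(name_a.lower(), name_b.lower()) >= threshold
```

With this rule, `should_merge("Neo4j Inc", "Neo4j Inc.")` would treat the two names as one entity, while two unrelated names fall below the threshold. The trade-off described in the diff shows up here as well: string similarity is cheap to compute but cannot tell that two differently spelled names refer to the same thing, which is where embedding-based similarity can help.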