Releases: RTXteam/RTX-KG2
KG2.10.3
Date: 2026.02.24
KG2 v2.10.3 introduces major improvements to canonicalization, build determinism, packaging, and pipeline integration. This release consolidates previously separated build stages into a unified, reproducible Snakemake workflow.
Architectural Change: Unified Build Pipeline
Historically, the build process (≤ 2.10.2) was split into two distinct phases: production of the KG2pre files, and then a separate, manual production of KG2c files.
Starting in 2.10.3:
- Canonicalization and conflation are integrated into the main build pipeline.
- The build now produces:
  - KG2pre files (nodes, edges)
  - KG2 normalized files (nodes, edges)
  - KG2 conflated files (nodes, edges)
Canonicalization and conflation are now performed using three scripts located in RTXteam/RTX-KG2/process. The scripts kg2pre_to_kg2c_nodes.py and kg2pre_to_kg2c_edges.py each take a local copy of the Babel SQLite database along with the corresponding KG2pre nodes or edges file and produce canonicalized versions of those files. The conflate_kg2c.py script then operates on the canonicalized nodes and edges to generate the final conflated KG2c nodes and edges files.
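To illustrate what a conflation step does, the sketch below merges canonicalized edges that share the same subject, predicate, and object, and unions their publication lists. It is a hypothetical sketch, not the logic of conflate_kg2c.py. The field names (`subject`, `predicate`, `object`, `publications`) and the merge rule are assumptions made for this example.

```python
from typing import Iterable

# Hypothetical record layout: each canonicalized edge is a dict with
# "subject", "predicate", "object" and an optional list-valued "publications".
# The real conflate_kg2c.py may use different keys and merge rules.


def conflate_edges(edges: Iterable[dict]) -> list[dict]:
    """Merge edges sharing (subject, predicate, object); union their publications."""
    merged: dict[tuple[str, str, str], dict] = {}
    for edge in edges:
        key = (edge["subject"], edge["predicate"], edge["object"])
        pubs = edge.get("publications", [])
        if key not in merged:
            # Other fields are taken from the first edge seen for this key.
            merged[key] = {**edge, "publications": set(pubs)}
        else:
            merged[key]["publications"].update(pubs)
    # Sort both the edge keys and the publications so output order is deterministic.
    return [
        {**merged[k], "publications": sorted(merged[k]["publications"])}
        for k in sorted(merged)
    ]
```

Sorting at the end keeps the output independent of input order, which matters if conflated files are expected to be byte-for-byte reproducible.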
The new pipeline:
ETL → Merge graphs → Simplify graph → Normalize graph → Conflate graph → Deploy to PloverDB
Canonicalization Changes
Canonicalization is now fully driven by a local Babel SQLite database.
- Babel version: babel-sqlite-20250901-p1
- The old (bespoke) KG2c canonicalization algorithm is deprecated.
- Babel is now the single source of truth for identifier equivalence.
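A minimal sketch of canonicalization driven by a local SQLite lookup, using Python's built-in `sqlite3` module. The `equivalences` table and its columns are a hypothetical schema invented for this example; the real babel-sqlite-20250901-p1 database is laid out differently. Passing unmapped identifiers through unchanged is also an assumption, not documented KG2 behavior.

```python
import sqlite3

# Hypothetical schema for illustration only; the real Babel SQLite
# database has its own tables and columns.
SCHEMA = """
CREATE TABLE equivalences (
    curie TEXT PRIMARY KEY,   -- any identifier known to the lookup
    canonical TEXT NOT NULL   -- preferred identifier of its equivalence clique
);
"""


def canonicalize(conn: sqlite3.Connection, curie: str) -> str:
    """Return the canonical CURIE for `curie`, or `curie` itself if unmapped (assumed policy)."""
    row = conn.execute(
        "SELECT canonical FROM equivalences WHERE curie = ?", (curie,)
    ).fetchone()
    return row[0] if row is not None else curie


def canonicalize_edge(conn: sqlite3.Connection, edge: dict) -> dict:
    """Rewrite an edge's subject and object to their canonical identifiers."""
    return {
        **edge,
        "subject": canonicalize(conn, edge["subject"]),
        "object": canonicalize(conn, edge["object"]),
    }
```

For example, after loading the rows `("EX:alias", "EX:canon")` and `("EX:canon", "EX:canon")` into an in-memory database, `canonicalize(conn, "EX:alias")` returns `"EX:canon"`. Because every identifier resolves through one table, equivalence decisions come from a single source, which is the point of making Babel authoritative.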
Conflation is deterministic given:
- Babel database version
- Normalized graph inputs
- Biolink version
- Git tag
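One way to make the determinism claim checkable is to hash the four inputs listed above into a single build fingerprint. Under these release notes, two builds with equal fingerprints should produce identical conflated output. The function name and the choice of SHA-256 over a sorted JSON payload are assumptions for illustration; KG2 does not necessarily record such a fingerprint.

```python
import hashlib
import json


def build_fingerprint(
    babel_version: str,
    input_md5s: dict[str, str],
    biolink_version: str,
    git_tag: str,
) -> str:
    """Hash the inputs that determine conflation output into one hex digest (hypothetical helper)."""
    payload = json.dumps(
        {
            "babel_version": babel_version,
            "inputs": input_md5s,  # normalized graph file name -> MD5
            "biolink_version": biolink_version,
            "git_tag": git_tag,
        },
        sort_keys=True,  # sorts keys at every level, so dict order cannot change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

For this release the call might look like `build_fingerprint("babel-sqlite-20250901-p1", {"kg2-normalized-2.10.3-nodes.jsonl.gz": "cd5d1a1e691ae98fc3d2da87402e133d", "kg2-normalized-2.10.3-edges.jsonl.gz": "33164b17ea3d5bd23f1f3001623dc527"}, "4.2.5", "v2.10.3")`, where the tag name `"v2.10.3"` is a placeholder.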
Graph Statistics
Normalized Graph (KG2pre)
- Nodes: 5,964,484 (~2.7 GB gzipped)
- Edges: 36,756,612 (~35 GB gzipped)
Canonicalized Graph (KG2c)
- Conflated Nodes: 5,841,477 (~2.1 GB gzipped)
- Conflated Edges: 36,277,511 (~35 GB gzipped)
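To check a downloaded file against these counts, a streaming line count is enough. This assumes each `.jsonl.gz` file holds one JSON record per line, the JSON Lines convention that the extension implies. The helper name is hypothetical.

```python
import gzip


def count_jsonl_gz_records(path: str) -> int:
    """Count non-empty lines (one JSON record each) in a gzipped JSON Lines file."""
    count = 0
    # Text mode decompresses and decodes on the fly, so memory use stays flat
    # even for the multi-GB edge files.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                count += 1
    return count
```

For example, the result of `count_jsonl_gz_records("kg2-conflated-2.10.3-nodes.jsonl.gz")` can be compared with the 5,841,477 conflated nodes listed above.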
Schema & Core Dependencies
- Biolink Model version: 4.2.5
- Babel version: babel-sqlite-20250901-p1
- Python requirement: Python ≥ 3.12
- Build host: kg2103build.rtx.ai
- Build directory: /home/ubuntu/kg2-build
Major Knowledge Source Versions
- SemMedDB: 43 (2023)
- UMLS: 2023AA
- ChEMBL: 35
- DrugBank: 5.1.10
- Ensembl: 106
- Reactome: 93
- UniProtKB: 2025_03
- DrugCentral: 11012023
- KEGG: 115.0
Deprecated Sources (Removed in v2.10.3)
- DisGeNET
- Guide to Pharmacology
- Therapeutic Target Database
- PathWhiz
- Experimental Factor Ontology
Packaging Updates
The following components are now distributed via PyPI and used during the build:
- biolink_helper
- stitch_proj.local_babel

Local repository copies are no longer used.
Checksums (MD5)
Canonicalized (KG2c)
- kg2-conflated-2.10.3-edges.jsonl.gz: 40dfdf4fe14af24db19735d7ce434572
- kg2-conflated-2.10.3-nodes.jsonl.gz: 05172c6c359c4bc412b76db3c2a35d1b
Normalized (KG2pre)
- kg2-normalized-2.10.3-edges.jsonl.gz: 33164b17ea3d5bd23f1f3001623dc527
- kg2-normalized-2.10.3-nodes.jsonl.gz: cd5d1a1e691ae98fc3d2da87402e133d
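A sketch for verifying downloads against the MD5 values above with Python's standard `hashlib`. Files are read in 1 MiB chunks so the ~35 GB edge archives never need to fit in memory. The helper names are hypothetical. On Python ≥ 3.11, `hashlib.file_digest(handle, "md5").hexdigest()` is an equivalent built-in.

```python
import hashlib
import os

# MD5 values published in these release notes.
EXPECTED_MD5 = {
    "kg2-conflated-2.10.3-edges.jsonl.gz": "40dfdf4fe14af24db19735d7ce434572",
    "kg2-conflated-2.10.3-nodes.jsonl.gz": "05172c6c359c4bc412b76db3c2a35d1b",
    "kg2-normalized-2.10.3-edges.jsonl.gz": "33164b17ea3d5bd23f1f3001623dc527",
    "kg2-normalized-2.10.3-nodes.jsonl.gz": "cd5d1a1e691ae98fc3d2da87402e133d",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 by streaming it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """True only if the file name is a known release artifact and its MD5 matches."""
    expected = EXPECTED_MD5.get(os.path.basename(path))
    return expected is not None and md5_of_file(path) == expected
```

Usage: `verify("kg2-conflated-2.10.3-nodes.jsonl.gz")` returns `True` for an intact download and `False` for a truncated or corrupted one.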
Breaking / Behavioral Changes
- Canonicalization now fully Babel-driven.
- The old (bespoke) KG2c canonicalization algorithm is deprecated.
- Build pipeline unified under Snakemake.
- Python ≥ 3.12 required.
- Deprecated sources removed.
Deployment
Artifacts from this release are deployed to PloverDB in the CI environment.
KG2.10.2
Update for KG2.10.2pre build
KG2.10.1
https://github.com/RTXteam/RTX-KG2/blob/master/docs/kg2-versions.md#2101:
2.10.1
Date: 2024.09.02
Counts:
- Nodes: 8,507,201
- Edges: 57,418,405
Issues:
- Issue #140
- Issue #387
- Issue #388
- Issue #390
- Issue #392
- Issue #393
- Issue #398
- Issue #399
- Issue #400
- Issue #404
- Issue #405
- Additional issues that arose during the build: #408 (Comment)
Build info:
- Biolink Model version: 4.2.1
- InfoRes Registry version: 0.2.8
- Build host: kg2101build.rtx.ai
- Build directory: /home/ubuntu/kg2-build
- Build code branch: midjuly24work
- Neo4j endpoint CNAME: kg2endpoint-kg2-10-1.rtx.ai
- Neo4j endpoint hostname: kg2endpoint4.rtx.ai
- Tracking issue for the build: #408
- Major knowledge source versions:
  - SemMedDB: 43 (2023)
  - UMLS: 2023AA
  - ChEMBL: 33
  - DrugBank: 5.1.10
  - Ensembl: 106
  - Reactome: 80
  - UniProtKB: 2024_04
  - DrugCentral: 52
  - KEGG: 111.0
KG2.10.0
https://github.com/RTXteam/RTX-KG2/blob/master/docs/kg2-versions.md#2100:
2.10.0
Date: 2024.07.11
Counts:
- Nodes: 8,566,249
- Edges: 57,650,718
Issues:
- Issue #358
- Issue #383 - temporary patch for DRUGBANK:drug-interaction
- Additional issues that arose during the build: #395 (Comment)
Build info:
- Biolink Model version: 4.2.0
- InfoRes Registry version: 0.2.8
- Build host: kg2100build.rtx.ai
- Build directory: /home/ubuntu/kg2-build
- Build code branch: kg2100build
- Neo4j endpoint CNAME: kg2endpoint-kg2-10-0.rtx.ai
- Neo4j endpoint hostname: kg2endpoint3.rtx.ai
- Tracking issue for the build: #395
- Major knowledge source versions:
  - SemMedDB: 43 (2023)
  - UMLS: 2023AA
  - ChEMBL: 33
  - DrugBank: 5.1.10
  - Ensembl: 106
  - GO annotations: 2024-6-14
  - UniProtKB: 2024_03
  - DrugCentral: 52
  - KEGG: 111.0
KG2.9.3
https://github.com/RTXteam/RTX-KG2/blob/master/docs/kg2-versions.md#293:
2.9.3
Date: 2024.07.03
Counts:
- Nodes: 8,566,172
- Edges: 57,646,688
Issues:
- Issue #378
- Issue #380
- Issue #383 - included in code, but not mapped into predicates
- Issue #385
- Issue #389 - major code restructure
- Issue #390 - partially done
Build info:
- Biolink Model version: 4.2.0
- InfoRes Registry version: 0.2.8
- Build host: kg2erica2.rtx.ai
- Build directory: /home/ubuntu/kg2-build
- Build code branch: archiving2
- Neo4j endpoint CNAME: N/A
- Neo4j endpoint hostname: N/A
- Tracking issue for the build: N/A
- Major knowledge source versions:
  - SemMedDB: 43 (2023)
  - UMLS: 2023AA
  - ChEMBL: 33
  - DrugBank: 5.1.10
  - Ensembl: 106
  - GO annotations: 2024-6-14
  - UniProtKB: 2024_03
  - DrugCentral: 52
  - KEGG: 111.0
KG2.8.6
KG2.8.4
For version information, please see this document:
https://github.com/RTXteam/RTX-KG2/blob/master/kg2-versions.md#284