Releases: RTXteam/RTX-KG2

KG2.10.3

24 Feb 21:28

Date: 2026.02.24

KG2 v2.10.3 introduces major improvements to canonicalization, build determinism, packaging, and pipeline integration. This release consolidates previously separated build stages into a unified, reproducible Snakemake workflow.


Architectural Change: Unified Build Pipeline

Through 2.10.2, the build process was split into two distinct phases: production of the KG2pre files, followed by a separate, manual production of the KG2c files.

Starting in 2.10.3:

  • Canonicalization and conflation are integrated into the main build pipeline.
  • The build now produces:
    • KG2pre files (nodes, edges)
    • KG2 normalized files (nodes, edges)
    • KG2 conflated files (nodes, edges)

Canonicalization and conflation are now performed using three scripts located in RTXteam/RTX-KG2/process. The scripts kg2pre_to_kg2c_nodes.py and kg2pre_to_kg2c_edges.py each take a local copy of the Babel SQLite database along with the corresponding KG2pre nodes or edges file and produce canonicalized versions of those files. The conflate_kg2c.py script then operates on the canonicalized nodes and edges to generate the final conflated KG2c nodes and edges files.
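
As a rough illustration of what database-driven canonicalization of a nodes file amounts to, the sketch below maps each node ID to a preferred identifier through a SQLite lookup and merges nodes that collapse to the same identifier. It is not the code in kg2pre_to_kg2c_nodes.py: the table `equivalents`, its columns, the function `canonicalize_nodes`, and the `EX:` identifiers are all hypothetical, and the real Babel SQLite schema may differ.

```python
import sqlite3


def canonicalize_nodes(conn, nodes):
    """Map each node's id to a preferred CURIE and merge nodes that share one.

    Assumes a hypothetical table equivalents(curie, preferred_curie).
    """
    merged = {}
    for node in nodes:
        row = conn.execute(
            "SELECT preferred_curie FROM equivalents WHERE curie = ?",
            (node["id"],),
        ).fetchone()
        # Identifiers missing from the lookup table keep their original id.
        preferred = row[0] if row else node["id"]
        entry = merged.setdefault(
            preferred, {"id": preferred, "equivalent_ids": []}
        )
        entry["equivalent_ids"].append(node["id"])
    return list(merged.values())


# Toy in-memory database standing in for a local Babel SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equivalents (curie TEXT PRIMARY KEY, preferred_curie TEXT)"
)
conn.executemany(
    "INSERT INTO equivalents VALUES (?, ?)",
    [("EX:1", "EX:1"), ("EX:2", "EX:1")],
)
toy_nodes = [{"id": "EX:1"}, {"id": "EX:2"}, {"id": "EX:9"}]
canonical = canonicalize_nodes(conn, toy_nodes)
```

Here `EX:1` and `EX:2` collapse into one node, while `EX:9`, which the toy table does not know, passes through unchanged. An edges script would presumably apply the same lookup to each edge's subject and object.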

The new pipeline:

ETL → Merge graphs → Simplify graph → Normalize graph → Conflate graph → Deploy to PloverDB


Canonicalization Changes

Canonicalization is now fully driven by a local Babel SQLite database.

  • Babel version: babel-sqlite-20250901-p1
  • The old (bespoke) KG2c canonicalization algorithm is deprecated.
  • Babel is now the single source of truth for identifier equivalence.

Conflation is deterministic: the same output is produced whenever the following inputs are held fixed:

  • Babel database version
  • Normalized graph inputs
  • Biolink version
  • Git tag
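
One way to make this claim checkable is to record a fingerprint of the four inputs next to the build outputs, so that two builds with the same fingerprint can be expected to match. The sketch below is an illustrative assumption, not part of the KG2 pipeline: `build_fingerprint` is a hypothetical helper, the normalized inputs are represented by the MD5 values published for this release, and `KG2.10.3` is assumed to be the Git tag name.

```python
import hashlib
import json


def build_fingerprint(babel_version, input_checksums, biolink_version, git_tag):
    """Hash the four determinism inputs into a single hex digest."""
    payload = json.dumps(
        {
            "babel": babel_version,
            # Sort so the order in which files are listed does not matter.
            "inputs": sorted(input_checksums),
            "biolink": biolink_version,
            "git_tag": git_tag,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


fingerprint = build_fingerprint(
    "babel-sqlite-20250901-p1",
    [
        "33164b17ea3d5bd23f1f3001623dc527",  # normalized edges (MD5)
        "cd5d1a1e691ae98fc3d2da87402e133d",  # normalized nodes (MD5)
    ],
    "4.2.5",
    "KG2.10.3",  # assumed tag name
)
```

Changing any one of the four inputs changes the fingerprint, which mirrors the list above: a different Babel snapshot or Biolink version should be treated as a different build.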

Graph Statistics

Normalized Graph (KG2pre)

  • Nodes: 5,964,484 (~2.7 GB gzipped)
  • Edges: 36,756,612 (~35 GB gzipped)

Canonicalized Graph (KG2c)

  • Conflated Nodes: 5,841,477 (~2.1 GB gzipped)
  • Conflated Edges: 36,277,511 (~35 GB gzipped)
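
To put the two sets of counts side by side, the short calculation below computes how many nodes and edges are lost between the normalized and conflated graphs. It assumes the two graphs listed above are the input and output of the conflation step; `reduction` is a helper name chosen for this example.

```python
def reduction(before, after):
    """Return (absolute drop, percentage drop) going from before to after."""
    drop = before - after
    return drop, 100.0 * drop / before


node_drop, node_pct = reduction(5_964_484, 5_841_477)
edge_drop, edge_pct = reduction(36_756_612, 36_277_511)

print(f"nodes: -{node_drop:,} ({node_pct:.2f}%)")  # nodes: -123,007 (2.06%)
print(f"edges: -{edge_drop:,} ({edge_pct:.2f}%)")  # edges: -479,101 (1.30%)
```

Under that assumption, conflation removes roughly 2% of nodes and 1.3% of edges.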

Schema & Core Dependencies

  • Biolink Model version: 4.2.5
  • Babel version: babel-sqlite-20250901-p1
  • Python requirement: Python ≥ 3.12
  • Build host: kg2103build.rtx.ai
  • Build directory: /home/ubuntu/kg2-build

Major Knowledge Source Versions

  • SemMedDB: 43 (2023)
  • UMLS: 2023AA
  • ChEMBL: 35
  • DrugBank: 5.1.10
  • Ensembl: 106
  • Reactome: 93
  • UniProtKB: 2025_03
  • DrugCentral: 11012023
  • KEGG: 115.0

Deprecated Sources (Removed in v2.10.3)

  • DisGeNET
  • Guide to Pharmacology
  • Therapeutic Target Database
  • PathWhiz
  • Experimental Factor Ontology

Packaging Updates

The following components are now distributed via PyPI and used during build:

  • biolink_helper
  • stitch_proj.local_babel

Local repository copies are no longer used.


Checksums (MD5)

Canonicalized (KG2c)

  • kg2-conflated-2.10.3-edges.jsonl.gz
    40dfdf4fe14af24db19735d7ce434572
  • kg2-conflated-2.10.3-nodes.jsonl.gz
    05172c6c359c4bc412b76db3c2a35d1b

Normalized (KG2pre)

  • kg2-normalized-2.10.3-edges.jsonl.gz
    33164b17ea3d5bd23f1f3001623dc527
  • kg2-normalized-2.10.3-nodes.jsonl.gz
    cd5d1a1e691ae98fc3d2da87402e133d
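
A downloaded file can be checked against these values with a few lines of Python. This is a minimal sketch: the helper names `md5_of_file` and `verify` are made up for the example, and it assumes the files sit in the current working directory under the names listed above.

```python
import hashlib

# MD5 values copied from the checksum list for this release.
EXPECTED_MD5 = {
    "kg2-conflated-2.10.3-edges.jsonl.gz": "40dfdf4fe14af24db19735d7ce434572",
    "kg2-conflated-2.10.3-nodes.jsonl.gz": "05172c6c359c4bc412b76db3c2a35d1b",
    "kg2-normalized-2.10.3-edges.jsonl.gz": "33164b17ea3d5bd23f1f3001623dc527",
    "kg2-normalized-2.10.3-nodes.jsonl.gz": "cd5d1a1e691ae98fc3d2da87402e133d",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory flat even for the ~35 GB edges files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify(name, EXPECTED_MD5[name])` for each `name` in `EXPECTED_MD5` reports which downloads are intact. MD5 guards against corrupted transfers only; it is not a defense against deliberate tampering.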

Breaking / Behavioral Changes

  • Canonicalization now fully Babel-driven.
  • The old (bespoke) KG2c canonicalization algorithm is deprecated.
  • Build pipeline unified under Snakemake.
  • Python ≥ 3.12 required.
  • Deprecated sources removed.

Deployment

Artifacts from this release are deployed to PloverDB in the CI environment.

KG2.10.2

04 Apr 18:22
39f09c4

Update for KG2.10.2pre build

KG2.10.1

08 Sep 21:50
e6fbe9f

https://github.com/RTXteam/RTX-KG2/blob/master/docs/kg2-versions.md#2101:

2.10.1

Date: 2024.09.02

Counts:

  • Nodes: 8,507,201
  • Edges: 57,418,405

Issues:

Build info:

  • Biolink Model version: 4.2.1
  • InfoRes Registry version: 0.2.8
  • Build host: kg2101build.rtx.ai
  • Build directory: /home/ubuntu/kg2-build
  • Build code branch: midjuly24work
  • Neo4j endpoint CNAME: kg2endpoint-kg2-10-1.rtx.ai
  • Neo4j endpoint hostname: kg2endpoint4.rtx.ai
  • Tracking issue for the build: #408
  • Major knowledge source versions:
    • SemMedDB: 43 (2023)
    • UMLS: 2023AA
    • ChEMBL: 33
    • DrugBank: 5.1.10
    • Ensembl: 106
    • Reactome: 80
    • UniProtKB: 2024_04
    • DrugCentral: 52
    • KEGG: 111.0

KG2.10.0

12 Jul 02:01

https://github.com/RTXteam/RTX-KG2/blob/master/docs/kg2-versions.md#2100:

2.10.0

Date: 2024.07.11

Counts:

  • Nodes: 8,566,249
  • Edges: 57,650,718

Issues:

  • Issue #358
  • Issue #383 - temporary patch for DRUGBANK:drug-interaction
  • Additional issues that arose during the build: #395 (Comment)

Build info:

  • Biolink Model version: 4.2.0
  • InfoRes Registry version: 0.2.8
  • Build host: kg2100build.rtx.ai
  • Build directory: /home/ubuntu/kg2-build
  • Build code branch: kg2100build
  • Neo4j endpoint CNAME: kg2endpoint-kg2-10-0.rtx.ai
  • Neo4j endpoint hostname: kg2endpoint3.rtx.ai
  • Tracking issue for the build: #395
  • Major knowledge source versions:
    • SemMedDB: 43 (2023)
    • UMLS: 2023AA
    • ChEMBL: 33
    • DrugBank: 5.1.10
    • Ensembl: 106
    • GO annotations: 2024-6-14
    • UniProtKB: 2024_03
    • DrugCentral: 52
    • KEGG: 111.0

KG2.9.3

09 Jul 18:19
1c842b0

https://github.com/RTXteam/RTX-KG2/blob/master/docs/kg2-versions.md#293:

2.9.3

Date: 2024.07.03

Counts:

  • Nodes: 8,566,172
  • Edges: 57,646,688

Issues:

  • Issue #378
  • Issue #380
  • Issue #383 - included in code, but not mapped into predicates
  • Issue #385
  • Issue #389 - major code restructure
  • Issue #390 - partially done

Build info:

  • Biolink Model version: 4.2.0
  • InfoRes Registry version: 0.2.8
  • Build host: kg2erica2.rtx.ai
  • Build directory: /home/ubuntu/kg2-build
  • Build code branch: archiving2
  • Neo4j endpoint CNAME: N/A
  • Neo4j endpoint hostname: N/A
  • Tracking issue for the build: N/A
  • Major knowledge source versions:
    • SemMedDB: 43 (2023)
    • UMLS: 2023AA
    • ChEMBL: 33
    • DrugBank: 5.1.10
    • Ensembl: 106
    • GO annotations: 2024-6-14
    • UniProtKB: 2024_03
    • DrugCentral: 52
    • KEGG: 111.0

KG2.8.6

13 Sep 15:01

KG2.8.4

24 Jul 21:53

For version information, please see this document:

https://github.com/RTXteam/RTX-KG2/blob/master/kg2-versions.md#284

KG2.8.0

12 Jan 23:43

Compliant with Biolink 3.0.0

KG2.7.5

24 Feb 22:48
64fa422

Compatible with Biolink Model 2.2.11

KG2.6.5

21 Jun 17:07

#14 adjustment based on in-build failure