9 changes: 9 additions & 0 deletions .dockerignore
@@ -0,0 +1,9 @@
.git
.env
.idea
.DS_Store
.pytest_cache
.venv
__pycache__
*.egg-info
dist
5 changes: 0 additions & 5 deletions .env

This file was deleted.

70 changes: 70 additions & 0 deletions .env.example
@@ -0,0 +1,70 @@
# Copy or rename this file to ".env" to use it for environment variable configurations.
#
# ATTENTION: The only required environment variables are ORION_STORAGE and ORION_GRAPHS. The rest are optional and it's
# usually fine to leave them commented out or delete them, as the ORION config module will assign defaults.

# ---- Storage & Output ----

# Directory for source data downloads and ingest pipeline files
ORION_STORAGE=~/ORION_storage/

# Directory for final graph releases
ORION_GRAPHS=~/ORION_graphs/

# Directory for log files (if unset, logs go to stdout only)
# ORION_LOGS=

# Base URL used to generate the URI identifiers that appear in graph metadata.
# For example, ROBOKOP graphs use https://robokop.renci.org/
# ORION_OUTPUT_URL=https://localhost

# ---- Graph Spec ----

# Local graph spec filename (set one of ORION_GRAPH_SPEC or ORION_GRAPH_SPEC_URL, not both)
# ORION_GRAPH_SPEC=example-graph-spec.yaml

# URL pointing to a remote graph spec file
# ORION_GRAPH_SPEC_URL=

# ---- Mode ----

# Enable test/debug mode (sets log level to DEBUG and runs ingests with a smaller subset of data if possible)
# ORION_TEST_MODE=false

# ---- Biolink Model ----

# Biolink model version (optional - don't set this and ORION will use the latest)
# BL_VERSION=v4.3.4

# ---- Normalization URLs ----

# Edge normalization / BioLink Lookup URL
# EDGE_NORMALIZATION_URL=https://bl-lookup-sri.renci.org

# Node normalization URL
# NODE_NORMALIZATION_URL=https://nodenormalization-sri.renci.org

# ---- LitCoin / Bagel (may be removed in the future) ----

# Name resolution service URL
# NAMERES_URL=https://name-resolution-sri.renci.org

# SapBERT service URL
# SAPBERT_URL=https://babel-sapbert.apps.renci.org

# Shared source data path for LitCoin pipeline
# SHARED_SOURCE_DATA_PATH=/tmp/shared_data

# LitCoin predicate mapping service URL
# LITCOIN_PRED_MAPPING_URL=https://pred-mapping.apps.renci.org

# Bagel service URL
# BAGEL_URL=https://bagel.apps.renci.org

# Bagel service credentials
# BAGEL_SERVICE_USERNAME=
# BAGEL_SERVICE_PASSWORD=

# OpenAI credentials for LitCoin GPT features
# OPENAI_API_KEY=
# OPENAI_API_ORGANIZATION=
15 changes: 7 additions & 8 deletions .github/workflows/release.yml
@@ -11,29 +11,28 @@ jobs:
push_to_registry:
name: Push Docker image to GitHub Packages tagged with "latest" and version number.
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Check out the repo
uses: actions/checkout@v4
- name: Get the version
id: get_version
run: echo ::set-output name=VERSION::${GITHUB_REF/refs\/tags\//}
- name: Login to ghcr
uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
uses: docker/metadata-action@v5
with:
images:
ghcr.io/${{ github.repository }}
- name: Push to GitHub Packages
uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
uses: docker/build-push-action@v6
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
build-args: VERSION=${{ steps.get_version.outputs.VERSION }}
labels: ${{ steps.meta.outputs.labels }}
2 changes: 0 additions & 2 deletions .github/workflows/test.yml
@@ -14,10 +14,8 @@ jobs:
- name: create env params
run: |
echo "ROBOKOP_HOME=$PWD" >> $GITHUB_ENV
mkdir -p $PWD/tests/workspace/logs
mkdir -p $PWD/tests/workspace/storage
mkdir -p $PWD/tests/workspace/graphs
echo "ORION_LOGS=$PWD/tests/workspace/logs" >> $GITHUB_ENV
echo "ORION_STORAGE=$PWD/tests/workspace/storage" >> $GITHUB_ENV
echo "ORION_GRAPHS=$PWD/tests/workspace/graphs" >> $GITHUB_ENV

45 changes: 25 additions & 20 deletions README.md
@@ -42,31 +42,34 @@ After installation, the following commands are available (prefix with `uv run` i

### Configuring ORION

ORION uses three directories for its data, configured via environment variables:
ORION is configured via environment variables, which can be set directly or through an `.env` file.

| Variable | Purpose |
|---|--------------------------------------|
| `ORION_STORAGE` | Data ingest pipeline storage |
| `ORION_GRAPHS` | Knowledge graph outputs |
| `ORION_LOGS` | Log files |

You can set these up manually or use the provided script:
In most cases, you can use the provided script to set up a local environment. It creates directories for ORION outputs next to where ORION was installed and sets environment variables pointing to them.

```bash
source ./set_up_test_env.sh
source ./set_up_dev_env.sh
```

#### Graph Spec
For more customization, use an `.env` file. Copy or rename the `.env.example` file to `.env`.

A Graph Spec yaml file defines which sources to include in a knowledge graph. Set one of the following environment variables (not both):
Then uncomment and edit the entries in `.env` to set values for your environment.

```bash
# Option 1: Name of a file in the graph_specs/ directory
export ORION_GRAPH_SPEC=example-graph-spec.yaml
| Variable | Purpose | Default |
|---|------------------------------------------------------------|---|
| `ORION_STORAGE` | Path to a directory for data ingest pipeline storage | (required) |
| `ORION_GRAPHS` | Path to a directory for Knowledge Graph outputs | (required) |
| `ORION_LOGS` | Path to a directory for log files (if unset, logs go to stdout only) | `None` |
| `ORION_GRAPH_SPEC` | Graph Spec filename from `graph_specs/` | `example-graph-spec.yaml` |
| `ORION_GRAPH_SPEC_URL` | URL to a remote Graph Spec file | |

# Option 2: URL pointing to a Graph Spec yaml file
export ORION_GRAPH_SPEC_URL=https://stars.renci.org/var/data_services/graph_specs/default-graph-spec.yaml
```
Configuration is managed by [pydantic-settings](https://docs.pydantic.dev/latest/concepts/pydantic_settings/) — environment variables override `.env` file values, and sensible defaults are provided where possible. See `orion/config.py` for the full list of settings.
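
The precedence described above can be illustrated with a small standard-library sketch. This is not ORION's actual config module (that lives in `orion/config.py` and uses pydantic-settings); the function names `parse_env_file` and `resolve_settings` are hypothetical, and the parser deliberately ignores quoting and `~` expansion.

```python
# Illustrative sketch only: mimics "defaults < .env file < environment variables".
# The names below are hypothetical and are not part of ORION.

# Variables taken from the table above.
KNOWN = ("ORION_STORAGE", "ORION_GRAPHS", "ORION_LOGS",
         "ORION_GRAPH_SPEC", "ORION_GRAPH_SPEC_URL")
REQUIRED = ("ORION_STORAGE", "ORION_GRAPHS")
DEFAULTS = {"ORION_GRAPH_SPEC": "example-graph-spec.yaml"}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(env_file_text: str, environ: dict[str, str]) -> dict[str, str]:
    """Merge the three layers so that later layers win."""
    settings = dict(DEFAULTS)                         # lowest priority
    settings.update(parse_env_file(env_file_text))    # .env overrides defaults
    for name in KNOWN:                                # environment overrides .env
        if name in environ:
            settings[name] = environ[name]
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return settings
```

For example, `resolve_settings("ORION_STORAGE=/a\nORION_GRAPHS=/b\n", {"ORION_GRAPHS": "/c"})` keeps `/a` from the file, takes `/c` from the environment, and fills in the default Graph Spec filename.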

#### Graph Spec

A Graph Spec yaml file defines which sources to include in a knowledge graph. Set one of the following (not both):

- `ORION_GRAPH_SPEC` - name of a file in the `graph_specs/` directory
- `ORION_GRAPH_SPEC_URL` - URL pointing to a Graph Spec yaml file
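
The "one, not both" rule can be sketched as a small check. How ORION itself reacts when both variables are set is not shown here, so the error and the fallback below are assumptions, and `pick_graph_spec_source` is a hypothetical helper name.

```python
from typing import Optional

# Default filename as listed in the configuration table; treated here as an assumption.
DEFAULT_GRAPH_SPEC = "example-graph-spec.yaml"


def pick_graph_spec_source(spec_file: Optional[str],
                           spec_url: Optional[str]) -> tuple[str, str]:
    """Return ("file", name) or ("url", url), rejecting the case where both are set."""
    if spec_file and spec_url:
        raise ValueError(
            "Set only one of ORION_GRAPH_SPEC or ORION_GRAPH_SPEC_URL, not both."
        )
    if spec_url:
        return ("url", spec_url)
    # Neither set, or only the filename set: use the file (assumed fallback to the default).
    return ("file", spec_file or DEFAULT_GRAPH_SPEC)
```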

Here is a simple Graph Spec example:

@@ -100,6 +103,8 @@ See the `graph_specs/` directory for more examples.

### Running with Docker

Make sure the environment variables are set, or an `.env` file is configured, with at least `ORION_STORAGE` and `ORION_GRAPHS` pointing to valid host directories. The compose file reads these variables and mounts those directories as volumes in the container.
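
As a rough preflight check before running `docker compose`, something like the following sketch could confirm that the required variables point at existing directories. It is not part of ORION, `missing_host_dirs` is a hypothetical helper, and it only inspects the mapping it is given (for example the process environment), so values that exist only in an `.env` file would need to be loaded first.

```python
import os
from pathlib import Path


def missing_host_dirs(environ, required=("ORION_STORAGE", "ORION_GRAPHS")) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in required:
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).expanduser().is_dir():
            # expanduser() handles values such as ~/ORION_storage/
            problems.append(f"{name}={value} is not an existing directory")
    return problems
```

Calling `missing_host_dirs(os.environ)` and printing any returned messages gives a quick indication of whether the volume mounts are likely to succeed.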

Build the image:

```bash
@@ -115,19 +120,19 @@ docker compose up
Build a specific graph:

```bash
docker compose run --rm orion orion-build Example_Graph
docker compose run orion orion-build Example_Graph
```

Run the ingest pipeline for a single data source:

```bash
docker compose run --rm orion orion-ingest DrugCentral
docker compose run orion orion-ingest DrugCentral
```

See available data sources and options:

```bash
docker compose run --rm orion orion-ingest -h
docker compose run orion orion-ingest -h
```

### Development
34 changes: 9 additions & 25 deletions docker-compose-worker.yml
@@ -5,40 +5,24 @@ services:
dockerfile: Dockerfile
container_name: orion-worker
command: [celery, "-A", "celery_worker.celery_app", "worker", "--loglevel=info", "-Q", "orion"]
env_file:
- path: .env
required: false
environment:
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
- SHARED_SOURCE_DATA_PATH=/tmp/shared_data
# override paths from env, use paths volumes are mounted to inside the container
- ORION_STORAGE=/ORION_storage
- ORION_GRAPHS=/ORION_graphs
- ORION_LOGS=/ORION_logs
- BAGEL_SERVICE_USERNAME=fake-username-do-not-commit-a-real-one!!!
- BAGEL_SERVICE_PASSWORD=fake-password-do-not-commit-a-real-one!!!
- ORION_GRAPH_SPEC
- ORION_GRAPH_SPEC_URL
- ORION_OUTPUT_URL
- EDGE_NORMALIZATION_ENDPOINT
- NODE_NORMALIZATION_ENDPOINT
- NAMERES_URL
- SAPBERT_URL
- LITCOIN_PRED_MAPPING_URL
- BL_VERSION
- PHAROS_DB_HOST
- PHAROS_DB_USER
- PHAROS_DB_PASSWORD
- PHAROS_DB_NAME
- PHAROS_DB_PORT
- DRUGCENTRAL_DB_HOST
- DRUGCENTRAL_DB_USER
- DRUGCENTRAL_DB_PASSWORD
- DRUGCENTRAL_DB_NAME
- DRUGCENTRAL_DB_PORT
- SHARED_SOURCE_DATA_PATH=/tmp/shared_data
# specific to celery
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
volumes:
- .:/ORION
- "${SHARED_SOURCE_DATA_PATH}:/tmp/shared_data"
- "${ORION_STORAGE}:/ORION_storage"
- "${ORION_GRAPHS}:/ORION_graphs"
- "${ORION_LOGS}:/ORION_logs"
- "${SHARED_SOURCE_DATA_PATH}:/tmp/shared_data"
user: 1000:7474
networks:
- app-network
26 changes: 4 additions & 22 deletions docker-compose.yml
@@ -3,33 +3,15 @@ services:
build:
context: .
command: [orion-build, all]
env_file:
- path: .env
required: false
environment:
# override paths from env, use paths volumes are mounted to inside the container
- ORION_STORAGE=/ORION_storage
- ORION_GRAPHS=/ORION_graphs
- ORION_LOGS=/ORION_logs
- ORION_GRAPH_SPEC
- ORION_GRAPH_SPEC_URL
- ORION_OUTPUT_URL
- EDGE_NORMALIZATION_ENDPOINT
- NODE_NORMALIZATION_ENDPOINT
- NAMERES_URL
- SAPBERT_URL
- BL_VERSION
- PHAROS_DB_HOST
- PHAROS_DB_USER
- PHAROS_DB_PASSWORD
- PHAROS_DB_NAME
- PHAROS_DB_PORT
- DRUGCENTRAL_DB_HOST
- DRUGCENTRAL_DB_USER
- DRUGCENTRAL_DB_PASSWORD
- DRUGCENTRAL_DB_NAME
- DRUGCENTRAL_DB_PORT
volumes:
- .:/ORION
- "${ORION_STORAGE}:/ORION_storage"
- "${ORION_GRAPHS}:/ORION_graphs"
- "${ORION_LOGS}:/ORION_logs"
user: 7474:7474


8 changes: 6 additions & 2 deletions docs/ORION.ipynb
@@ -85,7 +85,11 @@
{
"cell_type": "code",
"id": "g6i460bvtda",
"source": "%%bash\ncd ~/ORION_root/ORION/\nsource ./set_up_test_env.sh",
"source": [
"%%bash\n",
"cd ~/ORION_root/ORION/\n",
"source ./set_up_dev_env.sh"
],
"metadata": {},
"execution_count": null,
"outputs": []
@@ -130,4 +134,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
12 changes: 6 additions & 6 deletions helm/orion/templates/graph-builder.yaml
@@ -70,15 +70,15 @@ spec:
- name: BL_VERSION
value: {{ .Values.orion.normalization.bl_version }}
{{- if .Values.orion.normalization.nodeNormEndpoint }}
- name: NODE_NORMALIZATION_ENDPOINT
- name: NODE_NORMALIZATION_URL
value: {{ .Values.orion.normalization.nodeNormEndpoint }}
{{- end }}
{{- if .Values.orion.normalization.edgeNormEndpoint }}
- name: EDGE_NORMALIZATION_ENDPOINT
- name: EDGE_NORMALIZATION_URL
value: {{ .Values.orion.normalization.edgeNormEndpoint }}
{{- end }}
{{- if .Values.orion.normalization.nameResolverEndpoint }}
- name: NAMERES_ENDPOINT
- name: NAMERES_URL
value: {{ .Values.orion.normalization.nameResolverEndpoint }}
{{- end }}
{{- if .Values.orion.normalization.sapbertEndpoint }}
@@ -157,15 +157,15 @@ spec:
- name: BL_VERSION
value: {{ .Values.orion.normalization.bl_version }}
{{- if .Values.orion.normalization.nodeNormEndpoint }}
- name: NODE_NORMALIZATION_ENDPOINT
- name: NODE_NORMALIZATION_URL
value: {{ .Values.orion.normalization.nodeNormEndpoint }}
{{- end }}
{{- if .Values.orion.normalization.edgeNormEndpoint }}
- name: EDGE_NORMALIZATION_ENDPOINT
- name: EDGE_NORMALIZATION_URL
value: {{ .Values.orion.normalization.edgeNormEndpoint }}
{{- end }}
{{- if .Values.orion.normalization.nameResolverEndpoint }}
- name: NAMERES_ENDPOINT
- name: NAMERES_URL
value: {{ .Values.orion.normalization.nameResolverEndpoint }}
{{- end }}
{{- if .Values.orion.normalization.sapbertEndpoint }}
4 changes: 3 additions & 1 deletion orion/biolink_utils.py
@@ -6,7 +6,9 @@
from requests.adapters import HTTPAdapter, Retry
from functools import cache

BIOLINK_MODEL_VERSION = os.environ.get("BL_VERSION", "v4.3.4")
from orion.config import config

BIOLINK_MODEL_VERSION = config.BL_VERSION

def get_biolink_model_toolkit(biolink_version: str = None) -> Toolkit:
version = biolink_version if biolink_version else BIOLINK_MODEL_VERSION