Merged
7 changes: 6 additions & 1 deletion .gitignore
@@ -26,6 +26,12 @@ og_scripts/
 # Archived test files from semsql_custom_prefixes
 semsql_custom_prefixes/archived/

+# Version tracking files (these change during runs)
+ontology_versions/ontology_versions.json
+ontology_versions/download_history.log
+ontology_versions_test/ontology_versions.json
+ontology_versions_test/download_history.log
+
 # Large ontology files
 ontology_versions/backups/*.owl
 ontology_versions_test/
@@ -89,7 +95,6 @@ logs/*
 # Global log exclusion (but allow specific ones above)
 *.log
 !logs/cdm_ontologies_test_20250704_001300.log
-!ontology_versions*/download_history.log

 # Cache directory
 .cache/
59 changes: 59 additions & 0 deletions SEED_UNIFIED_CHANGES.md
@@ -0,0 +1,59 @@
# Seed Unified Ontology Integration

## Changes Made

### 1. Updated Ontology Source List
**File**: `config/ontologies_source_seed_unified.txt`
- **Removed**: `seed` and `modelseed` from in-house ontologies section
- **Added**: `https://github.com/ModelSEED/ModelSEEDTemplates/raw/template_ontology/templates/ontology/ontology/seed_unified.owl.gz`

### 2. Updated Custom Prefixes
**File**: `semsql_custom_prefixes/custom_prefixes_seed_unified.csv`
- **Added**: `seed.role,https://pubseed.theseed.org/RoleEditor.cgi?page=ShowRole&Role=`
- **Added**: `seed.subsystem,https://pubseed.theseed.org/SubsysEditor.cgi?page=ShowSubsystem&subsystem=`
- **Added**: `seed.complex,https://modelseed.org/biochem/complexes/`
- **Added**: `https://modelseed.org/ontology/enables_reaction,enables_reaction`
- **Added**: `https://modelseed.org/ontology/has_role,has_role`
- **Added**: `https://modelseed.org/ontology/has_complex,has_complex`
- **Added**: `https://modelseed.org/ontology/reaction_type,reaction_type`

## Analysis of seed_unified.owl

The unified ontology contains:
- **66,961** reaction references (`seed.reaction`)
- **45,706** compound references (`seed.compound`)
- **14,197** complex references (`seed.complex`) - NEW
- **61,636** role references (`pubseed.role`) - NEW
- **1,324** subsystem references (`pubseed.subsystem`) - NEW

## File Management on Remote Machine

### Option 1: Keep old files (Recommended)
- Leave `seed.owl` and `modelseed.owl` in `ontology_data_owl/`
- They won't be processed since they're not in the source list
- Provides backup in case of issues

### Option 2: Remove old files
```bash
# On remote machine
cd /scratch/jplfaria/KBase_CDM_Ontologies/ontology_data_owl
mv seed.owl seed.owl.backup
mv modelseed.owl modelseed.owl.backup
```

## Testing the Changes

To test with the new configuration:
```bash
# Use the new config files
export ONTOLOGIES_SOURCE_FILE=config/ontologies_source_seed_unified.txt
export CUSTOM_PREFIXES_FILE=semsql_custom_prefixes/custom_prefixes_seed_unified.csv
make docker-test
```

## What This Achieves

1. **Consolidation**: Single unified ontology replaces separate seed + modelseed
2. **Enhanced Coverage**: Includes complexes, CHEBI links, and subsystem/role mappings
3. **Better Integration**: Direct URLs to source databases for better traceability
4. **Reduced Redundancy**: Eliminates potential conflicts between seed and modelseed
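
The three CURIE prefix rows added to `custom_prefixes_seed_unified.csv` pair a prefix with a base URL. As an illustration only (this code is not part of the PR), the sketch below shows how such a row could expand a CURIE into a link. The `load_prefixes` and `expand_curie` helpers and the identifier `cpx00001` are hypothetical; the actual expansion is presumably handled by the SemSQL tooling.

```python
import csv
import io

# The three CURIE prefix rows from the change list, in `prefix,base_url` form.
PREFIX_CSV = """\
seed.role,https://pubseed.theseed.org/RoleEditor.cgi?page=ShowRole&Role=
seed.subsystem,https://pubseed.theseed.org/SubsysEditor.cgi?page=ShowSubsystem&subsystem=
seed.complex,https://modelseed.org/biochem/complexes/
"""


def load_prefixes(text):
    """Parse `prefix,base_url` rows into a dict (hypothetical helper)."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if len(row) == 2}


def expand_curie(curie, prefixes):
    """Expand `prefix:local_id` into a full URL (hypothetical helper)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local_id


prefixes = load_prefixes(PREFIX_CSV)
# "cpx00001" is a made-up identifier used only for illustration.
print(expand_curie("seed.complex:cpx00001", prefixes))
```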
12 changes: 10 additions & 2 deletions cdm_ontologies/cli.py
@@ -116,10 +116,18 @@ def run_all(args):
     if summary:
         summary.start_step("Analyze Core Ontologies", 1)
     try:
-        analyze_core_ontologies(str(repo_path))
+        stats = analyze_core_ontologies(str(repo_path))
         timestamp_print("Step 1: Completed analyzing core ontologies")
         if summary:
-            summary.end_step("Analyze Core Ontologies", "SUCCESS")
+            details = {
+                'main_ontologies': stats.get('main_ontologies', 0) if stats else 0,
+                'non_base_ontologies': stats.get('non_base_ontologies', 0) if stats else 0,
+                'analyzed': stats.get('analyzed', 0) if stats else 0,
+                'downloaded': stats.get('downloaded', 0) if stats else 0,
+                'skipped': stats.get('skipped', 0) if stats else 0,
+                'failed': stats.get('failed', 0) if stats else 0
+            }
+            summary.end_step("Analyze Core Ontologies", "SUCCESS", details)
     except Exception as e:
         logging.error(f"Failed to analyze core ontologies: {e}")
         timestamp_print(f"Step 1: Failed - {e}")
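
The repeated `stats.get(key, 0) if stats else 0` guard protects against `analyze_core_ontologies` returning `None`. A more compact equivalent could look like the sketch below. `STAT_KEYS` and `build_details` are hypothetical names that do not appear in this PR; the six keys are the ones used in the hunk above.

```python
# Keys taken from the `details` dict in the hunk above.
STAT_KEYS = (
    "main_ontologies",
    "non_base_ontologies",
    "analyzed",
    "downloaded",
    "skipped",
    "failed",
)


def build_details(stats):
    """Return a dict with every expected key, defaulting missing values to 0.

    Accepts None or an empty dict, so the caller needs no separate guard.
    """
    stats = stats or {}
    return {key: stats.get(key, 0) for key in STAT_KEYS}


print(build_details(None))
print(build_details({"downloaded": 3, "failed": 1}))
```

With a helper like this, the call site would reduce to `summary.end_step("Analyze Core Ontologies", "SUCCESS", build_details(stats))`.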
1 change: 1 addition & 0 deletions config/ontologies_merged_test.txt
@@ -3,4 +3,5 @@ credit.owl
 envo.owl
 iao-base.owl
 pato-base.owl
+rhea.owl
 ro-base.owl
5 changes: 2 additions & 3 deletions config/ontologies_source.txt
@@ -31,7 +31,6 @@ https://w3id.org/biopragmatics/resources/credit/credit.owl
 https://w3id.org/biopragmatics/resources/ror/ror.owl.gz
 https://w3id.org/biopragmatics/resources/interpro/interpro.owl
 #In-house Ontologies (manually added to ontologies_data_owl)
-seed
+https://github.com/ModelSEED/ModelSEEDTemplates/raw/template_ontology/templates/ontology/ontology/seed_unified.owl.gz
 metacyc
-kegg
-modelseed
+kegg
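
The source list mixes three kinds of lines: download URLs, bare names for in-house ontologies that are placed in the data directory by hand, and `#` comments. A minimal sketch of separating them follows. `classify_entries` is a hypothetical helper, and the pipeline's own parsing may differ.

```python
from urllib.parse import urlparse

# A few lines in the same shape as config/ontologies_source.txt after this change.
SAMPLE = """\
https://w3id.org/biopragmatics/resources/ror/ror.owl.gz
https://w3id.org/biopragmatics/resources/interpro/interpro.owl
#In-house Ontologies (manually added to ontologies_data_owl)
https://github.com/ModelSEED/ModelSEEDTemplates/raw/template_ontology/templates/ontology/ontology/seed_unified.owl.gz
metacyc
kegg
"""


def classify_entries(text):
    """Split a source list into (urls, local_names), skipping comments and blanks."""
    urls, local_names = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if urlparse(entry).scheme in ("http", "https"):
            urls.append(entry)
        else:
            local_names.append(entry)
    return urls, local_names


urls, local_names = classify_entries(SAMPLE)
print(len(urls), local_names)
```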
2 changes: 2 additions & 0 deletions ontology_versions/download_history.log
@@ -9,3 +9,5 @@
 2025-06-25T15:55:58.797515 | envo.owl | no_change | c327bf33 | https://purl.obolibrary.org/obo/envo.owl
 2025-06-25T15:56:07.389206 | go.owl | updated | c73e50c9 | https://purl.obolibrary.org/obo/go.owl
 2025-06-25T15:57:45.104176 | ncbitaxon.owl | updated | 52388979 | https://purl.obolibrary.org/obo/ncbitaxon.owl
+2025-07-10T10:44:39.430677 | eccode.owl.gz | error | N/A | https://purl.biopragmatics.com/pyobo/eccode/eccode.owl.gz | ERROR: HTTPSConnectionPool(host='purl.biopragmatics.com', port=443): Max retries exceeded with url: /pyobo/eccode/eccode.owl.gz (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x104069f50>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
+2025-07-10T10:44:40.687240 | rhea.owl | new | be24c68b | https://w3id.org/biopragmatics/resources/rhea/rhea.owl.gz
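
Each history line is pipe-delimited: timestamp, file name, status, short checksum, URL, and an optional trailing error message. The sketch below parses that layout as it appears in this diff. `HistoryEntry` and `parse_history_line` are hypothetical names, and the error text in the second sample line is shortened for readability.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class HistoryEntry(NamedTuple):
    timestamp: datetime
    filename: str
    status: str
    checksum: Optional[str]  # None when the log records "N/A"
    url: str
    error: Optional[str]


def parse_history_line(line):
    """Parse one ` | `-delimited history line (hypothetical helper)."""
    # maxsplit=5 keeps any " | " inside the trailing error text intact.
    parts = line.rstrip("\n").split(" | ", 5)
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}")
    timestamp, filename, status, checksum, url = parts[:5]
    error = parts[5] if len(parts) > 5 else None
    return HistoryEntry(
        datetime.fromisoformat(timestamp),
        filename,
        status,
        None if checksum == "N/A" else checksum,
        url,
        error,
    )


ok_line = (
    "2025-07-10T10:44:40.687240 | rhea.owl | new | be24c68b | "
    "https://w3id.org/biopragmatics/resources/rhea/rhea.owl.gz"
)
# Error text shortened for the example; the real entry is much longer.
err_line = (
    "2025-07-10T10:44:39.430677 | eccode.owl.gz | error | N/A | "
    "https://purl.biopragmatics.com/pyobo/eccode/eccode.owl.gz | ERROR: connection failed"
)
ok_entry = parse_history_line(ok_line)
err_entry = parse_history_line(err_line)
print(ok_entry.status, ok_entry.checksum)
print(err_entry.status, err_entry.checksum, err_entry.error)
```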
12 changes: 12 additions & 0 deletions ontology_versions/ontology_versions.json
@@ -64,6 +64,18 @@
       }
     ]
   },
+  "rhea.owl": {
+    "checksum": "be24c68bfbe6b8308aca1eb895954957c257c38e82c0f9dd51e4708ceb2d6982",
+    "last_checked": "2025-07-10T10:44:40.686852",
+    "last_updated": "2025-07-10T10:44:40.686843",
+    "previous_checksum": null,
+    "remote_etag": "W/\"e125d6bd47c952c0361ee8ddee429e5f3c58dbf3ea5fd9fc2ccfec02c4354cb5",
+    "remote_modified": null,
+    "remote_size": "7836195",
+    "size_bytes": 0,
+    "url": "https://w3id.org/biopragmatics/resources/rhea/rhea.owl.gz",
+    "version_history": []
+  },
   "ro-base.owl": {
     "checksum": "cba7d8111c554a00244a8f42d397074a2ad093cc659652c61497313a9611ed4a",
     "last_updated": "2025-06-25T15:55:51.758359",
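
The `checksum` recorded for `rhea.owl` is 64 hex digits long, which is consistent with SHA-256, and its first eight digits (`be24c68b`) match the short checksum in `download_history.log`. The hashing code itself is not part of this diff, so the following is only a sketch under that assumption; `file_sha256` and `has_changed` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large ontology files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_checksum):
    """Compare a file's current hash against a previously recorded one."""
    return file_sha256(path) != recorded_checksum


# Demonstrate on a throwaway file rather than a real ontology.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example ontology bytes")
    demo_path = tmp.name
try:
    full = file_sha256(demo_path)
    unchanged = not has_changed(demo_path, full)
    print(full[:8], unchanged)  # short form, as logged in download_history.log
finally:
    os.remove(demo_path)
```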
44 changes: 41 additions & 3 deletions scripts/analyze_core_ontologies.py
@@ -6,7 +6,7 @@
 from pathlib import Path
 from datetime import datetime
 import re
-from enhanced_download import download_ontology_safe, get_output_directories, is_test_mode
+from enhanced_download import download_ontology_safe, download_ontology_with_versioning, get_output_directories, is_test_mode

 def normalize_iri(iri):
     """Normalize IRI to extract the base ontology prefix and standardize to lowercase."""
@@ -186,6 +186,11 @@ def analyze_core_ontologies(repo_path):
     non_base_dir = os.path.join(ontology_data_path, 'non-base-ontologies')
     os.makedirs(non_base_dir, exist_ok=True)

+    # Initialize counters
+    downloaded_count = 0
+    skipped_count = 0
+    failed_count = 0
+
     # Process main directory ontologies
     for entry in main_dir_ontologies:
         # Check if it's a URL or a local filename
@@ -195,7 +200,14 @@ def analyze_core_ontologies(repo_path):
             output_path = os.path.join(ontology_data_path, filename)

             print(f"Checking core ontology: {filename}")
-            if not download_ontology(entry, output_path, repo_path):
+            success, status, message = download_ontology_with_versioning(entry, output_path, repo_path)
+
+            if status == "skipped":
+                skipped_count += 1
+            elif status in ["new", "updated"]:
+                downloaded_count += 1
+            elif not success:
+                failed_count += 1
                 print(f"⚠️ Failed to download {filename}, skipping analysis")
                 continue

@@ -273,7 +285,14 @@ def analyze_core_ontologies(repo_path):
             output_path = os.path.join(non_base_dir, filename)

             print(f"Checking non-base ontology: {filename}")
-            if not download_ontology(entry, output_path, repo_path):
+            success, status, message = download_ontology_with_versioning(entry, output_path, repo_path)
+
+            if status == "skipped":
+                skipped_count += 1
+            elif status in ["new", "updated"]:
+                downloaded_count += 1
+            elif not success:
+                failed_count += 1
                 print(f"⚠️ Failed to download {filename}, skipping analysis")
                 continue

@@ -357,6 +376,25 @@ def analyze_core_ontologies(repo_path):
             f.write(f"{term}\n")

     print("\nAnalysis complete!")
+
+    # Return statistics for summary
+    stats = {
+        'main_ontologies': len(main_dir_ontologies),
+        'non_base_ontologies': len(non_base_ontologies),
+        'analyzed': len(analysis_results),
+        'downloaded': downloaded_count,
+        'skipped': skipped_count,
+        'failed': failed_count
+    }
+
+    # Update run summary if available
+    from run_summary import get_summary
+    summary = get_summary()
+    if summary:
+        for key, value in stats.items():
+            summary.add_processing_result(f"core_analysis_{key}", value)
+
+    return stats

 if __name__ == "__main__":
     # If run directly, analyze the current directory
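
The same skipped/downloaded/failed branch appears twice in this file, once for core ontologies and once for non-base ontologies. The sketch below isolates that branch order in one function so its behavior can be checked in isolation. `tally_download` is a hypothetical helper, and the sample results are invented; only the `(success, status, message)` shape and the status strings come from this PR. Note that, as the branches are written, a successful result whose status is something else (the download log also records `no_change`) would not be counted in any bucket. Whether `download_ontology_with_versioning` ever returns such a status is not visible in this diff.

```python
from collections import Counter


def tally_download(counts, success, status):
    """Update counters for one download result.

    Returns True when analysis of the file should proceed, False when the
    caller should skip it. Mirrors the branch order used in the hunks above.
    """
    if status == "skipped":
        counts["skipped"] += 1
    elif status in ("new", "updated"):
        counts["downloaded"] += 1
    elif not success:
        counts["failed"] += 1
        return False
    return True


# Invented (success, status) pairs used only to exercise each branch.
results = [
    (True, "skipped"),
    (True, "new"),
    (False, "error"),
    (True, "updated"),
    (True, "no_change"),  # falls through every branch: counted nowhere
]
counts = Counter()
proceed = [tally_download(counts, ok, status) for ok, status in results]
print(dict(counts), proceed)
```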
92 changes: 92 additions & 0 deletions scripts/check_java_env.py
@@ -0,0 +1,92 @@
#!/usr/bin/env python3
"""Quick environment check for Java processes without psutil dependency."""

import subprocess
import os
import sys

def check_java_processes():
    """Check for Java processes using ps command."""
    print("CDM Ontologies Java Process Environment Check")
    print("=" * 60)

    # Check current user
    print(f"Current user: {os.environ.get('USER', 'unknown')}")
    print(f"Python version: {sys.version}")
    print(f"Platform: {sys.platform}")

    # Check if we're in Docker
    if os.path.exists('/.dockerenv'):
        print("Running in Docker container: YES")
    else:
        print("Running in Docker container: NO")

    print("\n" + "-" * 60)
    print("Checking for Java processes using 'ps'...")
    print("-" * 60)

    try:
        # Try different ps commands
        commands = [
            ("ps aux | grep -i java | grep -v grep", "Standard ps aux"),
            ("ps -ef | grep -i java | grep -v grep", "System V style ps"),
            ("pgrep -la java", "pgrep for java processes"),
        ]

        found_any = False
        for cmd, desc in commands:
            print(f"\n{desc} ({cmd}):")
            try:
                result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                if result.stdout.strip():
                    print(result.stdout)
                    found_any = True
                else:
                    print(" No Java processes found with this command")
            except Exception as e:
                print(f" Error: {e}")

        if not found_any:
            print("\n⚠️ No Java processes found. This is normal if no Java tools are running.")
            print("\nTo test with a Java process, you could:")
            print("1. Start the CDM pipeline in another terminal")
            print("2. Or run a simple Java command like: java -version")

    except Exception as e:
        print(f"Error checking processes: {e}")

    # Check Java installation
    print("\n" + "-" * 60)
    print("Checking Java installation...")
    print("-" * 60)

    try:
        result = subprocess.run(['java', '-version'], capture_output=True, text=True)
        print("Java is installed:")
        print(result.stderr if result.stderr else result.stdout)
    except FileNotFoundError:
        print("❌ Java not found in PATH")

    # Check for psutil
    print("\n" + "-" * 60)
    print("Checking Python package requirements...")
    print("-" * 60)

    try:
        import psutil
        print("✅ psutil is installed (version: {})".format(psutil.__version__))
    except ImportError:
        print("❌ psutil is NOT installed")
        print("\nTo install psutil, run one of:")
        print(" pip install psutil")
        print(" pip3 install psutil")
        print(" python -m pip install psutil")
        print(" conda install psutil (if using conda)")

        # Check if we're in a virtual environment
        if hasattr(sys, 'real_prefix') or (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix):
            print("\n⚠️ You appear to be in a virtual environment.")
            print("Make sure to install psutil in this environment.")

if __name__ == "__main__":
    check_java_processes()
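
One caveat with the `grep -i java` pipelines: they match any command line containing "java", which includes the checker itself when it is started as `python scripts/check_java_env.py`. A possible alternative, sketched below, lists processes with `ps -eo pid,args` and matches on the executable's base name in Python. `filter_java_lines` and `list_java_processes` are hypothetical helpers, the sample `ps` output is invented, and the simple whitespace split assumes the executable path contains no spaces.

```python
import os
import subprocess


def filter_java_lines(ps_output):
    """Return (pid, command) pairs whose executable is named exactly `java`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        pid, _, command = line.strip().partition(" ")
        command = command.strip()
        if not pid.isdigit() or not command:
            continue
        # Assumes no spaces in the executable path (a simplification).
        executable = os.path.basename(command.split()[0])
        if executable == "java":
            matches.append((int(pid), command))
    return matches


def list_java_processes():
    """Run `ps` without a shell and filter its output in Python."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=False
    )
    return filter_java_lines(result.stdout)


# Invented sample output; the third line shows why a substring match is too broad.
SAMPLE_PS = """\
  PID COMMAND
    1 /sbin/init
 4242 /usr/bin/java -Xmx8g -jar tool.jar
 4300 python scripts/check_java_env.py
"""
found = filter_java_lines(SAMPLE_PS)
print(found)
```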