fix: implement cross-process synchronization and compressed file handling by jplfaria · Pull Request #5 · kbaseincubator/KBase_CDM_Ontologies

jplfaria · 2025-07-11T19:27:49Z

Summary

- Fixed bug where compressed files (eccode.owl.gz, rhea.owl.gz, ror.owl.gz) were being re-downloaded unnecessarily
- Implemented cross-process synchronization for run summary tracking
- Fixed memory monitoring to track actual Java process usage
- Added step details tracking with download/skip/fail counts
- Fixed disk usage tracking to use output directory mount point

Changes

Compressed File Handling

- Updated enhanced_download.py to properly track decompressed filenames in version info
- The actual decompressed file path is now checked when determining whether a download is needed
- Added comprehensive unit tests for compressed file handling
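
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in enhanced_download.py: the function names, the shape of the `versions` dictionary, and the `"tag"` field are all assumptions made for the example.

```python
from pathlib import Path


def decompressed_name(filename: str) -> str:
    """Return the name a .gz download has after decompression."""
    return filename[:-3] if filename.endswith(".gz") else filename


def needs_download(filename: str, output_dir: Path, versions: dict, remote_tag: str) -> bool:
    """Decide whether to fetch `filename`, keyed by its decompressed name.

    `versions` is a hypothetical mapping of decompressed filename to
    version info; `remote_tag` is a hypothetical version marker (for
    example an ETag) reported by the server.
    """
    key = decompressed_name(filename)
    local_path = output_dir / key
    if not local_path.exists():
        return True  # nothing on disk under the decompressed name
    recorded = versions.get(key)
    # Re-download only when no version is recorded or the tag has changed.
    return recorded is None or recorded.get("tag") != remote_tag
```

Keying on the decompressed name means `rhea.owl.gz` and the `rhea.owl` file left on disk after decompression resolve to the same version entry, which is what prevents the repeated downloads.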

Run Summary Cross-Process Sync

- Implemented atomic file writes to prevent corruption
- Added automatic state reload when file is modified by other processes
- Ensures child processes can update summary and changes are visible to parent
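
One common way to get atomic writes is to write to a temporary file in the same directory and then rename it over the target. The sketch below assumes the run summary is stored as JSON; `atomic_write_json` is a hypothetical helper name, and the PR's actual implementation may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON so that readers never observe a partially written file."""
    path = Path(path)
    # The temp file must live in the target directory so that the final
    # rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A parent process that reads the file at any moment sees either the old summary or the new one, never a truncated mix of the two.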

Version Tracking

- Added version tracking files to .gitignore to prevent git conflicts
- Files now ignored by git: ontology_versions.json, download_history.log

Test Results

All 91 tests passing with the new functionality.

🤖 Generated with Claude Code

- Fixed issue where compressed files were being re-downloaded unnecessarily
- Version tracking now correctly uses decompressed filename as key
- Added unit tests for compressed file handling
- Ensures .gz files are properly tracked even after decompression

This fixes the bug where eccode.owl.gz, rhea.owl.gz, and ror.owl.gz were being re-downloaded despite already existing in decompressed form.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Added version tracking files to .gitignore to prevent git pull conflicts
- Fixed cross-process run summary tracking by adding @auto_save decorators
- Immediately save state when initializing summary for child process access
- Fixed memory monitoring to update run summary with actual usage
- Added step details tracking with download/skip/fail counts
- Fixed disk usage tracking to use output directory mount point
- Updated analyze_core_ontologies to return and track statistics

This addresses all the issues identified in the production run where:
- Ontology download counts showed as 0
- Memory usage showed as 0GB
- Step details were missing
- Disk usage was not tracked correctly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
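
As an illustration of the disk usage fix, the sketch below resolves the mount point that holds the output directory and reports usage for that filesystem instead of some unrelated one. The function names and the returned dictionary layout are hypothetical; only `os.path.ismount` and `shutil.disk_usage` are standard library calls.

```python
import os
import shutil


def find_mount_point(path: str) -> str:
    """Walk up from `path` until a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the top without finding a mount
            break
        path = parent
    return path


def disk_usage_gb(output_dir: str) -> dict:
    """Report usage of the filesystem that actually holds `output_dir`."""
    mount = find_mount_point(output_dir)
    usage = shutil.disk_usage(mount)
    gb = 1024 ** 3
    return {
        "mount_point": mount,
        "total_gb": usage.total / gb,
        "used_gb": usage.used / gb,
        "free_gb": usage.free / gb,
    }
```

This matters when the output directory sits on a separate volume (for example a scratch disk), where the usage of the root filesystem says nothing about remaining space for ontology files.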

- Add atomic file writes to prevent corruption during concurrent access
- Implement automatic state reload when file is modified by other processes
- Add _reload_if_needed() method to sync state before critical operations
- Update get_summary() to always reload from file for latest state
- Add comprehensive test for cross-process synchronization behavior
- Fix test expectations to match new reload behavior

This ensures that child processes (like analyze_core_ontologies.py) can properly update the run summary and have their changes reflected in the parent process summary.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
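
The reload behaviour described in this commit could look roughly like the sketch below. The method names `_reload_if_needed()` and `get_summary()` come from the commit message; the `RunSummary` class name, the JSON storage format, the `force` parameter, and the use of the file's modification time as the change signal are assumptions for the example.

```python
import json
from pathlib import Path


class RunSummary:
    """Hypothetical summary object backed by a JSON file on disk."""

    def __init__(self, path):
        self.path = Path(path)
        self.state = {}
        self._loaded_mtime_ns = None
        self._reload_if_needed()

    def _reload_if_needed(self, force: bool = False) -> bool:
        """Reload state if another process changed the file; return True if reloaded."""
        try:
            mtime_ns = self.path.stat().st_mtime_ns
        except FileNotFoundError:
            return False  # nothing written yet, keep in-memory state
        if not force and mtime_ns == self._loaded_mtime_ns:
            return False  # file unchanged since our last load
        self.state = json.loads(self.path.read_text(encoding="utf-8"))
        self._loaded_mtime_ns = mtime_ns
        return True

    def get_summary(self) -> dict:
        """Always re-read the file so child-process updates are visible."""
        self._reload_if_needed(force=True)
        return dict(self.state)
```

Modification-time checks are cheap but can miss two writes that land within the filesystem's timestamp resolution, which is one reason a final `get_summary()` call might reload unconditionally.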

- Add multiple detection methods for Java processes (name, exe path, cmdline)
- Improve process type classification with ROBOT sub-commands
- Add diagnostic tools for debugging memory detection issues
- Fix logging to show process details when Java processes are found
- Add comprehensive tests for all detection methods

New tools added:
- debug_memory_monitor.py: Comprehensive process analysis tool
- test_memory_detection.py: Quick verification script for production

This should fix the 'Task=0.0GB' issue by detecting Java processes across different environments including Docker containers.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
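
The three detection methods (name, exe path, cmdline) can be illustrated with plain functions that take values such as those returned by psutil's `Process.name()`, `Process.exe()`, and `Process.cmdline()`. The function names, the `robot:<sub-command>` label format, and the rule for finding the ROBOT jar are hypothetical and are not taken from the PR's code.

```python
def _basename(path: str) -> str:
    """Return the last path component, accepting / or \\ separators."""
    return path.replace("\\", "/").rsplit("/", 1)[-1].lower()


def is_java_process(name, exe, cmdline) -> bool:
    """Detect Java by process name, executable path, or first cmdline argument."""
    java_names = ("java", "java.exe")
    if (name or "").lower() in java_names:
        return True
    if exe and _basename(exe) in java_names:
        return True
    args = list(cmdline or [])
    return bool(args) and _basename(args[0]) in java_names


def classify_process(name, exe, cmdline) -> str:
    """Label a process as 'other', 'java', 'robot', or 'robot:<sub-command>'."""
    if not is_java_process(name, exe, cmdline):
        return "other"
    args = list(cmdline or [])
    for i, arg in enumerate(args):
        lowered = arg.lower()
        if "robot" in lowered and lowered.endswith(".jar"):
            # Assume the first non-option argument after the jar is the sub-command.
            rest = [a for a in args[i + 1:] if not a.startswith("-")]
            return f"robot:{rest[0]}" if rest else "robot"
    return "java"
```

Checking several attributes is useful because any one of them can be empty or unexpected in some environments, for example when the executable path is not readable inside a container.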

- Check for Java processes using ps commands
- Verify Java installation
- Check psutil availability
- Provide installation instructions if missing

This helps diagnose memory monitoring issues on systems where psutil is not yet installed.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- create_ontology_test_package.py: Python script with full processing
- create_test_package_simple.sh: Shell script using CDM pipeline tools

Both scripts create a package containing:
- seed.owl, modelseed.owl, and merged modelseed_unified.owl
- SemanticSQL databases for each
- TSV and Parquet exports of statements and entailed_edge tables
- JSON conversion of seed.owl

Usage:
  python scripts/create_ontology_test_package.py /scratch/jplfaria/ontologies_play
  # or
  bash scripts/create_test_package_simple.sh /scratch/jplfaria/ontologies_play

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

These scripts belong in a separate test-utilities branch to avoid polluting the main development branch with testing tools.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Replace seed + modelseed with unified seed_unified.owl.gz in ontology source
- Add 7 new custom prefixes for seed.complex, seed.role, seed.subsystem
- Add ModelSEED ontology relation prefixes (enables_reaction, has_role, etc.)
- Keep documentation in SEED_UNIFIED_CHANGES.md for rollback reference

This unified ontology consolidates seed + modelseed into a single file with enhanced ModelSEED database links and subsystem/role mappings.

jplfaria and others added 8 commits July 10, 2025 11:00

revert: remove test package scripts from dev branch

a1afe8e


jplfaria merged commit 02324d0 into main Aug 21, 2025
6 checks passed

jplfaria merged 8 commits into main from dev
