
fix: implement cross-process synchronization and compressed file handling #5

Merged: jplfaria merged 8 commits into main from dev on Aug 21, 2025


@jplfaria
Collaborator

Summary

  • Fixed bug where compressed files (eccode.owl.gz, rhea.owl.gz, ror.owl.gz) were being re-downloaded unnecessarily
  • Implemented cross-process synchronization for run summary tracking
  • Fixed memory monitoring to track actual Java process usage
  • Added step details tracking with download/skip/fail counts
  • Fixed disk usage tracking to use output directory mount point
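
The disk usage item can be illustrated with a minimal sketch. It assumes the goal is to report usage for the filesystem that holds the output directory rather than the current working directory; `find_mount_point` and `disk_usage_for` are hypothetical names, not functions from this PR.

```python
import os
import shutil


def find_mount_point(path):
    """Walk up from path until a mount point is reached (hypothetical helper)."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def disk_usage_for(output_dir):
    """Report usage of the filesystem that contains output_dir."""
    mount = find_mount_point(output_dir)
    usage = shutil.disk_usage(mount)
    return {
        "mount": mount,
        "total_gb": usage.total / 1e9,
        "used_gb": usage.used / 1e9,
        "free_gb": usage.free / 1e9,
    }
```

`shutil.disk_usage` accepts any path on the filesystem, so resolving the mount point mainly helps when the summary should name the volume being measured.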

Changes

Compressed File Handling

  • Updated enhanced_download.py to properly track decompressed filenames in version info
  • Now checks the actual decompressed file path when determining whether a download is needed
  • Added comprehensive unit tests for compressed file handling
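
A rough sketch of the check described above, assuming version info is a dict keyed by the decompressed filename (roughly what ontology_versions.json might hold). The function names and the dict layout are illustrative assumptions, not the actual code in enhanced_download.py.

```python
from pathlib import Path


def decompressed_name(filename):
    """Map 'rhea.owl.gz' to 'rhea.owl'; leave other names unchanged."""
    return filename[:-3] if filename.endswith(".gz") else filename


def needs_download(filename, remote_version, versions, out_dir):
    """Decide whether filename must be fetched again.

    versions is assumed to map decompressed filename -> recorded version.
    """
    key = decompressed_name(filename)
    target = Path(out_dir) / key
    if not target.exists():
        # The decompressed file is what stays on disk, so check that path.
        return True
    recorded = versions.get(key)
    return recorded is None or recorded != remote_version
```

With this keying, a source named `rhea.owl.gz` is skipped once `rhea.owl` exists and its recorded version matches the remote one.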

Run Summary Cross-Process Sync

  • Implemented atomic file writes to prevent corruption
  • Added automatic state reload when file is modified by other processes
  • Ensured that child processes can update the summary and that their changes are visible to the parent
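
One common way to get atomic file writes is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern under the assumption that the run summary is stored as JSON; `atomic_write_json` is a hypothetical name and may differ from what this PR implements.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data to path so readers never see a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader in another process sees either the old file or the new one, never a truncated mix of both.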

Version Tracking

  • Added version tracking files to .gitignore to prevent git conflicts
  • Version tracking files added to .gitignore: ontology_versions.json, download_history.log

Test Results

All 91 tests pass with the new functionality.

🤖 Generated with Claude Code

jplfaria and others added 8 commits July 10, 2025 11:00
- Fixed issue where compressed files were being re-downloaded unnecessarily
- Version tracking now correctly uses decompressed filename as key
- Added unit tests for compressed file handling
- Ensures .gz files are properly tracked even after decompression

This fixes the bug where eccode.owl.gz, rhea.owl.gz, and ror.owl.gz were
being re-downloaded despite already existing in decompressed form.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Added version tracking files to .gitignore to prevent git pull conflicts
- Fixed cross-process run summary tracking by adding @auto_save decorators
- Immediately save state when initializing summary for child process access
- Fixed memory monitoring to update run summary with actual usage
- Added step details tracking with download/skip/fail counts
- Fixed disk usage tracking to use output directory mount point
- Updated analyze_core_ontologies to return and track statistics

This addresses all the issues identified in the production run where:
- Ontology download counts showed as 0
- Memory usage showed as 0GB
- Step details were missing
- Disk usage was not tracked correctly
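
A minimal sketch of what an `@auto_save` decorator could look like, assuming it persists state after every mutating call so that other processes can read it. The `RunSummary` class, its fields, and its method names are illustrative assumptions, not the code from this commit.

```python
import functools
import json


def auto_save(method):
    """Persist the summary to disk after the wrapped method runs."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self.save()
        return result
    return wrapper


class RunSummary:
    """Hypothetical summary object used only for illustration."""

    def __init__(self, path):
        self.path = path
        self.state = {"steps": {}, "peak_memory_gb": 0.0}
        self.save()  # save immediately so child processes can find the file

    def save(self):
        with open(self.path, "w") as fh:
            json.dump(self.state, fh)

    @auto_save
    def record_step(self, name, downloaded=0, skipped=0, failed=0):
        self.state["steps"][name] = {
            "downloaded": downloaded,
            "skipped": skipped,
            "failed": failed,
        }

    @auto_save
    def record_memory(self, gb):
        self.state["peak_memory_gb"] = max(self.state["peak_memory_gb"], gb)
```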

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add atomic file writes to prevent corruption during concurrent access
- Implement automatic state reload when file is modified by other processes
- Add _reload_if_needed() method to sync state before critical operations
- Update get_summary() to always reload from file for latest state
- Add comprehensive test for cross-process synchronization behavior
- Fix test expectations to match new reload behavior

This ensures that child processes (like analyze_core_ontologies.py) can
properly update the run summary and have their changes reflected in the
parent process summary.
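
The reload behavior can be sketched as follows, assuming the summary is a JSON file and that a change is detected by comparing modification times. Only `_reload_if_needed` and `get_summary` are names from the commit message; the `SyncedSummary` class and the mtime check are assumptions made for illustration.

```python
import json
import os


class SyncedSummary:
    """Hypothetical reader that picks up writes made by other processes."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._mtime_ns = None
        self._reload_if_needed()

    def _reload_if_needed(self):
        """Reload state when the file's modification time has changed."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path) as fh:
            self.state = json.load(fh)
        self._mtime_ns = mtime_ns
        return True

    def get_summary(self):
        self._reload_if_needed()
        return dict(self.state)
```

On filesystems with coarse timestamps, two writes in quick succession can share a modification time, which is one reason to reload unconditionally in `get_summary()` as the commit message describes.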

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add multiple detection methods for Java processes (name, exe path, cmdline)
- Improve process type classification with ROBOT sub-commands
- Add diagnostic tools for debugging memory detection issues
- Fix logging to show process details when Java processes are found
- Add comprehensive tests for all detection methods

New tools added:
- debug_memory_monitor.py: Comprehensive process analysis tool
- test_memory_detection.py: Quick verification script for production

This should fix the 'Task=0.0GB' issue by detecting Java processes
across different environments including Docker containers.
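
As a sketch of the multi-method detection idea, the functions below take a process's name, executable path, and command line as plain values and decide whether it looks like Java. The function names, the `robot-<subcommand>` labels, and the short sub-command list are assumptions for illustration, not the monitor's actual logic.

```python
import os

JAVA_NAMES = {"java", "java.exe"}
# A few ROBOT commands; the real classifier may recognize more.
ROBOT_SUBCOMMANDS = ("merge", "reason", "convert", "annotate")


def is_java_process(name, exe, cmdline):
    """Return True if any of name, exe path, or cmdline points at Java."""
    if (name or "").lower() in JAVA_NAMES:
        return True
    if os.path.basename(exe or "").lower() in JAVA_NAMES:
        return True
    return any(os.path.basename(arg).lower() in JAVA_NAMES
               for arg in (cmdline or []))


def classify(cmdline):
    """Label a Java command line as 'robot-<subcommand>', 'robot', or 'java'."""
    args = list(cmdline or [])
    for i, arg in enumerate(args):
        if "robot" in os.path.basename(arg).lower():
            for later in args[i + 1:]:
                if later in ROBOT_SUBCOMMANDS:
                    return "robot-" + later
            return "robot"
    return "java"
```

With psutil installed, the three inputs could come from `psutil.process_iter(["name", "exe", "cmdline"])`, where each process exposes them through its `info` dict.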

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Check for Java processes using ps commands
- Verify Java installation
- Check psutil availability
- Provide installation instructions if missing

This helps diagnose memory monitoring issues on systems where
psutil is not yet installed.
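
A psutil-free fallback along these lines could parse `ps` output directly. This sketch assumes a `ps` that supports `-A -o pid,rss,args` and reports RSS in kilobytes, as on typical Linux systems; all function names here are hypothetical.

```python
import importlib.util
import shutil
import subprocess


def parse_ps(text):
    """Extract Java processes from `ps -A -o pid,rss,args` output."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skips the header and malformed lines
        pid, rss_kb, args = int(parts[0]), int(parts[1]), parts[2]
        executable = args.split()[0].rsplit("/", 1)[-1]
        if executable == "java":
            rows.append({"pid": pid, "rss_gb": rss_kb / 1024**2, "args": args})
    return rows


def java_processes_via_ps():
    """Run ps and return the Java processes it reports."""
    result = subprocess.run(["ps", "-A", "-o", "pid,rss,args"],
                            capture_output=True, text=True, check=True)
    return parse_ps(result.stdout)


def environment_report():
    """Check whether Java is on PATH and whether psutil can be imported."""
    return {
        "java_on_path": shutil.which("java") is not None,
        "psutil_installed": importlib.util.find_spec("psutil") is not None,
    }
```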

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- create_ontology_test_package.py: Python script with full processing
- create_test_package_simple.sh: Shell script using CDM pipeline tools

Both scripts create a package containing:
- seed.owl, modelseed.owl, and merged modelseed_unified.owl
- SemanticSQL databases for each
- TSV and Parquet exports of statements and entailed_edge tables
- JSON conversion of seed.owl

Usage:
  python scripts/create_ontology_test_package.py /scratch/jplfaria/ontologies_play
  # or
  bash scripts/create_test_package_simple.sh /scratch/jplfaria/ontologies_play

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
These scripts belong in a separate test-utilities branch to avoid
polluting the main development branch with testing tools.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace seed + modelseed with unified seed_unified.owl.gz in ontology source
- Add 7 new custom prefixes for seed.complex, seed.role, seed.subsystem
- Add ModelSEED ontology relation prefixes (enables_reaction, has_role, etc.)
- Keep documentation in SEED_UNIFIED_CHANGES.md for rollback reference

This unified ontology consolidates seed + modelseed into a single file
with enhanced ModelSEED database links and subsystem/role mappings.
@jplfaria merged commit 02324d0 into main on Aug 21, 2025
6 checks passed