@JonasIsensee
Collaborator

@JonasIsensee JonasIsensee commented Sep 24, 2025

A few words from me: This is the result of a much more refined AI coding experiment. It certainly needs more human review, but it successfully added new features and refactored a non-trivial amount of JLD2 internals to enable them. All previous tests still pass, and new tests and docs were added. There's quite a bit of security code for detecting whether external file links point somewhere malicious. I think it might be better to remove these checks for simplicity, since JLD2 struct loading has already been shown to allow arbitrary code execution from maliciously prepared files (and unfixably so).

Here's the AI's PR summary:

Add External Links and Soft Links Support

This PR implements comprehensive external link and soft link support for JLD2, enabling cross-file references and flexible data organization patterns while maintaining full backward compatibility and HDF5 specification compliance.

Features Added

🔗 External Links

  • Cross-file references: Link to objects in other HDF5/JLD2 files
  • Path resolution: Support for relative and absolute external file paths
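
As a rough illustration of the path-resolution bullet, the sketch below resolves an external link's file path against the directory of the file that contains the link. The function name `resolve_external_path` and the choice of base directory are assumptions made for illustration; they are not taken from this PR's code. Only Julia `Base` path functions are used.

```julia
# Hypothetical helper, not part of JLD2: resolve the file path stored in an
# external link. Assumption: relative paths are interpreted relative to the
# directory of the file that contains the link.
function resolve_external_path(linking_file::AbstractString, link_target::AbstractString)
    if isabspath(link_target)
        # Absolute targets are used as given, only normalized.
        return normpath(link_target)
    end
    base_dir = dirname(abspath(linking_file))
    # normpath collapses "." and ".." segments in the joined path.
    return normpath(joinpath(base_dir, link_target))
end
```

With this sketch, a link stored in `/home/user/analysis.jld2` that targets `data.jld2` would be looked up next to the linking file, regardless of the current working directory.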

🔗 Soft Links

  • Intra-file aliases: Create symbolic links within the same file
  • Path-based resolution: Absolute and relative path support
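
To make the absolute-versus-relative distinction concrete, here is a minimal sketch of how a soft link target might be turned into an absolute in-file path. The name `resolve_link_path` is hypothetical, and the rule that relative targets are resolved against the group containing the link is an assumption rather than something confirmed by this PR.

```julia
# Hypothetical helper, not part of JLD2: compute the absolute in-file path
# that a soft link refers to.
# Assumption: a relative target is resolved against the group holding the link.
function resolve_link_path(group_path::AbstractString, target::AbstractString)
    # Absolute targets ignore the containing group.
    combined = startswith(target, "/") ? target : string(group_path, "/", target)
    parts = String[]
    for part in split(combined, "/"; keepempty=false)
        part == "." && continue   # "." refers to the current group
        push!(parts, String(part))
    end
    return "/" * join(parts, "/")
end
```

Under these assumptions, a link in group `/results` with target `run1/data` refers to `/results/run1/data`, while the target `/temperature` is used unchanged.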

🏗️ Architecture

  • Abstract link hierarchy: HardLink, SoftLink, ExternalLink extending AbstractLink
  • Backward compatibility: All existing code works unchanged
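
The hierarchy named above could look roughly like the following. This is a sketch only: the type names come from the summary, but every field name and the `describe` function are assumptions made for illustration and may not match the PR's implementation.

```julia
# Sketch of the link hierarchy; field names are assumptions, not the PR's code.
abstract type AbstractLink end

# Points directly at an object inside the same file (here: by byte offset).
struct HardLink <: AbstractLink
    offset::UInt64
end

# Stores an in-file path that is resolved at access time.
struct SoftLink <: AbstractLink
    path::String
end

# Stores a path to another file plus an object path inside that file.
struct ExternalLink <: AbstractLink
    file_path::String
    object_path::String
end

# Hypothetical function: multiple dispatch picks the behavior per link type.
describe(link::HardLink) = "hard link to offset $(link.offset)"
describe(link::SoftLink) = "soft link to $(link.path)"
describe(link::ExternalLink) = "external link to $(link.object_path) in $(link.file_path)"
```

Dispatching on the concrete link type is one way that transparent access such as `file["link_name"]` could stay the same for callers while the lookup strategy differs per link kind.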

API

# External links
create_external_link!(file, "link_name", "external_file.jld2", "/path/to/object")

# Soft links
create_soft_link!(file, "alias", "/path/to/local/object")

# Transparent access
data = file["link_name"]  # Works for all link types

Compatibility

  • HDF5 Tools: Full compatibility with h5dump, h5debug, h5py
  • Backward Compatibility: Zero breaking changes to existing JLD2 code
  • High-Level API: Works seamlessly with jldsave/load
  • Cross-Platform: Tested on multiple operating systems

Example Usage

using JLD2

# Create external data
jldsave("data.jld2"; temperature=[23.5, 24.1, 22.8], metadata="Sensor data")

# Create main file with links
jldopen("analysis.jld2", "w") do f
    f["local_results"] = [1, 2, 3]

    # External links
    create_external_link!(f, "temperature", "data.jld2", "/temperature")
    create_external_link!(f, "info", "data.jld2", "/metadata")

    # Soft links
    create_soft_link!(f, "results_alias", "/local_results")
    create_soft_link!(f, "temp_link", "/temperature")  # Points to external link
end

# Transparent access
data = load("analysis.jld2")
temperature = data["temperature"]     # Loads from external file
results = data["results_alias"]       # Resolves soft link

Breaking Changes

None - This is a purely additive feature with full backward compatibility.

@codecov

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.98%. Comparing base (bb53223) to head (fe80145).
⚠️ Report is 2 commits behind head on master.

Files with missing lines     Patch %   Lines
src/groups.jl                86.45%    13 Missing ⚠️
src/object_headers.jl         0.00%    11 Missing ⚠️
src/explicit_datasets.jl     71.42%    10 Missing ⚠️
src/external_files.jl         0.00%    10 Missing ⚠️
src/JLD2.jl                  84.61%     2 Missing ⚠️
src/loadsave.jl              71.42%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #686      +/-   ##
==========================================
- Coverage   85.42%   84.98%   -0.44%     
==========================================
  Files          37       39       +2     
  Lines        4439     4576     +137     
==========================================
+ Hits         3792     3889      +97     
- Misses        647      687      +40     

@nhz2
Member

nhz2 commented Oct 3, 2025

This PR is quite big, and I think it is important to review this code very carefully since it is AI-generated.

Is it possible to get the AI to split the PR up into smaller pieces?

For example, can soft-links and external-links be added separately?

Caching is, in general, really difficult to get right. Can this be removed from the basic feature PRs and added afterwards as a performance optimization?

@JonasIsensee
Collaborator Author

Hi @nhz2,

Yeah, no worries.
I have no intention of merging it like this.
The AI code is way too verbose for my liking.

I like the fact that I got a working implementation without that much effort on my side.

It allows us to add regression tests and then improve the code from there.

I agree that the caching logic is probably BS and should be removed.

@nhz2
Member

nhz2 commented Oct 3, 2025

Yes, it's also very cool as a proof of concept to know that this and chunks can be added without making breaking changes.

3 participants