Skip to content

Conversation

benjeffery
Copy link
Member

@benjeffery benjeffery commented Sep 17, 2025

Fixes #95
Metadata and strings were being stored in zarr as int8. This was fine for json as it uses ascii codes below 128. However for struct the bytes can take the full 0-255 range.

Files written with old tszip are not readable without changing their encoding on disk.

@codecov-commenter
Copy link

codecov-commenter commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.80%. Comparing base (6407cc1) to head (a7f0cfb).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #115      +/-   ##
==========================================
- Coverage   97.32%   96.80%   -0.53%     
==========================================
  Files           6        7       +1     
  Lines         337      375      +38     
  Branches       56       60       +4     
==========================================
+ Hits          328      363      +35     
- Misses          7        8       +1     
- Partials        2        4       +2     
Flag Coverage Δ
tszip ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@benjeffery benjeffery marked this pull request as ready for review September 17, 2025 11:53
@jeromekelleher
Copy link
Member

I think we need a bit more on nailing down what the blast radius is here.

What happens to existing files, written with current versions of tszip?

Was this an write-time bug (fails when trying to write non-ascii strings) or a read-time bug (fails when trying to read) or a silent data corruption issue (you get back garbage with no error raised)?

Non-ascii strings are probably rare in practice but we do need more details here.

@benjeffery benjeffery force-pushed the fix-metadata branch 4 times, most recently from 9a11916 to 69b6745 Compare September 17, 2025 14:02
@benjeffery
Copy link
Member Author

@jeromekelleher Good point - I've made it load the old boken files and made a more detailed changelog.

@jeromekelleher
Copy link
Member

jeromekelleher commented Sep 17, 2025

These examples are quite porky, adding 6MB to the repo. Do they need to be this big?

The CHANGELOG message needs a bit more work, still not entirely clear to me the consequences of this bug

@benjeffery
Copy link
Member Author

Reading that changelog today it looks like the ramblings of a madman, sorry! I've cleaned it up and trimmed the test file to 50KB.

@benjeffery benjeffery added this pull request to the merge queue Sep 18, 2025
Merged via the queue into tskit-dev:main with commit 66c9286 Sep 18, 2025
9 checks passed
@benjeffery benjeffery deleted the fix-metadata branch September 18, 2025 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tszip can't decompress with struct metadata
3 participants