std.zip: allow extraction of a wider range of zip files #24137

Closed
wants to merge 1 commit into from

wooster0

Sometimes you have to deal with ZIP files in the wild that have some invalid metadata but still decompress just fine.

Whether that works depends on the particular ZIP file.
Here's an example:
test.zip

$ unzip test.zip
Archive:  test.zip
zip.tt:  mismatching "local" filename (zip.txt),
         continuing with "central" filename version
  inflating: zip.tt
  inflating: zipup.c

As you can see, unzip reports a mismatch for one of the filenames. I purposely corrupted the ZIP file slightly: I renamed a file inside it from zip.txt to zip.tt, with an invalid byte between the two "t"s. It still extracts both files just fine, and if you look inside the extracted files nothing is actually wrong.

Now try the same with std.zip:

const std = @import("std");

pub fn main() anyerror!void {
    try std.zip.extract(std.fs.cwd(), (try std.fs.cwd().openFile("test.zip", .{})).seekableStream(), .{});
}

$ zig run x.zig
error: ZipMismatchModTime
lib/std/zip.zig:467:25: 0x10fb3f6 in extract (x)
                        return error.ZipMismatchModTime;
                        ^
lib/std/zip.zig:629:23: 0x10fd47a in extract__anon_24087 (x)
        const crc32 = try entry.extract(seekable_stream, options, &filename_buf, dest);
                      ^
x.zig:4:5: 0x10f5e59 in main (x)
    try std.zip.extract(std.fs.cwd(), (try std.fs.cwd().openFile("test.zip", .{})).seekableStream(), .{});

However, when using the new option added in this PR:

const std = @import("std");

pub fn main() anyerror!void {
    try std.zip.extract(std.fs.cwd(), (try std.fs.cwd().openFile("test.zip", .{})).seekableStream(), .{ .best_effort = true });
}

$ zig run x.zig
$ ls
 test.zip   x.zig  'zip.t'$'\377''t'   zipup.c

It extracts it just fine, just like unzip! Unlike unzip, though, it doesn't report any warnings that the API user could pass on to the end user.

You might suggest running zip -F (or similar) first, but I'm talking about arbitrary ZIP files found in the wild, where a repair step like that isn't convenient. The point is to extract on a best-effort basis unless that's actually not possible (i.e. when unzip would report an error and abort instead of just printing a warning).

So what I'm saying is that std.zip should have warnings like unzip does in this example, and/or some way to not abort the whole extraction just because one byte of metadata in the ZIP file doesn't match.
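To make the "warnings instead of errors" idea concrete, here is a minimal sketch. Everything in it is hypothetical and not part of std.zip: the WarningLog type, the Warning enum, and the checkModTimeLenient helper are names made up for illustration. The only name taken from the real library is error.ZipMismatchModTime, which appears in the stack trace above.

```zig
const std = @import("std");

/// Hypothetical: kinds of non-fatal metadata mismatches (not part of std.zip).
const Warning = enum { mod_time_mismatch, filename_mismatch };

/// Hypothetical fixed-capacity log that the API user could inspect after extraction.
const WarningLog = struct {
    buf: [16]Warning = undefined,
    len: usize = 0,

    /// Records a warning; silently drops it once the buffer is full.
    fn add(self: *WarningLog, w: Warning) void {
        if (self.len < self.buf.len) {
            self.buf[self.len] = w;
            self.len += 1;
        }
    }

    /// Returns the warnings recorded so far.
    fn items(self: *const WarningLog) []const Warning {
        return self.buf[0..self.len];
    }
};

/// Hypothetical check: with a log, a mismatch becomes a warning;
/// without one, it stays a hard error as it is today.
fn checkModTimeLenient(log: ?*WarningLog, local: u16, central: u16) error{ZipMismatchModTime}!void {
    if (local == central) return;
    if (log) |l| {
        l.add(.mod_time_mismatch);
        return;
    }
    return error.ZipMismatchModTime;
}

pub fn main() !void {
    var log: WarningLog = .{};
    // The mismatch is recorded instead of aborting.
    try checkModTimeLenient(&log, 0x1234, 0x1235);
    for (log.items()) |w| std.debug.print("warning: {s}\n", .{@tagName(w)});
}
```

An optional pointer keeps the strict behavior as the default: callers who pass null get exactly the error they get today.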

The only workaround I can think of is commenting out the relevant return error statements in my local copy of the standard library.

So this PR attempts to fix this with a best_effort option.

wooster0 commented Jun 10, 2025

Alternatively, to make it stricter, maybe have a validation option instead: a struct of booleans, one per kind of error?
This way you can configure the extractor to turn off only as much validation as is needed for the ZIP files you want to extract.
I think this would probably be better than the all-or-nothing best_effort option.
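A rough sketch of what such a struct of booleans could look like. This is only an illustration, not the PR's code and not an existing std.zip API: the Validation struct, its field names, and checkModTime are all assumptions. error.ZipMismatchModTime is the one name taken from the trace in the PR description.

```zig
const std = @import("std");

/// Hypothetical options struct (not part of std.zip): one flag per kind of check.
/// All checks default to on, which matches the current strict behavior.
const Validation = struct {
    mod_time: bool = true,
    filename: bool = true, // shown only as a second example flag
};

/// Hypothetical check: compares a local header field with its central
/// directory counterpart, but only if that check is enabled.
fn checkModTime(v: Validation, local: u16, central: u16) error{ZipMismatchModTime}!void {
    if (v.mod_time and local != central) return error.ZipMismatchModTime;
}

pub fn main() !void {
    // Turn off only the check that the problematic archive fails.
    const lenient: Validation = .{ .mod_time = false };
    try checkModTime(lenient, 0x1234, 0x1235);

    // With the defaults, the same mismatch is still an error.
    checkModTime(.{}, 0x1234, 0x1235) catch |err| {
        std.debug.print("strict: {s}\n", .{@errorName(err)});
    };
}
```

Compared with a single best_effort flag, this lets a caller relax one check while keeping the others as hard errors.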

@andrewrk andrewrk closed this Jun 20, 2025