Corrupted squashfs when running mksquashfs with zstd compression. #306

Open
gs0510 opened this issue Mar 12, 2025 · 4 comments

@gs0510

gs0510 commented Mar 12, 2025

Hi! We have been using mksquashfs with zstd compression, and after a few iterations of compression we see corruption in the squashfs file. The error message is:

zstd uncompress failed with error code 10

FATAL ERROR: writer: failed to read/uncompress file temp/node_modules/.pnpm/[email protected]/node_modules/typescript/lib/typescriptServices.d.ts

Looking at the kernel logs, we see:

Squashfs error: FATAL ERROR: Can't find a valid SQUASHFS superblock on <some-random-id>

Interestingly, this only happens after a few iterations of running mksquashfs, using the resulting file, and then creating another squashfs file from the mounted squashfs file. It only happens with zstd and not with any other compression algorithm. Would you know what could be causing this problem? Thank you!

squashfs-tools version: 4.6

zstd version: 1.5.5

uncompressed cache size: 2.5 GB

compressed cache size: 536 MB

gs0510 changed the title from "Corrupted squashfs when the" to "Corrupted squashfs when running mksquashfs with zstd compression." on Mar 12, 2025
plougher self-assigned this on Mar 12, 2025
@plougher
Owner

The error messages do not support your conclusion.

> zstd uncompress failed with error code 10
>
> FATAL ERROR: writer: failed to read/uncompress file temp/node_modules/.pnpm/[email protected]/node_modules/typescript/lib/typescriptServices.d.ts

This error message is generated by Unsquashfs, and not Mksquashfs.

> Squashfs error: FATAL ERROR: Can't find a valid SQUASHFS superblock on <some-random-id>

This error message is generated by the kernel, and not Mksquashfs. It is also completely unrelated to zstd compression issues because the superblock is not compressed.

What your error messages do show is that you have a corrupted Squashfs file. What your error messages do not show is that it is Mksquashfs that corrupted them.

Before blaming Mksquashfs and filing a bug report here, you need to know and show that it is Mksquashfs at fault.

At the moment the most likely explanation is you have either

  1. A random process deleting or over-writing files after Mksquashfs wrote them
  2. I/O errors on the underlying filesystem, or disk.

What you should do before I do anything more here is:

  1. Checksum the Squashfs file immediately after Mksquashfs wrote it.
  2. When you discover corruption, check the file against the checksum.

If the checksums don't match, then the file was changed after Mksquashfs wrote it, and the issue is not with Mksquashfs.
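The two checksum steps above could be scripted roughly as follows. This is a minimal sketch, not something taken from the thread: the function names and the ".sha256" sidecar file are assumptions made for illustration, and it relies only on the standard `sha256sum` utility.

```shell
#!/bin/sh
# Sketch: record a checksum right after the image is written, verify it later.
# The function names and the ".sha256" sidecar file are illustrative assumptions.

# record_checksum IMAGE: store IMAGE's SHA-256 in IMAGE.sha256.
record_checksum() {
    sha256sum "$1" > "$1.sha256"
}

# verify_checksum IMAGE: succeed only if IMAGE still matches the stored sum.
# The path stored in the sidecar is the one passed to record_checksum, so
# call both functions with the same (ideally absolute) path.
verify_checksum() {
    sha256sum -c "$1.sha256" >/dev/null 2>&1
}
```

A typical use would be to call `record_checksum SQFS.IMG` immediately after Mksquashfs exits, and `verify_checksum SQFS.IMG` at the moment corruption is noticed. A failing verification means the file changed after it was written.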

Marking as invalid because so far there is no evidence the issue lies with Mksquashfs.

@plougher
Owner

It looks like you're updating or regenerating a Squashfs file by running Unsquashfs on it, and then running Mksquashfs on it.

My gut feeling here is that the Mksquashfs is over-writing the Squashfs file while the Unsquashfs is extracting it, or afterwards trying to mount it, e.g.

  1. Run "Unsquashfs SQFS.IMG"
  2. Then in parallel run "Mksquashfs some-dir SQFS.IMG"
  3. The Mksquashfs is writing to the same file as Unsquashfs is reading from, which causes the Unsquashfs to fail.
  4. The script running Unsquashfs notices it has failed, and at this point SQFS.IMG will no longer have a valid SQUASHFS superblock. So if it tries to re-run Unsquashfs or mount SQFS.IMG, the superblock will be invalid.

This is basically the same as doing:

% mount -t squashfs SQFS.IMG /mnt
% mksquashfs /mnt SQFS.IMG

You can't read and output to the same Squashfs file simultaneously.
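One way to avoid that overlap is to build into a temporary file and only rename it over the old image once the build has succeeded. The sketch below assumes that approach; it is not something described in the thread. `rebuild_image` and the `MKSQUASHFS` override variable are hypothetical names, while `-noappend` is the standard Mksquashfs option for creating a fresh filesystem rather than appending.

```shell
#!/bin/sh
# Sketch: never let Mksquashfs write to the image that is currently being read.
# MKSQUASHFS is a hypothetical override hook so the command can be swapped out.
MKSQUASHFS=${MKSQUASHFS:-mksquashfs}

# rebuild_image SRC_DIR IMAGE: build into a temporary file next to IMAGE,
# then rename it over IMAGE only if the build succeeded.
# The body runs in a subshell so its variables do not leak to the caller.
rebuild_image() (
    src=$1
    image=$2
    tmp="$image.new.$$"

    if "$MKSQUASHFS" "$src" "$tmp" -noappend; then
        # rename is atomic within one filesystem; readers of the old
        # image keep their already-open file.
        mv -f "$tmp" "$image"
    else
        rm -f "$tmp"
        exit 1
    fi
)
```

For example, `rebuild_image /mnt SQFS.IMG` would read from the mounted old image while writing a separate new file, and replace SQFS.IMG only at the end.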

@gs0510
Author

gs0510 commented Mar 14, 2025

Thank you for your answer. We only see this error for zstd compressed SquashFS files. We are currently using mksquashfs with gzip compression and do not see similar errors. We’ve tried a few of the other compression algorithms as well without any errors.

We never run unsquashfs, that was only done locally to check for the error with an already faulty SquashFS file.

We create the SquashFS files with mksquashfs and read from an already mounted SquashFS that’s part of an overlay mount. We have verified that the checksum of the created file matches the one we later use elsewhere, i.e. there are no mutations to the archives in storage or on the wire. When reading from a faulty SquashFS file where the error occurs, we see the following in the kernel logs:

SQUASHFS error: Failed to read block 0x1e259616: -5

SQUASHFS error: zstd decompression error: 2

Does this help at all? Let me know if you have any more questions. Thank you.

@plougher
Owner

> Thank you for your answer. We only see this error for zstd compressed SquashFS files. We are currently using mksquashfs with gzip compression and do not see similar errors. We’ve tried a few of the other compression algorithms as well without any errors.

If you only get the issue with zstd, then you should be raising the issue with the developers of zstd (Facebook). They will know if there are any bugs or incompatibilities with the versions of the zstd libraries used by Mksquashfs and the kernel.

> SQUASHFS error: Failed to read block 0x1e259616: -5
>
> SQUASHFS error: zstd decompression error: 2

Zstd decompression error 2 means "Zstd decompression failed: Restored data doesn't match checksum"

> Does this help at all? Let me know if you have any more questions. Thank you.

No, that just says the data was corrupt, it doesn't say why.

Whether it is a bug in zstd or not, the next step is to test the latest development version (by cloning the repository and compiling from source). You may be hitting a bug already fixed.

If that doesn't fix the problem, then you should do one, two or all of the following:

  1. Create a reproducer that can be used to create the faulty Squashfs file, and make it available for download. This will require a data set which, when given to Mksquashfs with the correct options, creates a faulty Squashfs file. The data set will need to be as small as possible and not contain confidential/sensitive data. This will allow me and/or Facebook to determine where the bug is.

  2. You can also try to isolate the problem with different data-sets to determine if there's a specific file or pattern of files that triggers the bug.

  3. You can try using different Mksquashfs options to see if the corruption goes away. The first options to try are "-no-duplicates", then "-no-fragments". You can also try varying the number of processors used (-processors) or the memory used (-mem). "-processors 1" will use minimal parallelism, which will eliminate data races as the cause.
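The option sweep in step 3 could be automated along these lines. This is only a sketch under assumptions: `try_variants`, the `MKSQUASHFS`/`UNSQUASHFS` override variables and the file naming are hypothetical, and "extracts cleanly with Unsquashfs" is used here as a stand-in corruption check. Since the reported corruption only shows up after several iterations, a single pass may well not trigger it.

```shell
#!/bin/sh
# Sketch: build the same data set with each suggested option and report
# which variants produce an image that Unsquashfs can extract.
# MKSQUASHFS and UNSQUASHFS are hypothetical override hooks.
MKSQUASHFS=${MKSQUASHFS:-mksquashfs}
UNSQUASHFS=${UNSQUASHFS:-unsquashfs}

# try_variants SRC_DIR WORK_DIR: print "ok" or "FAILED" for each option set.
# WORK_DIR should be empty; the body runs in a subshell to keep variables local.
try_variants() (
    src=$1
    work=$2
    n=0
    # The empty first entry is the baseline build without extra options.
    for opts in "" "-no-duplicates" "-no-fragments" "-processors 1"; do
        n=$((n + 1))
        img="$work/variant-$n.sqfs"
        # $opts is unquoted on purpose so "-processors 1" splits into two words.
        if "$MKSQUASHFS" "$src" "$img" -comp zstd -noappend $opts >/dev/null 2>&1 &&
           "$UNSQUASHFS" -d "$work/extract-$n" "$img" >/dev/null 2>&1; then
            printf '%s\n' "ok: ${opts:-baseline}"
        else
            printf '%s\n' "FAILED: ${opts:-baseline}"
        fi
    done
)
```

Running `try_variants some-dir /tmp/sweep` in a loop, and noting which option sets never report FAILED, would narrow down whether duplicate detection, fragments or parallelism is involved.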

The problem with these kinds of bugs is it is almost impossible to find them without a reproducer that can be used to investigate the problem. Without that it is like looking for a needle in a haystack.

plougher added the "triage issue being assessed" label and removed the "invalid" label on Mar 14, 2025