SquashFS superblock #308

Open
wants to merge 10 commits into
base: master
117 changes: 117 additions & 0 deletions filesystem/squashfs_superblock.ksy
@@ -0,0 +1,117 @@
meta:
  id: squashfs_superblock
  title: SquashFS superblock
  file-extension:
    - squashfs
    - snap
    - sqfs
  endian: le
Member

Suggested change
-  endian: le
+  xref:
+    justsolve: Squashfs
+    wikidata: Q389314
+  endian: le

Author

Why not rename the field to fileformats? justsolve doesn't sound like a descriptive name for that wiki.

Member

@generalmimon generalmimon Jan 2, 2021

> Why not rename the field to fileformats? justsolve doesn't sound like a descriptive name for that wiki.

Well, that would be a question for a new issue, but I don't think it would be very beneficial. Right now, justsolve is an established identifier, so it doesn't make sense to change it arbitrarily.

In general, I don't care about the exact identifiers; it's just important to keep them consistent (so that no two KSY specs use different names to refer to the same thing).

Author

This justsolve page contains broken links, doesn't link to the real spec, and assumes that .sfs extension without any references. I am not too enthusiastic about adding it as a reference. I found more relevant results with Google than by digging for the justsolve keyword and following it there.

I can't say that this xref to justsolve is human friendly, and if it is for robots, then wikidata is a better way to link various related resources, IMO.

Member

> assumes that .sfs extension without any references

Fair enough.

Contributor

@KOLANICH KOLANICH Mar 14, 2021

BTW KS allows switchable endianness.

@AgentD AgentD Mar 15, 2021

SquashFS 4.0 is strictly little endian. The older versions of SquashFS could be either.

This definition is only there for compatibility, since unsquashfs supports unpacking older formats.

There are some big endian 4.0 images floating around, typically created by cheap plastic router vendors who thought it's a great idea to hack up the format. But those usually also contain other modifications as well.
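
The byte-order point can be checked from the first four bytes of an image. This is only a minimal sketch, assuming that byte-swapped images begin with `sqsh` (the `hsqs` magic from this spec, reversed); the function name `detect_byte_order` is hypothetical and is not part of this PR or of any SquashFS tool.

```python
def detect_byte_order(header: bytes) -> str:
    """Guess the byte order of a SquashFS image from its first 4 bytes.

    Assumption: little-endian images start with b"hsqs" (as in this spec)
    and byte-swapped (big-endian) images start with b"sqsh".
    """
    magic = header[:4]
    if magic == b"hsqs":
        return "le"
    if magic == b"sqsh":
        return "be"
    raise ValueError(f"not a SquashFS magic: {magic!r}")
```

A caller could use the result to pick the byte order before reading the remaining fields. As noted above, vendor-modified images may differ in other ways, so a matching magic alone does not guarantee a parseable image.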

  license: CC0-1.0
  xref:
    wikidata: Q389314
Comment on lines +10 to +11
Member

Suggested change
-  xref:
-    wikidata: Q389314

See #308 (comment)

Member

By the above comment, I meant that the xref key should be under file-extension and above license (see the style guide), and given that I moved it above (https://github.com/kaitai-io/kaitai_struct_formats/pull/308/files#r550780445), this one can be removed.

doc: |
  SquashFS is a compressed filesystem in a file, that can be read-only
  accessed from low memory devices. It is popular for booting LiveCDs and
  packing self-contained binaries. SquashFS format is used by Ubuntu .snap
  packages. SquashFS is natively supported by Linux Kernel.
doc-ref: https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt
Member

Should link to a specific revision, not to master that can change over time:

Suggested change
- doc-ref: https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt
+ doc-ref: https://github.com/AgentD/squashfs-tools-ng/blob/b76eae1/doc/format.txt

Collaborator

Why not link to the original squashfs project as well?

https://github.com/plougher/squashfs-tools/

Collaborator

I guess because https://github.com/plougher/squashfs-tools/blob/master/COPYING

Again, this is not a problem. Also, for interoperability you are allowed to look at this in many jurisdictions without any problem at all.

Contributor

That depends heavily on court decisions on a per-case basis. And when the source code is available and one explicitly claims to base one's work on that code, I don't think a court is going to decide that the work is independent rather than derivative.

Copyright trolls are copyright trolls, and choosing GPL when not forced to, is one of the markers of a copyright troll.

> Also, for interoperability you are allowed to look at this in many jurisdictions without any problem at all.

That holds in jurisdictions where reverse engineering for interoperability is fair use ... There are others that have no concept of fair use, where reversing is allowed ... with a lot of limitations ... only for cases that almost never happen in practice.

Collaborator

> That depends heavily on court decisions on a per-case basis. And when the source code is available and one explicitly claims to base one's work on that code, I don't think a court is going to decide that the work is independent rather than derivative.

I would find it very hard to believe that a header file that contains no code, just definitions and a description of an on disk structure, would be copyrightable.

> Copyright trolls are copyright trolls, and choosing GPL when not forced to, is one of the markers of a copyright troll.

This doesn't make sense. Who is choosing GPL here when not forced to?

Contributor

@KOLANICH KOLANICH Mar 14, 2021

> Who is choosing GPL here when not forced to?

Not "here", but in general. Some people prefer to license their software under the GPL and other licenses designed to create legal trouble for other people. Not everyone considers that OK. In the current legal system, the best way to deal with such software and people is to avoid them.

Collaborator

I vehemently disagree. I suggest we take this discussion offline so as not to clutter the comments :)

Author

@adrianherrera because https://github.com/plougher/squashfs-tools/ is not a link to a reference doc.

seq:
  - id: superblock
    type: superblock
Comment on lines +18 to +20
Member

Why not move the seq definition of superblock directly here?

Contributor

I guess because this spec should be called squashfs, not squashfs_superblock.

Author

Yep. It should. But for now I am unable to overcome the complexity to develop it further.

enums:
  compressor:
    1: gzip
    2: lzo
    3: lzma
    4: xz
    5: lz4
    6: zstd
types:
  flags:
    seq:
      # bit count starts from the largest, byte count from the smallest
      - id: nfs_export_table # 0x0080
        type: b1
      - id: data_deduplicated # 0x0040
        type: b1
      - id: fragments_always_generated # 0x0020
        type: b1
      - id: fragments_not_used # 0x0010
        type: b1
      - id: fragments_uncompressed # 0x0008
        type: b1
      - type: b1 # 0x0004 bit is unused
      - id: data_blocks_uncompressed # 0x0002
        type: b1
      - id: inodes_uncompressed # 0x0001
        type: b1
      # bits 0x0100 and above follow, starting from the largest
      - type: b4 # bits 11110000 are unused
      - id: id_table_uncompressed # 0x0800
        type: b1
      - id: compressor_options_present # 0x0400
        type: b1
      - id: xattrs_absent # 0x0200
        type: b1
      - id: xattrs_uncompressed # 0x0100
        type: b1
  superblock:
    seq:
      - id: magic
        contents: 'hsqs'
      - id: inode_count
        type: u4
      - id: mod_time
        type: u4
        doc: Unix time of last modification.
      - id: block_size
        type: u4
        doc: |
          The size of a data block in bytes. Must be a power of two between
          4096 (4 KiB) and 1048576 (1 MiB).
      - id: frag_count
        type: u4
        doc: The number of entries in the fragment table.
      - id: compressor
Contributor

@KOLANICH KOLANICH Aug 5, 2020

Shouldn't it be an enum?

Author

Most likely. I am still learning the .ksy format. In the spec it is.

 |      |               | 1     | GZIP | just zlib deflate (no gzip headers!) |
 |      |               | 2     | LZO  |                                      |
 |      |               | 3     | LZMA | LZMA version 1                       |
 |      |               | 4     | XZ   | LZMA version 2 (no XZ headers!)      |
 |      |               | 5     | LZ4  |                                      |
 |      |               | 6     | ZSTD |                                      |
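
For illustration, the table above can be mirrored in code. This is a rough sketch, not taken from the PR: the names `Compressor` and `decompress_gzip_block` are hypothetical, and it assumes that a "GZIP" block is a zlib-wrapped deflate stream. If the payload turns out to be raw deflate, `zlib.decompress(payload, wbits=-15)` would be needed instead.

```python
import enum
import zlib


class Compressor(enum.IntEnum):
    """Compressor IDs as listed in the table above."""
    GZIP = 1
    LZO = 2
    LZMA = 3
    XZ = 4
    LZ4 = 5
    ZSTD = 6


def decompress_gzip_block(payload: bytes) -> bytes:
    # Assumption: the block is a zlib stream rather than a .gz file,
    # so zlib.decompress is used instead of gzip.decompress.
    return zlib.decompress(payload)
```

Looking up `Compressor(sb_value)` raises `ValueError` for an ID outside the table, which is roughly what an `enum` in the .ksy would flag as an unknown value.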

Contributor

btw, we have a lib of decompressors

Author

What for?

Contributor

@KOLANICH KOLANICH Aug 5, 2020

zlib (raw + gzip), lzma (both versions), lz4, zstd

see https://github.com/kaitai-io/kaitai_compress

also: https://github.com/KOLANICH/transformerz.py

        type: u2
        enum: compressor
        doc: Compressor used for both data and meta data blocks.
      - id: block_log
Contributor

It'd be nice to add a doc to each field that is not fully described by its title, and also to cross-link it to SquashFS implementations using -orig-id.

Author

The field name -orig-id is not self-descriptive either. :D It is not in the .ksy spec. What is it for?

Contributor

First: we usually call an entry in seq a field.

I don't remember whether we have established terminology for the stuff within fields and instances, but I'd call it an attribute. Attributes beginning with - are not validated by the compiler against the spec, but there are some conventions. -orig-id is used for names from the original specs and source code. That is, if a field has some name in the original spec, you can and should put that name there. If the reference implementation parses the value into a variable or struct member with some name, you can use that name too. Multiple names can be given as a YAML array.

Author

Original name in spec.

 | u16  | block log     | The log2 of the block size. If the two fields do not|
 |      |               | agree, the archive is considered corrupted.         |

Added the doc, but https://ide.kaitai.io/ doesn't render anything specific for it. The JS code also doesn't contain the comment.
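
The agreement rule quoted from the spec can be written as a small check. This is a minimal sketch under stated assumptions: the helper name `block_size_matches_log` is hypothetical, and the 4096 to 1048576 range is taken from the `block_size` doc string in this PR.

```python
def block_size_matches_log(block_size: int, block_log: int) -> bool:
    """Return True if block_size is a power of two in range and equals 2**block_log."""
    in_range = 4096 <= block_size <= 1048576
    # A power of two has exactly one bit set.
    is_power_of_two = block_size > 0 and block_size & (block_size - 1) == 0
    return in_range and is_power_of_two and block_size == 1 << block_log
```

A reader of the format would treat a `False` result as a corrupted archive, as the spec excerpt states.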

Author

@AgentD I copied comments for the parser from your https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt spec. Is it okay with you that the parser with these comments is CC-0?

Contributor

I guess that it may make sense to add that fact into https://github.com/AgentD/squashfs-tools-ng/blob/master/COPYING.md , if he is OK with relicensing that doc.

@AgentD AgentD Aug 10, 2020

> @AgentD I copied comments for the parser from your https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt spec. Is it okay with you that the parser with these comments is CC-0?

I have no problem with that. In fact, one of the main reasons I wanted to document the SquashFS format was to enable others to create implementations without getting tangled up in a license mess (as reading the GPL'd code as a starting point would).

Contributor

@AgentD, could you explicitly license the docs under a permissive license then, please?

        type: u2
        doc: |
          The log2 of the block size. If the two fields do not agree, the
          archive is considered corrupted.
      - id: flags
        type: flags
      - id: id_count
        type: u2
        doc: The number of entries in the ID lookup table.
      - id: version_major
Contributor

BTW it may make sense to create a version struct.

Author

If there were other versions, but there aren't.

Contributor

Even now, because it will allow putting them in a single field called version. It is easier to read.

Author

I prefer the .ksy file to closely follow the spec with no surprises.

Contributor

I don't think it is a good approach. IMHO we must always improve if we can.

        type: u2
        valid:
          eq: 4
      - id: version_minor
        type: u2
        valid:
          eq: 0
      - id: root_inode_ref
        type: u8
        doc: A reference to the inode of the root directory.
      - id: bytes_used
        type: u8
        doc: |
          The number of bytes used by the archive. Because SquashFS
          archives must be padded to a multiple of the underlying device
          block size, this can be less than the actual file size.
      - id: id_table_start
        type: u8
      - id: xattr_id_table_start
        type: u8
      - id: inode_table_start
        type: u8
      - id: directory_table_start
        type: u8
      - id: fragment_table_start
        type: u8
      - id: export_table_start
        type: u8
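
For comparison, the same layout can be read with Python's `struct` module. This is only a sketch of what the spec above describes, not code generated by Kaitai Struct: the names `SUPERBLOCK`, `FLAG_MASKS`, `Superblock`, `parse_superblock` and `decode_flags` are hypothetical, and the flag names follow the ids of the `flags` type with "uncompressed" spelled out. Masks are applied to the whole little-endian u2 value, which is equivalent to the bit-by-bit reading order used in the `flags` type.

```python
import struct
from typing import List, NamedTuple

# Little-endian, no padding: magic, 4 x u4, 6 x u2, 8 x u8 = 96 bytes.
SUPERBLOCK = struct.Struct("<4s4I6H8Q")

# Masks as annotated in the comments of the `flags` type.
FLAG_MASKS = {
    "inodes_uncompressed": 0x0001,
    "data_blocks_uncompressed": 0x0002,
    "fragments_uncompressed": 0x0008,
    "fragments_not_used": 0x0010,
    "fragments_always_generated": 0x0020,
    "data_deduplicated": 0x0040,
    "nfs_export_table": 0x0080,
    "xattrs_uncompressed": 0x0100,
    "xattrs_absent": 0x0200,
    "compressor_options_present": 0x0400,
    "id_table_uncompressed": 0x0800,
}


class Superblock(NamedTuple):
    magic: bytes
    inode_count: int
    mod_time: int
    block_size: int
    frag_count: int
    compressor: int
    block_log: int
    flags: int
    id_count: int
    version_major: int
    version_minor: int
    root_inode_ref: int
    bytes_used: int
    id_table_start: int
    xattr_id_table_start: int
    inode_table_start: int
    directory_table_start: int
    fragment_table_start: int
    export_table_start: int


def parse_superblock(data: bytes) -> Superblock:
    """Unpack the first 96 bytes and apply the spec's magic and version checks."""
    sb = Superblock(*SUPERBLOCK.unpack_from(data))
    if sb.magic != b"hsqs":
        raise ValueError(f"bad magic: {sb.magic!r}")
    if (sb.version_major, sb.version_minor) != (4, 0):
        raise ValueError("only version 4.0 is accepted")
    return sb


def decode_flags(flags: int) -> List[str]:
    """Return the sorted names of all flag bits set in the u2 `flags` value."""
    return sorted(name for name, mask in FLAG_MASKS.items() if flags & mask)
```

Typical use would be `sb = parse_superblock(open(path, "rb").read(96))` followed by `decode_flags(sb.flags)`, where `path` is a placeholder for an image file.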