SquashFS superblock #308
The superblock is parsed till the end, but lacks details.
    type: u2
  - id: id_count
    type: u2
  - id: version_major
BTW it may make sense to create a `version` struct.
If there were other versions, but there aren't.
Even now, because it would allow putting them in the same field called `version`. It is easier to read.
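A minimal sketch of what such a `version` struct could look like, assuming the two values are `u2` fields like the `version_major` visible in the diff above. The type name `version` and the ids `major` and `minor` are illustrative and not taken from the PR:

```yaml
seq:
  - id: version
    type: version # groups both numbers under a single field
types:
  version:
    seq:
      - id: major # assumed to replace version_major
        type: u2
      - id: minor # assumed to replace a matching version_minor
        type: u2
```

With this layout the values would be read as `version.major` and `version.minor` instead of two separate top-level fields.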
I prefer the .ksy file to closely follow the spec with no surprises.
I don't think it is a good approach. IMHO we must always improve if we can.
    type: u4
  - id: frag_count
    type: u4
  - id: compressor
Shouldn't it be an enum?
Most likely. I am still learning the .ksy format. In the spec it is.
| | | 1 | GZIP | just zlib deflate (no gzip headers!) |
| | | 2 | LZO | |
| | | 3 | LZMA | LZMA version 1 |
| | | 4 | XZ | LZMA version 2 (no XZ headers!) |
| | | 5 | LZ4 | |
| | | 6 | ZSTD | |
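As a sketch of the enum suggested above, the table rows can be mapped onto the `u2` `compressor` field with an `enums` section. The enum name `compressor` and the lowercase value names are illustrative choices; the numeric values and notes come from the quoted spec table:

```yaml
seq:
  - id: compressor
    type: u2
    enum: compressor
enums:
  compressor:
    1: gzip # just zlib deflate (no gzip headers)
    2: lzo
    3: lzma # LZMA version 1
    4: xz # LZMA version 2 (no XZ headers)
    5: lz4
    6: zstd
```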
btw, we have a lib of decompressors
What for?
zlib (raw + gzip), lzma (both versions), lz4, zstd
    type: u4
  - id: compressor
    type: u2
  - id: block_log
It'd be nice to add `doc` to each field that is not fully described by its name, and also to cross-link it to SquashFS implementations using `-orig-id`.
The field name `-orig-id` is not self-descriptive either. :D It is not in the .ksy spec. What is it for?
First: we usually call an entry in `seq` a field.
I don't remember whether we have established terminology for the stuff within fields and `instance`s, but I'd call it an attribute. Attributes beginning with `-` are not validated by the compiler against the spec, but there are some conventions. `-orig-id` is used for names in original specs and source code. I.e. if the original spec calls a field by some name, you can and should put that name there. If the reference implementation parses the value into a variable or struct member with some name, you can also use that. Multiple names can be given using YAML arrays.
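A short sketch of how `doc` and `-orig-id` could be combined on one field, using the `block_log` field from the diff. The first `-orig-id` entry is the name from the spec text; the second is an assumed struct member name in a reference implementation and should be checked against the actual source:

```yaml
seq:
  - id: block_log
    -orig-id:
      - block log # name used in doc/format.txt
      - block_log # assumption: struct member name in a reference implementation
    type: u2
    doc: |
      The log2 of the block size. If the two fields do not agree,
      the archive is considered corrupted.
```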
Original name in spec.
| u16 | block log | The log2 of the block size. If the two fields do not|
| | | agree, the archive is considered corrupted. |
Added the `doc`, but https://ide.kaitai.io/ doesn't render anything specific for it. The generated JS code also doesn't contain the comment.
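The rule from the spec excerpt above, that the block size and its log2 must agree, could also be exposed as a value instance so that users of the parser can check it. This is only a sketch: the id `block_size`, its `u4` type, and the instance name are assumptions, and the fields between the two are left out:

```yaml
seq:
  - id: block_size # assumed id and type of the block size field
    type: u4
  # ... fields between block_size and block_log omitted in this sketch ...
  - id: block_log
    type: u2
instances:
  is_block_log_consistent: # hypothetical name
    value: block_size == (1 << block_log)
    doc: If false, the archive should be considered corrupted.
```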
@AgentD I copied comments for the parser from your https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt spec. Is it okay with you that the parser with these comments is CC-0?
I guess that it may make sense to add that fact into https://github.com/AgentD/squashfs-tools-ng/blob/master/COPYING.md , if he is OK with relicensing that doc.
> @AgentD I copied comments for the parser from your https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt spec. Is it okay with you that the parser with these comments is CC-0?
I have no problem with that. In fact, one of the main reasons I wanted to document the SquashFS format was to enable others to create implementations without getting tangled up in a license mess (as reading the GPL'd code as a starting point would).
@AgentD, could you explicitly license the docs under a permissive license then, please?
filesystem/squashfs_superblock.ksy
Outdated
  - id: block_log
    type: u2
  - id: flags
    type: u2
Should probably be a bit-sized type
Bit mapping in a 16-bit little-endian block in Kaitai is not trivial. I may need to write a script to avoid mistakes.
0x0001 | Inodes are stored uncompressed. |
0x0002 | Data blocks are stored uncompressed. |
0x0008 | Fragments are stored uncompressed. |
0x0010 | Fragments are not used. |
0x0020 | Fragments are always generated. |
0x0040 | Data has been deduplicated. |
0x0080 | NFS export table exists. |
0x0100 | Xattrs are stored uncompressed. |
0x0200 | There are no Xattrs in the archive. |
0x0400 | Compressor options are present. |
0x0800 | The ID table is uncompressed.
There are only 11 bits used, so it may seem that the mapping should start with 5 bits of padding. But that's a mistake, because the first 8 bits that Kaitai sees are from the lower byte, since the flags field is stored as little-endian. What comes first are these 7 bits.
0x0001 | Inodes are stored uncompressed. |
0x0002 | Data blocks are stored uncompressed. |
0x0008 | Fragments are stored uncompressed. |
0x0010 | Fragments are not used. |
0x0020 | Fragments are always generated. |
0x0040 | Data has been deduplicated. |
0x0080 | NFS export table exists. |
The bits in Kaitai will be written from the end. To make it 8 bits, there should be 1 padding bit at the start of the Kaitai structure. Wrong: the missing bit is actually 0x0004, in the middle. I don't know if you can compare it with the spec manually, but for me it is hard.
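To make the ordering problem concrete, here is a sketch of how the 16 flag bits line up when the two bytes of the little-endian `u2` are read with Kaitai Struct's default big-endian bit order. The type name `flags_be` and the field ids are illustrative, not taken from the PR; the mask in each comment is the value from the spec table:

```yaml
types:
  flags_be:
    seq:
      # first byte on disk = low byte of the u2le value, read from its top bit down
      - id: nfs_export_table # 0x0080
        type: b1
      - id: data_deduplicated # 0x0040
        type: b1
      - id: fragments_always_generated # 0x0020
        type: b1
      - id: fragments_not_used # 0x0010
        type: b1
      - id: fragments_uncompressed # 0x0008
        type: b1
      - id: unused_0004 # 0x0004 has no entry in the spec table
        type: b1
      - id: data_blocks_uncompressed # 0x0002
        type: b1
      - id: inodes_uncompressed # 0x0001
        type: b1
      # second byte on disk = high byte of the u2le value
      - id: unused_high # 0xf000, four unassigned bits
        type: b4
      - id: id_table_uncompressed # 0x0800
        type: b1
      - id: compressor_options_present # 0x0400
        type: b1
      - id: xattrs_absent # 0x0200
        type: b1
      - id: xattrs_uncompressed # 0x0100
        type: b1
```

The padding lands in two places (one bit in the first byte, four bits at the start of the second), which is why comparing this order against the spec table by eye is error-prone.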
Done. PTAL.
@abitrolly Note that you already can read bit integers also in little-endian byte order, see kaitai-io/kaitai_struct#155 (comment). Looking at the hex values of the bits in your comment, it seems that you can read them exactly in the order as they're defined in the spec (if the whole flag is stored in little-endian bit order as you said), but rather look at the comment above to see the actual bit layout of little-endian bit integers.
You'll need a snapshot compiler version installed though, or use the devel Web IDE that comes preinstalled with the latest 0.9-SNAPSHOT out-of-the-box.
@generalmimon I don't find bit masking syntax more readable. Any attempt to work with bits without getting the byte or the word out of the flow first looks complicated. I could use something like this.
seq:
  - id: flags
    type: u2
    flags:
      - pos: 0x0400
        id: someflag
      - pos: 0b00000010
        id: another
If the hex bit masks specified here are meant to be used on a single `u2le` parsed previously:
0x0001 | Inodes are stored uncompressed. |
0x0002 | Data blocks are stored uncompressed. |
0x0008 | Fragments are stored uncompressed. |
0x0010 | Fragments are not used. |
0x0020 | Fragments are always generated. |
0x0040 | Data has been deduplicated. |
0x0080 | NFS export table exists. |
0x0100 | Xattrs are stored uncompressed. |
0x0200 | There are no Xattrs in the archive. |
0x0400 | Compressor options are present. |
0x0800 | The ID table is uncompressed.
One can declare the structure in the `.ksy` spec in the exact same order using little-endian bit integers like this:
types:
  flags:
    meta:
      bit-endian: le
    seq:
      - id: inodes_uncompressed # 0x0001
        type: b1
      - id: data_blocks_uncompressed # 0x0002
        type: b1
      - type: b1
        doc: 0x0004 bit is unused.
      - id: fragments_uncompressed # 0x0008
        type: b1
      - id: fragments_not_used # 0x0010
        type: b1
      - id: fragments_always_generated # 0x0020
        type: b1
      - id: data_deduplicated # 0x0040
        type: b1
      - id: nfs_export_table # 0x0080
        type: b1
      - id: xattrs_uncompressed # 0x0100
        type: b1
      - id: xattrs_absent # 0x0200
        type: b1
      - id: compressor_options_present # 0x0400
        type: b1
      - id: id_table_uncompressed # 0x0800
        type: b1
      - type: b4 # it's recommended to omit this last unused attribute if this whole `flags` type will be used with `size: 2` everywhere
        doc: bits 11110000 are unused
> I don't find bit masking syntax more readable. Any attempt to work with bits without getting the byte or the word out of the flow first looks complicated.
I accept your opinion. Yeah, I realize that it may be difficult to understand at first glance how little-endian bit integers work, but once you get it, it starts to pay off with brevity, simplicity and flexibility.
And as you can see in the example above, it doesn't have to be hard to understand at all.
It also seamlessly fits into the KS concept of declarativity, meaning that you just describe the actual bit layout of the fields, without imperatively saying how to read (or write) them. This makes serialization (kaitai-io/kaitai_struct#27) much easier as well - the bit layout is everything you need to know to be able to write the values back.
@generalmimon is this format already implemented? I don't see `bit-endian` here: http://doc.kaitai.io/ksy_reference.html
How do I make sure the size is set to `u2le` for parsing this type? Maybe set it in `meta`?
meta:
  bit-endian: le
  size: u2
> is this format already implemented? I don't see `bit-endian` here: http://doc.kaitai.io/ksy_reference.html
Yes, it is, but unfortunately the documentation for this feature hasn't been written yet, sorry for that. Hopefully I will add the section about the little-endian bit integers to the User Guide soon.
I explained the bit layout of the little-endian bit-sized integers on a simple example in comment kaitai-io/kaitai_struct#155 (comment), so I suggest you read it first. If you have any questions, feel free to ask on our Gitter channel.
And please note that the KSY reference is now disabled, refer to the KSY syntax diagram instead. I'm going to add the `bit-endian` key to the KSY schema soon, so it'll be available there.
> How do I make sure the size is set to `u2le` for parsing this type? Maybe set it in `meta`?
No, you have to make sure that every `seq` or `instance` attribute with `type: flags` also has `size: 2` (i.e. the `flags` type will be wrapped in its own substream 2 bytes in size). Omitting the last padding field while some attribute doesn't wrap `flags` in a 2-byte substream can cause wrong results in later bit-sized integer reads on the same stream (because the internal bit position pointer `bitsLeft` stays in the wrong place in the middle of the partially consumed byte, and just jumping out of the `flags` type doesn't reset it).
Though the compiler inserts an `alignToByte()` call if it detects a byte-aligned field (anything with `size: [0-9]+`, or `type: [us][1248]`) after a non-byte-aligned bit-sized integer (`b[0-9]+`) in the same `seq`, the detection algorithm is currently quite dumb, and it doesn't recognize `if` conditions, subtypes, repetitions and type switching, which have to be handled with special care. See kaitai-io/kaitai_struct#743 (comment) for more info.
So that's why I suggest wrapping every subtype with a `seq` that might not be byte-aligned in its own substream, because you're safe this way. It's sort of a workaround for this bug. But even when it's resolved, I like to make the byte `size` explicit because you immediately see how many bytes the bitfield occupies in the stream.
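The substream wrapping described above might look like this in the parent `seq`. This is a sketch that assumes a `flags` type like the one declared earlier in the thread, and that `id_count` is the field following `flags`:

```yaml
seq:
  - id: flags
    type: flags
    size: 2 # wraps the bit fields in their own 2-byte substream
  - id: id_count # assumed next field; starts byte-aligned in the parent stream
    type: u2
```

Because the bit reads happen inside the 2-byte substream, any leftover bit position stays there and does not affect later reads from the parent stream.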
Well, I'm still hoping for some response here, @abitrolly. I'd be glad if you tried the approach I presented in #308 (comment), if it works for your format.
If there are any difficulties, feel free to share them, I'm here to help you.
To make it simple, you can ignore the whole thing with omitting the last padding field and adding `size`, just take the spec from #308 (comment) as it is and remove my recommendation comment.
I might be a little biased as I implemented the feature 😃, but I think that using the little-endian bit integers makes sense here and can improve the readability of the spec. Currently you use big-endian bit integers, which forces you to declare the individual bits in an unusual order, as you might've noticed. Your comment "bit count starts from the largest, byte count from the smallest" just reflects the mess. The sequence `0x0080`, `0x0040`, ... `0x0001`, bits `11110000`, `0x0800`, `0x0100` seems to me unnecessarily illogical and chaotic.
To be clear, it's still much better than extracting the bits manually using value instances and bitwise AND `&` and bitshift `>>` operations, because big-endian bit integers can at least be called declarative, but there's a more organized way of doing it, so why not try it.
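For comparison, the manual approach mentioned above would look roughly like the following sketch, which keeps `flags` as a plain `u2` and derives individual bits with value instances. The instance names are illustrative; the masks come from the spec table quoted earlier:

```yaml
seq:
  - id: flags
    type: u2
instances:
  compressor_options_present:
    value: (flags & 0x0400) != 0 # mask with bitwise AND
  id_table_uncompressed:
    value: ((flags >> 11) & 1) == 1 # 0x0800 is bit 11; shift, then mask
```

Each instance has to restate how to extract its bit, whereas the bit-integer layouts only describe where the bit is.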
@generalmimon the events from 9th of August, 2020 in Belarus were too disturbing to do coding. There is a lot of info I need to reread, and because it didn't fit in my head last time, I don't think it will fit in those spare 15 minutes I have now either. If there is already a new tutorial about bit mapping in Kaitai, I may be able to find some time to experiment with that. Right now my focus is on getting my `ksykaitai` helper released.
@KOLANICH while all the comments are valid, the problem I'm stuck on is how to make sure that this field is parsed as a Unix timestamp.
Feel free to rewrite the history and squash the commits into 1 commit, BTW. I don't insist on preserving my authorship in such a small change.
What do you mean? A Unix timestamp is just a count of ticks, a number. KS doesn't convert timestamps into calendar time.
@KOLANICH I mean that it is not visible that the field is a Unix timestamp, and as a result, the generated code will have to figure out how to deal with format conversion itself.
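One way to at least make the meaning visible in the spec is a `doc` attribute. The sketch below assumes a `u4` seconds-since-epoch field; the id `mod_time` is hypothetical, since the thread does not name the field:

```yaml
seq:
  - id: mod_time # hypothetical id for the timestamp field
    type: u4
    doc: |
      Unix timestamp: seconds since 1970-01-01 00:00 UTC.
      Kaitai Struct exposes it as a plain integer; converting it to
      calendar time is left to the code using the parser.
```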
BTW, why not just
@KOLANICH because I am not sure when there will be next time to add the rest.
@abitrolly, once the spec is published by the name, it is not easy to change it
@KOLANICH
@abitrolly, I guess, yes.
Anything standing by to get this merged?
seq:
  - id: superblock
    type: superblock
Why not move the `superblock` definition of `seq` directly here?
I guess because this spec should be called `squashfs`, not `squashfs_superblock`.
Yep. It should. But for now I am unable to overcome the complexity to develop it further.
  file-extension:
    - squashfs
    - snap
  endian: le
  endian: le
  xref:
    justsolve: Squashfs
    wikidata: Q389314
  endian: le
Why not rename the field to `fileformats`? `justsolve` doesn't sound like a descriptive name for that wiki.
> Why not rename the field to `fileformats`? `justsolve` doesn't sound like a descriptive name for that wiki.
Well, that would be a question for a new issue, but I don't think it would be very beneficial. Right now, `justsolve` is an established identifier, so it doesn't make sense to change it arbitrarily.
In general, I don't care about the exact identifiers, it's just important to keep them consistent (so that no two KSY specs use different names to refer to the same thing).
This `justsolve` page contains broken links, doesn't link to the real spec and assumes the `.sfs` extension without any references. I am not too enthusiastic about adding it as a reference. I found more relevant results with Google than by digging for the `justsolve` keyword and following it there.
I can't say that this `xref` to `justsolve` is human-friendly, and if it is for robots, then wikidata is a better way to link various related resources, IMO.
> assumes the `.sfs` extension without any references
Fair enough.
  xref:
    wikidata: Q389314
  xref:
    wikidata: Q389314
See #308 (comment)
By the above comment, I meant that the `xref` key should be under `file-extension` and above `license` (see the style guide), and given that I moved it above (https://github.com/kaitai-io/kaitai_struct_formats/pull/308/files#r550780445), this one can be removed.
Co-authored-by: Petr Pučil <[email protected]>
This reflects the field size in the spec, as agreed in the review comments.
  accessed from low memory devices. It is popular for booting LiveCDs and
  packing self-contained binaries. SquashFS format is used by Ubuntu .snap
  packages. SquashFS is natively supported by Linux Kernel.
doc-ref: https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt
Should link to a specific revision, not to `master` that can change over time:
doc-ref: https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt
doc-ref: https://github.com/AgentD/squashfs-tools-ng/blob/b76eae1/doc/format.txt
Why not to the original squashfs project as well?
I guess because https://github.com/plougher/squashfs-tools/blob/master/COPYING
> I guess because https://github.com/plougher/squashfs-tools/blob/master/COPYING
Again, this is not a problem. Also, for interoperability you are allowed to look at this in many jurisdictions without any problem at all.
Depends heavily on court decisions on a per-case basis. And I don't think that, in a case where the source code is available and one explicitly claims to base his work on that code, the court is going to decide that the work is independent and not a derivative one.
Copyright trolls are copyright trolls, and choosing GPL when not forced to, is one of the markers of a copyright troll.
> Also, for interoperability you are allowed to look at this in many jurisdictions without any problem at all.
That is so in the jurisdictions where reversing for interoperability is fair use ... There are ones in which there is no concept of fair use, but reversing is allowed ... with a lot of limitations .... only for cases that almost never happen in practice.
> Depends heavily on court decisions on a per-case basis. And I don't think that, in a case where the source code is available and one explicitly claims to base his work on that code, the court is going to decide that the work is independent and not a derivative one.
I would find it very hard to believe that a header file that contains no code, just definitions and a description of an on disk structure, would be copyrightable.
> Copyright trolls are copyright trolls, and choosing GPL when not forced to, is one of the markers of a copyright troll.
This doesn't make sense. Who is choosing GPL here when not forced to?
> Who is choosing GPL here when not forced to?
Not "here", but in general. Some people prefer to license their software under the GPL and other licenses designed to create legal trouble for other people. Not everyone considers that OK. In the current legal system the best way to deal with such software and people is to avoid them.
I vehemently disagree. I suggest taking this discussion offline so as not to clutter the comments :)
@adrianherrera because https://github.com/plougher/squashfs-tools/ is not a link to a reference doc.
  file-extension:
    - squashfs
    - snap
  endian: le
Actually SquashFS can also be big endian:
https://github.com/plougher/squashfs-tools/blob/master/squashfs-tools/squashfs_fs.h#L30
BTW KS allows switchable endianness.
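As a sketch of the mechanism only, switchable endianness in a `.ksy` file is declared with `switch-on` under `meta/endian`. The example assumes the magic is read as `u4le` and that the two byte orders show up on disk as the byte sequences "hsqs" and "sqsh"; the type name `superblock_body` and the field ids are illustrative, and the layout of older big-endian superblocks is not covered here:

```yaml
seq:
  - id: magic
    type: u4le # always read little-endian so it can select the byte order
  - id: body
    type: superblock_body
types:
  superblock_body:
    meta:
      endian:
        switch-on: _root.magic
        cases:
          0x73717368: le # bytes "hsqs" on disk
          0x68737173: be # bytes "sqsh" on disk (assumed big-endian variant)
    seq:
      - id: inode_count # illustrative first field; u4 follows the selected endianness
        type: u4
```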
SquashFS 4.0 is strictly little endian. The older versions of SquashFS could be either.
This definition is only there for compatibility, since `unsquashfs` supports unpacking older formats.
There are some big endian 4.0 images floating around, typically created by cheap plastic router vendors who thought it's a great idea to hack up the format. But those usually contain other modifications as well.
As a general remark: there are many variants of SquashFS that were made by different vendors over the years. Some of these have different headers (example: DD-WRT uses 'hsqt' as a header), others have the same headers as official squashfs but are missing essential parts of, for example, the superblock. I would actually advise against trying to include those :-)
Co-authored-by: Petr Pučil <[email protected]>
I have used this PR as a base for a plain Java SquashFS reader. I will create a PR as soon as I have a stable format. If anyone is interested in having a look, the format is currently located here: https://github.com/tisoft/jsquashfs/blob/main/src/main/kaitai/squashfs.ksy
@tisoft that's awesome. Would be nice to get this finished by someone, because I am unlikely to get back to it as I don't need to solve the original problem - comparing
I have added my extensions to this PR in #596. I'm open to comments there. 😄
https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt