Skip to content

Conversation

@not-wlan
Copy link

@not-wlan not-wlan commented Oct 10, 2020

I'm working on something that requires raw access to the section table and figured I might as well add this to the library so others may make use of this.
I tried building upon the existing code but I didn't want to break the signature of header.coff_header.sections either.

This shouldn't break anything and should also add some validation for broken PEs.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool Thank you for this PR! Just a doc comment would be nice on the new public field; I don't know much about section_tables are they optional? What do they contain? Might be good for minor summary of this in the doc comment?

Also just curious, what are you using the section tables for ? :)

@not-wlan
Copy link
Author

Also just curious, what are you using the section tables for ? :)

Microsofts CodeView debug format (aka PDB) contains a copy of the raw section headers. I'm currently working on generating those for already existing binaries to make reverse engineering them easier.
I've written a Rust module to take care of the actual generation using LLVM as the backend for generating the PDB file. In this scenario I'm currently parsing the same PE twice, once in Rust to get basic information like the image base and well the parsed sections. Ideally I'd like to skip the 2nd parse inside my wrapper though and just pass the data from goblin. I realize that I'm able to calculate that offset in my code as well figured that others might need it as well.

I don't know much about section_tables are they optional? What do they contain? Might be good for minor summary of this in the doc comment?

I figured that wasn't necessary considering that goblin is already parsing them and fails if they're not present or corrupted.

@m4b
Copy link
Owner

m4b commented Oct 14, 2020

I figured that wasn't necessary considering that goblin is already parsing them and fails if they're not present or corrupted.

So I guess what I’m asking is there other structure we need to pull out of the section tables or is &[u8] the best we can do / depends on other information ?

@m4b
Copy link
Owner

m4b commented Oct 14, 2020

I’m asking specifically because I’m wondering if it’s the best public api to expose because if it isn’t and it needs to change then it’s a breaking change. (And it’s very well possible it is the best, in which case, great !)

@not-wlan
Copy link
Author

not-wlan commented Oct 14, 2020

Oh this definitely isn't the best way to achieve what I'm trying to do. The structure behind that is IMAGE_SECTION_HEADER which is already being parsed into a struct

pub sections: Vec<section_table::SectionTable>,

The issue with that being that it copies the raw data into a custom struct

#[repr(C)]
#[derive(Debug, PartialEq, Clone, Default)]
pub struct SectionTable {
pub name: [u8; 8],
pub real_name: Option<String>,
pub virtual_size: u32,
pub virtual_address: u32,
pub size_of_raw_data: u32,
pub pointer_to_raw_data: u32,
pub pointer_to_relocations: u32,
pub pointer_to_linenumbers: u32,
pub number_of_relocations: u16,
pub number_of_linenumbers: u16,
pub characteristics: u32,
}

that isn't 1:1 what's in the binary so I can't pass it to LLVM. I'm definitely duplicating behaviour here. A better (read: cleaner) solution would be to rewrite the section header parsing to produce the correct structure or even better, pass references to the original buffer back. I avoided doing that because that'd definitely be a breaking change.
If you're down for doing that then I'd be willing to implement that as well :)

@m4b
Copy link
Owner

m4b commented Nov 28, 2020

@not-wlan sorry I got busy and didn't notice your follow up; it's too bad, just released a new (breaking 0.3).

So if i understood your proposal correctly, you want the section table we already parse to be 1:1 binary compatible so you can pass that back to llvm? Could you highlight what needs to change right now in the SectionTable struct we parse in order for it to be compatible? I assume it's just the name perhaps? Maybe we could make the SectionTable binary compatible, then have a fn name() accessor ?

@m4b
Copy link
Owner

m4b commented Nov 28, 2020

Alternatively, I think I'm ok with merging a version of this PR which tucks away the raw bytes in the struct as a private field (which e.g., mach-o does), and then we add a function like fn raw_section_table(&self) -> Result<&[u8]> which does the parsing you do in the main?
Alternatively, if we already fail when the section table is invalid, it might be better to do that parsing early (though it's preferred less and less to be less strict when parsing binaries, and allow user to handle errors however they might want).

@m4b
Copy link
Owner

m4b commented Jan 31, 2021

ping @not-wlan re my last comment, and in general, are you still interested in this feature? I'll leave it open for now, let me know whenever you find the time, no rush, thanks and thanks again for the PR and discussion!

@RaitoBezarius
Copy link
Contributor

ping @not-wlan re my last comment, and in general, are you still interested in this feature? I'll leave it open for now, let me know whenever you find the time, no rush, thanks and thanks again for the PR and discussion!

I'm interested into getting this PR back in shape if needed. :)
I agree with your last comment.

@m4b
Copy link
Owner

m4b commented Aug 15, 2025

@kkent030315 do you know if this is even relevant anymore? if yes, feel free to reopen ,just triaging very old PRs

@m4b m4b closed this Aug 15, 2025
@kkent030315
Copy link
Contributor

@kkent030315 do you know if this is even relevant anymore? if yes, feel free to reopen ,just triaging very old PRs

Perhaps, yes. This is somewhat relevant to the #470 (comment) where PeCtx<'a> is introduced and has some issues with how section table is designed in current version of goblin. It would be nice to have lifetime'd byte slice representing the section table instead of Vec.

that isn't 1:1 what's in the binary so I can't pass it to LLVM. I'm definitely duplicating behaviour here. A better (read: cleaner) solution would be to rewrite the section header parsing to produce the correct structure or even better, pass references to the original buffer back. I avoided doing that because that'd definitely be a breaking change.

I 100% agreed, but is unfortunately a breaking change as mentioned :(

--

On somewhat also relevant I believe we at some point should make every parsers "deferred" as possible as we can (i.e., it's threoretically possible to remove alloc from import/export parser by that PeCtx<'a>). It's a bit hard to consume goblin when it's partly failed at irreverant parsers when calling PE::parse and should significantly speed up initial parsing, it would be all breaking change to do so however.

@m4b
Copy link
Owner

m4b commented Aug 17, 2025

Cool thanks for analysis I’ll take a closer look at PR which introduces ctx. And as for not being as strict, as I’m sure you’ve seen there’s another larger PR which is adding more ability to not parse as strictly (if desired).

lastly, yes for both ergonomic but also perf reasons it’s better to lazily parse or not store in vecs if possible. When I started this library impl Iterator wasn’t even close to landing or even something on the table!

that being said I’m not in favor of an enormous rewrite of the base structures just because I don’t find that particularly compelling, and would be disruptive to the various crates downstream.

I’ve been saying this for years but I’d like to flesh out the last remaining important breakage, shore up the rest of the crate to prepare for backwards compatibility changes (eg non_exhaustive, etc) and then add those features and finally do a 1.0 release :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants