Skip to content

Conversation

@zero318
Copy link
Contributor

@zero318 zero318 commented Jun 30, 2022

No description provided.

@zero318 zero318 changed the title Format improvements Format improvements (WIP) Jun 30, 2022

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub struct IntFormat {
pub unsigned: bool,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might prefer signed rather than unsigned to avoid double negatives

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with unsigned since it's more of an exception to the norm than anything. I tend to structure my bools with true as the "do something different" value so that they don't need to be negated whenever being read, but I guess that doesn't matter as much with match since you have to be explicit in both cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true as "do something different" makes sense in C-likes where it's easy to zero-initialize things. It doesn't make so much sense in rust.


| ArgEncoding::Padding
| ArgEncoding::Padding { .. }
=> Ok(ast::Expr::from(raw.expect_int())),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is unreachable now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not unreachable since the padding args are only dropped after being raised thanks to some sort of logic with @arg0. I don't know how important the positioning of that is since you'd left a comment about it being important to reference the original args list, so I chose to leave it there for now. If it's possible to implement the "verify the bytes are 0 or fallback to blob" check at at line 541 before raising all the arguments, then it would be unreachable for sure.

let time = f.read_i32()?;
let opcode = f.read_u16()?;
let size = f.read_u16()? as usize;
let size = f.read_i16()? as usize;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me this is just begging for an Out of Memory error without providing any advantages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field is signed though, read with MOVSX. If a size is greater than 0x7FFF, it'll index backwards (and probably explode)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter whether it's signed in game because it gets cast immediately to an unsigned type. So unless you plan to add range checking (which you certainly can do), you're literally just trading one incorrect behavior for another...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range checking would definitely be ideal, though IDK how you'd want the error to be formatted.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems similar to the check in anm

@ExpHP ExpHP marked this pull request as ready for review July 15, 2022 23:05
@ExpHP ExpHP marked this pull request as draft July 15, 2022 23:05
(Th08, 128, Some(("S(imm)f(imm)f(imm)f(imm)f(imm)", None))), // Argument 1 is unread
(Th08, 129, Some(("b(imm)---", None))),
(Th08, 130, Some((r#"s(imm;enum="EclSub")--"#, None))),
(Th08, 130, Some((r#"s(imm;extend;enum="EclSub")--"#, None))),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"extend" is too short of a name for something that is probably only ever going to be used once. Probably want something with multiple words, meaning we might need to pick now whether we want snake_case or kebab-case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're quite a few places where the game engines will read a full dword from the signature in order to check for variables but then only read the lower bits of the resulting value. Ideally extend could be replaced with a more general purpose attribute applicable to those situations as well since that would eliminate the need for extended padding anyway. Maybe this could be written as E(imm;range="s") or something?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking this should probably be a dword-sized value with an attribute that limits its value size. However, I really do not want to see abi letters get used inside string literals.

src/llir/abi.rs Outdated
/// stored in the arg0 header field of the instruction in EoSD and PCB ECL. (which is mapped to the
/// `@arg0` pseudo-argument in raw instruction syntax)
Integer { size: u8, ty_color: Option<TypeColor>, arg0: bool, immediate: bool, format: ast::IntFormat },
Integer { size: u8, ty_color: Option<TypeColor>, arg0: bool, immediate: bool, extend: bool, format: ast::IntFormat },
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting out of hand, let's break this up over lines.

///
/// The first argument may have `arg0` if it is two bytes large. This indicates that the argument is
/// stored in the arg0 header field of the instruction in EoSD and PCB ECL. (which is mapped to the
/// `@arg0` pseudo-argument in raw instruction syntax)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document extend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of formatting should be used for that? The only currently documented attributes are mask and furibug, which don't even use the same format.

Copy link
Owner

@ExpHP ExpHP Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh bugger, I forgot to document enum. That's the only thing I see missing from here.

It should be in prose in the documentation of Integer, just like the documentation for arg0 that I put this review comment under. (basically, "put it here!")

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, imm is also missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And hex isn't included either. There're enough attributes that there should probably be a list with the names at the front.

Copy link
Owner

@ExpHP ExpHP Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we probably want to make a markdown list

///
/// Supported attributes:
///
/// * **`imm`**: lorem ipsum dolor sit amet
/// i have a barnicle in my backseat
/// * **`arg0`**: blah blah blah blah

etc

defs::Signature {
return_ty: sp!(value::ExprType::Void),
params: abi.encodings.iter().enumerate().flat_map(|(index, enc)| {
params: abi.encodings.iter().filter(|enc| !matches!(*enc, ArgEncoding::Padding { .. })).enumerate().flat_map(|(index, enc)| {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too noisy. Funnily, this is a flat_map and yet it doesn't even look like anything is currently taking advantage of that; but this is exactly the sort of use case for that, so let's make use of it:

Suggested change
params: abi.encodings.iter().filter(|enc| !matches!(*enc, ArgEncoding::Padding { .. })).enumerate().flat_map(|(index, enc)| {
params: abi.encodings.iter().enumerate().flat_map(|(index, enc)| {

And change the Padding match arm to:

                | ArgEncoding::Padding { .. }
                => return None,  // hide from signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned something like that would end up inserting a None into the resulting collection. Didn't know it would skip over it. Seems like there could be issues in the future though if index is ever used for anything though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I missed the fact that this would change indices, lemme read this over again.

Copy link
Owner

@ExpHP ExpHP Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned something like that would end up inserting a None into the resulting collection.

flat_map flattens the iterators being returned by the closures. Option is an iterator of zero or one items, so None produces no items. Technically, this should be using .filter_map instead (you can just change the .flat_map to .filter_map and it should still compile); this is just an alias for Option-based flat_maps that is clearer in intent.

Oops I missed the fact that this would change indices, lemme read this over again.

So the index is only used to generate arg names. I would pull that out.

// in a line of code outside the loop
let arg_names = (1..).map(|i| ident!("arg_{i}"));

// inside the loop, change this line:
let name = sp!(abi_span => ctx.resolutions.attach_fresh_res(arg_names.next().unwrap()));

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and of course get rid of the enumerate()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to just add another function to the InstrAbi implementation similar to arg_encodings that returns the filtered version? This seems like the sort of thing that would end up continually popping up throughout the code base otherwise. (I would've done that in the first place, but rust complained about the filter output not being compatible with VeclikeIterator or something and I don't know how to make it happy.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm. Crap, that's a pain.

Writing this function is probably a good idea. It will have to return the slightly weaker DoubleEndedIterator instead of VeclikeIterator. (basically because filter can't be ExactSizeIterator, which is the other half of VeclikeIterator)

match size {
1 => args_blob.write_u8(0).expect("Cursor<Vec> failed?!"),
1 => {
if (args_blob.position() & 3) == 0 { small_padding_value = 0i8 }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid one-line ifs. I reserve these for extreme cases (usually only when there are multiple similar ifs in a row and I want to emphasize the similarities between them)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll freely admit the entire logic for handling that was a 5 minute quick fix that bolts a random functionality where it doesn't quite belong. I'd love to replace the logic with something better that doesn't feel as dirty.

Copy link
Owner

@ExpHP ExpHP Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the complexity of this function increases (which it is no doubt going to, without end), we're going to want to start factoring out pieces of it into separable units.

This thing right here is easily separated out into a Finite State Machine. You could have something like:

/// Finite-state machine helper for the value of `-` padding.
struct SmallPaddingFillHelper {
    fill_value: i8,
}

impl SmallPaddingFillHelper {
    /// Update the helper in response to the next arg encoding.
    ///
    /// Returns the current fill value for `-` padding.
    fn step(&mut self, enc: &ArgEncoding, blob_position: u64) -> i8 {
        todo!("update self.fill_value");
        self.fill_value
    }
}

and then you could put let small_padding_fill = padding_helper.step(enc, args_blob.position()); near the top of the loop

Copy link
Contributor Author

@zero318 zero318 Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant more that I'd rather leave the padding bytes as exclusively 0, eliminate the padding fill variable entirely, and let the values be handled by a range sort of attribute within a single larger value. That way there isn't any more multi-iteration mutable state than there needs to be.

let mut param_mask: raw::ParamMask = 0;
let mut current_param_mask_bit: raw::ParamMask = 1;

let mut small_padding_value = 0i8;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to see tests for:

  • basic functionality of extend
  • padding value resetting to 0 at dword boundaries. (s(extend)------)
  • resetting to zero on positive values

That last one would involve

10 s(extend)b-
ins_10(-2, 1)

args_blob should compile to blobify![-2i16, 1i16]. You will need to impl Blobify for i16 here:

impl Blobify for i32 {
(frankly this should probably be using a macro to gen impls for all integer types)

ColorFormat::Argb4444 => Rc::new(Argb8888::encode(&Argb4444::decode(bytes))),
ColorFormat::Argb8332 => Rc::new(Argb8888::encode(&Argb8332::decode(bytes))),
ColorFormat::Gray8 => Rc::new(Argb8888::encode(&Gray8::decode(bytes))),
ColorFormat::Rgb332 => Rc::new(Rgb332::encode(&Rgb332::decode(bytes))),
Copy link
Owner

@ExpHP ExpHP Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be Argb8888::encode?

src/llir/abi.rs Outdated
arg0: bool,
immediate: bool,
extend: bool,
format: ast::IntFormat
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im anal about trailing commas :P

)));
}
}
pub(crate) fn validate(&self, _ctx: &CompilerContext) -> Result<(), ErrorReported> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed because its only purpose was to handle the padding arguments from old STD. This was refactored into the _ and - formats usable in any script type rather than optional arguments, so this check is unnecessary


let sprite_id = sprite.id.unwrap_or(*next_auto_sprite_id);
*next_auto_sprite_id = sprite_id + 1;
*next_auto_sprite_id = sprite_id.wrapping_add(1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZUN did something extremely cursed in a few old engine files and uses -1 as the sprite ID to intentionally overwrite sprites from a different ANM. This didn't break previously because truth didn't dump the ID field to roundtrip it but blew up once the field was added.

}
Ok(abi_parts::PaddingInfo { count, index: first_index })
fn find_and_remove_padding(&self, arg_encodings: &mut Vec<(usize, &ArgEncoding)>) {
arg_encodings.retain(|(_, enc)| !matches!(*enc, ArgEncoding::Padding { .. }));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also simplified by changing padding to be format letters instead of optional args

ExpHP and others added 9 commits August 25, 2025 22:13
At some point, zero had attempted to merge some commits from main
into his branch, but it did not get saved in git as a merge.
This attempts to correct that.

A change on my branch to add decompilation of string literals to
enum consts was undone.  We decided it didn't make sense.

At the end there was this ScalarValueMap type left over that seemed
no longer used, though I'm surprised it wasn't used in other places...
@ExpHP ExpHP changed the base branch from main to format_improvements_test August 26, 2025 02:27
@ExpHP ExpHP changed the base branch from format_improvements_test to main August 26, 2025 02:36
/// `C` in mapfile. 4-byte integer immediate or register, printed as hex when immediate.
Color,
/// `_` in mapfile. Unused 4-byte space.
/// `-` in mapfile. Unused 1-byte space.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to mention before. The first line of a docstring is special. It shows up as the summary of the item.

@ExpHP ExpHP mentioned this pull request Aug 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants