Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Padding for union type #1281

Open
gitoleg opened this issue Jan 13, 2025 · 10 comments
Open

Padding for union type #1281

gitoleg opened this issue Jan 13, 2025 · 10 comments
Labels
bug Something isn't working IR design Considerations around the design of ClangIR IR difference A difference in ClangIR-generated LLVM IR that could complicate reusing original CodeGen tests

Comments

@gitoleg
Copy link
Collaborator

gitoleg commented Jan 13, 2025

There is a problem with union type: it has no padding. And it's important to have one.

Example:

  typedef union {    
     uint16_t f0;
     signed   f1 : 11;
     unsigned f2 : 2;
     signed   f3 : 5;
  } U;

  static U g0 = {65534};
  static U g1[2] = {{65534}, {65534}};
  static uint16_t *g2 = &g1[1].f0;

This is how the original LLVM IR looks like:

  @g0 = global %union.U { i16 -2, [2 x i8] zeroinitializer }
  @g1 = global [2 x %union.U] [%union.U { i16 -2, [2 x i8] zeroinitializer }, %union.U0 { i16 -2, [2 x i8] zeroinitializer }]
  @g2 = global ptr getelementptr (i8, ptr @g1, i64 4)

As one can see, there is an extra padding for the union type: [2 x i8] zeroinitializer , and g0 size is 4 bytes in total. And for g2 offset is computes as 4 - since it points to the second element of g1.

LLVM IR after CIR doesn't have these padding bytes neither it has union keyword (I wonder when we lost it but it's not a problem right now).

@g0 = global { i16 } { i16 -2 }
@g1 = global [2 x { i16 }] [{ i16 } { i16 -2 }, { i16 } { i16 -2 }]
@g2 = global ptr getelementptr inbounds ([2 x { i16 }], ptr @g1, i32 0, i32 2)

g0 has size of 2 bytes, so as g1 element size as well.

The problem is in the g2 initialization: unfortunately the offset is computed from APValue methods here and is equal to 4 as in the OG, no CIR is involved.

Thus, the offset of the second element of g1 is computed as 4, and element size is 2 -> we got wrong indices for the GlobalViewAttr from the very beginning: @g2 = #cir.global_view<@g1, [2 : i32]> : !cir.ptr<!u16i>!

The g0 size is 2 because for unions we compute size as a size of the largest member and nothing else.

According to the comments we chose to track all union members and deffer padding computation to the lowering. But looks like that was not happened.

So the question is - what do you think, how to fix it properly?
I see the following ways:

  1. Make a separate type for unions. In this case we can still track all the members and have padding bytes as a separate field
  2. Add a notion about trailing padding into the struct type - as a parameter, e.g. index of the field.
  3. Try to add padding in the lowering like it was planned, e.g. in the type conversion. I'm not sure that's a good way though.

Any other thoughts, ideas?

@bcardosolopes bcardosolopes added bug Something isn't working IR difference A difference in ClangIR-generated LLVM IR that could complicate reusing original CodeGen tests IR design Considerations around the design of ClangIR labels Jan 14, 2025
@bcardosolopes
Copy link
Member

bcardosolopes commented Jan 14, 2025

Make a separate type for unions. In this case we can still track all the members and have padding bytes as a separate field

I'd be worried about this given potential duplication, but there's likely an approach where we can bake all current functionality of struct type into an interface and have union extend that.

Add a notion about trailing padding into the struct type - as a parameter, e.g. index of the field.

StructType has a member called layoutInfo, which is always of StructLayoutAttr dynamic type. layoutInfo is always computed lazily so we save extra computation to only when we have to use it. Can we bake all the needed information over there? Note that it already has an array of offsets and padding information. Perhaps it needs some extra info about unions in general?

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 14, 2025

Can we bake all the needed information over there?

Well, I think we can - there will be a bunch of boring changes though) I think we can add an extra data member into StructTypeStorage - e.g. bool trailindPadding - and also add it in the StructType methods, use it in the printing and parsing,
various StructType::get... methods. In this case we can always infer a union size (as the size of the largest member without padding bytes taken into account) + size of padding itself.

And in the CIRRecordLayputBuilder pass extra boolean parameter into StructType::complete method to designate a union has padding.

Does it sound like a plan for you?

@bcardosolopes
Copy link
Member

I'm not 100% sure what's the best solution here, just putting extra options on the table so we can discuss a potential less intrusive solution.

Well, I think we can - there will be a bunch of boring changes though) I think we can add an extra data member into StructTypeStorage

Why touch StructTypeStorage? Isn't storing info in StructLayoutAttr enough? Why can't padding information be computed in computeSizeAndAlignment so that the extra data members can just query that? What interface OG uses to retrieve this info? Can't we just have similar one on top of StructLayoutAttr information? (Just bringing up more questions to perhaps find an alternatives here)

Also, if we have to change computeSizeAndAlignment to compute things differently for unions in a way that's transparent to the client of the StructType API, we can use RecordKind::Union for that. I'd vote for hiding more details if possible so that queries can be more generic without the client doing a lot of dance.

According to the comments we chose to track all union members and deffer padding computation to the lowering. But looks like that was not happened.

I'm not sure that decision still holds, specially in light of what you are bringing up. Perhaps we should just do it early?

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 15, 2025

Just bringing up more questions

Let me briefly remind how it's done in the OG's CGRecordLayoutBuilder: we compute a union type layout size based on AST information. First, we find layout size:

  1. const ASTRecordLayout & ASTContext::getASTRecordLayout(const RecordDecl *D).
  2. CharUnits ASTRecordLayout::getSize()
    Then iterate over all fields to find the largest one. If the layout size is greater than the size of the largest type - the padding bytes are added to the record members. And the latter is also important I would say. So OG's union type is a type of the largest member + optional padding.

Now, coming back to CIR. The same AST methods are used in CIRRecordLayoutBuilder. So we need clang::RecordDecl to find what is a union type size. Also, we need these padding bytes present in our cir::StructType members: it would be strange and error prone if we find a way to compute size correctly but don't add the same bytes as LLVM does. This makes me think that we should not deffer padding computation and do it in CIRRecordLayoutBuilder. But unlike the OG, we chose to track all the members for unions.
So looks like that CIR union type consist of all members types and from an optional padding type in the end.

But it will be hard to distinguish the type used for padding from the other members later. That is where the idea of adding an additional boolean trailindPadding comes from. Thus, in computeSizeAndAlignment we could compute the largest member in the same fashion as we do it now but not touching the padding type we added earlier (if we did). In this case we can end up with union type size computation as size ( largestMember) + size (padding type) . Something like this. The knowledge about largest type and padding type will also be helpful in the lowering.

Another approach I tried is to extract the clang::RecordDecl from ASTRecordDeclInterface right in computeSizeAndAlignment so I could use ASTRecordLayout::getSize mentioned above and prove a union may have padding bytes. Frankly speaking, I don't like this way - it works for my toy examples but crashes in run time with our tests and with weird errors. But I still can try to go this way if you'd prefer.

Why touch StructTypeStorage? Isn't storing info in StructLayoutAttr enough?

The idea was to pass the information via StructType::complete which finally ends with mutating of the StructTypeStorage.

I'd vote for hiding more details if possible so that queries can be more generic without the client doing a lot of dance.

I'll do my best!

So ... I don't know if I wrote a good explanation. But if I did - may be now you have more thoughts?

@bcardosolopes
Copy link
Member

bcardosolopes commented Jan 15, 2025

Also, we need these padding bytes present in our cir::StructType members: it would be strange and error prone if we find a way to compute size correctly but don't add the same bytes as LLVM does. This makes me think that we should not deffer padding computation and do it in CIRRecordLayoutBuilder

Agreed!

But it will be hard to distinguish the type used for padding from the other members later.

Food for thought: are we using the other members for anything right now? Have you considered removing that altogether for something more simple? This could be one the things designed too early without use case in mind that maybe should be reintroduced later whenever there's a use case. If we remove the other members, would it make handling this a bit more clean (or in a generic way that doesn't require knowledge about the type being a union)?

That is where the idea of adding an additional boolean trailindPadding comes from. Thus, in computeSizeAndAlignment we could compute the largest member in the same fashion as we do it now but not touching the padding type we added earlier (if we did). In this case we can end up with union type size computation as size ( largestMember) + size (padding type) . Something like this. The knowledge about largest type and padding type will also be helpful in the lowering.

This is a viable solution, I have some other higher level questions about the API: if trailingPadding is set, would isPadded also return true? If the struct type is RecordKind::{Struct,Class} and trailingPadding is called, what does the answer mean? Should this method only be used if we are talking about unions?

Another approach I tried is to extract the clang::RecordDecl from ASTRecordDeclInterface ... I still can try to go this way if you'd prefer.

Nah, rather not depend on AST presence for this!

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 16, 2025

So the solution is almost here!

Food for thought: are we using the other members for anything right now?

I would say we don't really use them - unless I missed something. And yes - the easiest solution would be just to stick to the OG and don't emit record type with all the possible members enumerated for unions and save only the largest one and padding. But since we already did the opposite - and may be it's even good, since it preserves more information from AST - I would keep it like it is.

I have some other higher level questions about the API: if trailingPadding is set, would isPadded also return true?

yes. But looks like isPadded is not used at all - it s private method for StructType.

If the struct type is RecordKind::{Struct,Class} and trailingPadding is called, what does the answer mean? Should this method only be used if we are talking about unions?

First of all, this trailingPadding means that there are extra bytes in the end - so no matter what the type is in the question. Next, we will use it in the computeSizeAndAlignment only - at least now I hope we won't need to do anything else.

Thus, with such approach it won't be matter which exact type it is - struct, class or union.

Finally, I even think we don't need trailingPadding - too narrow usage area. Instead, we may add just something more general, e.g. hasPadding parameter in the StructType::complete. In this case it will cover more cases - for example structures, which may have padding between fields. The same approach will make sense for unions also - once hasPadding set to true, we know that the padding type is the last one in the members list.

Nah, rather not depend on AST presence for this!

that makes me happy!

@bcardosolopes
Copy link
Member

we may add just something more general, e.g. hasPadding parameter in the StructType::complete. In this case it will cover more cases - for example structures, which may have padding between fields. The same approach will make sense for unions also - once hasPadding set to true, we know that the padding type is the last one in the members list.

Sweet, sounds like a plan then!

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 17, 2025

Good!

To make sure we are on the same page - I add hasPadding to the StructType::complete and most likely will have to update StructTypeStorage and builders for StructType. And printer and parser also. Do you agree? Or now it doesn't look that good as earlier?)

@bcardosolopes
Copy link
Member

I'm still not 100% sure why you need to change the storage (versus using lazy layout stuff), but I might be missing something - maybe go ahead with your version and I can clarify extra questions on top of the PR?

@gitoleg
Copy link
Collaborator Author

gitoleg commented Jan 22, 2025

good! Will start to prepare the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working IR design Considerations around the design of ClangIR IR difference A difference in ClangIR-generated LLVM IR that could complicate reusing original CodeGen tests
Projects
None yet
Development

No branches or pull requests

2 participants