EOFCREATE: Don't hash the init-container #162

Open
Tracked by #165
chfast opened this issue Sep 6, 2024 · 61 comments

@chfast
Member

chfast commented Sep 6, 2024

The create address derivation for EOFCREATE is based on CREATE2.

keccak256(sender_address + salt + keccak256(init-container))

where the sender_address is the logical address of the contract invoking EOFCREATE.

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

It also seems unnecessarily expensive: EOFCREATE can only pick up one of the deploy-time sub-containers.

Solution 1: Use sub-container index

The create address is already bound to the "sender address", code is immutable (no SELFDESTRUCT) so replacing the hash of the sub-container with just its index may be enough.

Solution 2: Use code's address + sub-container index

The CREATE2 scheme uses the "sender address", which may not be the address of the code (see DELEGATECALL). I'm not sure if this is a desired property for CREATE2, but for EOFCREATE it looks like a problem. A contract may deploy different contracts through a DELEGATECALL proxy: for an EOFCREATE inside a DELEGATECALL, the same sub-container index will point to a different sub-container. To fix this we can replace the sender address with, or combine it with, the physical code address:

  • keccak256(code_address + salt + sub-container-index)
  • keccak256(sender_address + code_address + salt + sub-container-index)
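
To make the difference between these candidate inputs concrete, here is a minimal Python sketch. Assumptions: `hashlib.sha3_256` stands in for keccak256 (NIST SHA-3 pads differently, so the outputs are NOT real Ethereum addresses), addresses are raw 20-byte strings, the index is one byte, the address is taken as the last 20 bytes of the hash as in CREATE2, and all function names are made up for illustration.

```python
import hashlib

def _h(data: bytes) -> bytes:
    # Stand-in for keccak256; sha3_256 pads differently, so outputs are NOT real addresses.
    return hashlib.sha3_256(data).digest()

def addr_current(sender: bytes, salt: bytes, init_container: bytes) -> bytes:
    # Current scheme: the hash of the init-container is part of the input.
    return _h(sender + salt + _h(init_container))[12:]

def addr_solution1(sender: bytes, salt: bytes, idx: int) -> bytes:
    # Solution 1: sub-container index instead of the init-container hash.
    return _h(sender + salt + bytes([idx]))[12:]

def addr_solution2(sender: bytes, code: bytes, salt: bytes, idx: int) -> bytes:
    # Solution 2 (second variant): sender address + code address + salt + index.
    return _h(sender + code + salt + bytes([idx]))[12:]

A = b"\xaa" * 20     # hypothetical contract that executes EOFCREATE itself
B = b"\xbb" * 20     # hypothetical DELEGATECALL target with different sub-containers
SALT = b"\x01" * 32

# Case 1: A runs EOFCREATE[1]. Case 2: A delegates to B, which runs EOFCREATE[1].
# Index 1 names a different sub-container in each case, but Solution 1 sees
# identical inputs (sender=A, salt, idx=1) and cannot tell them apart.
same_under_s1 = addr_solution1(A, SALT, 1) == addr_solution1(A, SALT, 1)
differs_under_s2 = addr_solution2(A, A, SALT, 1) != addr_solution2(A, B, SALT, 1)
```

Under these assumptions `same_under_s1` and `differs_under_s2` both hold, which is the DELEGATECALL ambiguity Solution 2 is meant to remove.
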
@pdobacz pdobacz mentioned this issue Sep 9, 2024
5 tasks
@axic
Member

axic commented Oct 3, 2024

A relevant code is CREATE3: https://github.com/Vectorized/solady/blob/main/src/utils/CREATE3.sol

Should ask for feedback from library authors.

@pdobacz
Member

pdobacz commented Oct 7, 2024

If we decide to take away initcontainer hashing (or change it somehow), we need to revisit and ask for an update to the EOF considerations for Verkle EIPs. A link to a tentative PR which handles this (and a thread about initcontainer hashing) is here.

  • check Verkle EIP(s) if update is necessary

@frangio

frangio commented Oct 7, 2024

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

A different perspective on this: the hash of the init container locks in the semantics, not the exact code contents of the contract. This doesn't prevent future rewrites of the code because they will be semantics-preserving or inherently breaking regardless of code observability.

Code observability should be removed to the extent that CODECOPY/EXTCODECOPY can cause semantics-preserving rewrites to become breaking changes indirectly. In my opinion, it should be totally okay to rewrite code even when the address is a witness of the original code on the account. In fact, it's a good thing that there's a way to recover CODECOPY via this witness in a way that doesn't risk being broken by rewrites!

I also think this ability to provide on-chain proof of the code/semantics of an account is an important primitive that we shouldn't get rid of.

@pdobacz
Member

pdobacz commented Oct 8, 2024

Thank you for this perspective. We're gathering inputs on this one, so this feedback is very useful.

I also think this ability to provide on-chain proof of the code/semantics of an account is an important primitive that we shouldn't get rid of.

Do Solutions 1 & 2 above still qualify as getting rid of it? Note that EOFCREATE in its current form doesn't allow deploying arbitrary off-chain code, only code listed as one of its subcontainers. So it can be proven on-chain that a given account has code deployed from a known address' subcontainer.

@frangio

frangio commented Oct 8, 2024

Because of DELEGATECALL I don't think Solution 1 gives any guarantees about the code/semantics of an account.

Solution 2 would work if there was a way to trace back to a "root" deployer whose address was computed with codehash. If EOFCREATE doesn't do that (and CREATE2 is "removed" via EOF), I don't think it would be possible to get a root deployer like that because creation transactions don't use the codehash.

I think the current state where the code hash is directly included is better though, because it takes a single hash to compute the address rather than a tracing process involving multiple hashes. Additionally, you may only care about proving the code of an account ignoring the code of its deployer, and if you have to trace it back to the deployer you are not able to do that. Overall I think including code hash into the address directly is significantly better.

@frangio

frangio commented Oct 8, 2024

A note about CREATE3.sol and similar patterns: this is often used to deploy contracts via CREATE2 at a deterministic address that doesn't depend on the creation parameters (or only some of them). For example Uniswap v3 does this here:

This kind of use case is actually natively addressed by EOFCREATE! The workaround will no longer be needed because the input is not included in the address formula.

The other side to this is that users of CREATE2 that do care about the creation parameters will need a new way to validate them. Either a trusted factory mixes them into the salt, or the contract exposes getters.

There is another potential use case for CREATE3.sol, which is to deploy at a deterministic address that is fully independent of the creation code. This other use case is not only not addressed by EOFCREATE, it becomes impossible to strictly implement under EOF, although it is easy to work around by deploying a proxy instead. I don't know how common this use case is honestly.

More context here: https://github.com/moodysalem/EIPs/blob/46350bb/EIPS/eip-3171.md#motivation

@chfast
Member Author

chfast commented Oct 15, 2024

Analysis of input data for create address

It is preferable the inputs be less than 136 bytes (the keccak256 block size).

CREATE

Input length: 23–31
Prefix byte: 0xd6–0xde

The input is RLP encoded sender's address and sender's nonce. The encoding is variable-length with fixed encoding for the address and variable-length encoding of the nonce.

CREATE2

Input length: 85
Prefix byte: 0xff

The input is a fixed-length concatenation of the prefix byte 0xff, the sender's address (20 bytes), a user-provided salt (32 bytes) and the initcode hash (32 bytes). The prefix byte was added to avoid collisions with the CREATE scheme, but this is unnecessary because the two schemes never have inputs of the same length.

EOFCREATE solution 2

Input length: 73 or 97
Prefix byte: none

Concatenation of the addresses of the sender and the code (2x 20 bytes), user-provided salt (32 bytes) and subcontainer index (1 byte). Alternatively, we can allocate 32 bytes per address for compatibility with Address Space Expansion.
None of these total lengths match any existing schemes so the prefix byte is not necessary.
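
The lengths above can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the helper names are made up, and it assumes the nonce is at most 2**64 - 1 so the RLP payload always stays in the short-list range. With 20-byte addresses the EOFCREATE input sums to 20 + 20 + 32 + 1 = 73 bytes.

```python
def rlp_uint_len(n: int) -> int:
    # RLP size of a non-negative integer: 0 encodes as 0x80, 1..0x7f as a single
    # byte, anything larger as a one-byte length prefix plus its big-endian bytes.
    if n < 0x80:
        return 1
    return 1 + (n.bit_length() + 7) // 8

def create_input(nonce: int) -> tuple:
    # Payload: 0x94-prefixed 20-byte address (21 bytes) followed by the nonce.
    payload = 21 + rlp_uint_len(nonce)
    # Short-list prefix byte is 0xc0 + payload length (payload < 56 bytes here).
    return 1 + payload, 0xC0 + payload   # (total length, prefix byte)

CREATE2_LEN = 1 + 20 + 32 + 32        # 0xff + sender + salt + initcode hash
EOFCREATE_LEN_20 = 20 + 20 + 32 + 1   # sender + code + salt + index
EOFCREATE_LEN_32 = 32 + 32 + 32 + 1   # same, with 32-byte addresses
KECCAK256_BLOCK = 136                 # bytes per keccak256 block

shortest = create_input(0)            # (23, 0xd6)
longest = create_input(2**64 - 1)     # (31, 0xde)
```

All of these lengths are pairwise distinct and below `KECCAK256_BLOCK`, which is the basis for dropping the prefix byte.
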

@chfast
Member Author

chfast commented Oct 15, 2024

This kind of use case is actually natively addressed by EOFCREATE! The workaround will no longer be needed because the input is not included in the address formula.

Interesting. I was thinking about extending solution 2 to also hash the inputs provided to EOFCREATE. But it looks like there are some use cases where this is undesirable.

The other side to this is that users of CREATE2 that do care about the creation parameters will need a new way to validate them. Either a trusted factory mixes them into the salt, or the contract exposes getters.

I think the pattern may be for user to hash inputs and combine them with the salt.
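
A minimal sketch of that pattern, assuming a hypothetical `final_salt` helper and using `hashlib.sha3_256` as a stand-in for keccak256 (the two pad differently, so real values would differ):

```python
import hashlib

def _h(data: bytes) -> bytes:
    # Stand-in for keccak256; not the real Ethereum hash.
    return hashlib.sha3_256(data).digest()

def final_salt(raw_salt: bytes, creation_input: bytes) -> bytes:
    # Hypothetical helper: commit to the EOFCREATE input by folding its hash
    # into the 32-byte salt that is then passed to EOFCREATE.
    if len(raw_salt) != 32:
        raise ValueError("salt must be 32 bytes")
    return _h(raw_salt + _h(creation_input))

RAW = b"\x07" * 32
salt_alice = final_salt(RAW, b"owner=alice")
salt_bob = final_salt(RAW, b"owner=bob")
```

The same raw salt with different creation inputs gives different final salts, so the resulting address commits to the parameters without the EVM hashing them.
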

@shemnon
Contributor

shemnon commented Oct 15, 2024

What about chained EOF creates going deep? In this case would the "code address" of the second level create be the code address of the parent? If the code address is the address of the topmost container... not so much. Because then the index could be re-used at different depths to cause different contracts to be deployed at the same address based on call data (although returncontract can do that more cleanly).

Or is it the "sender address" that gets updated in nested EOFCREATES? Either way we need tests for this scenario.

@chfast
Member Author

chfast commented Oct 15, 2024

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)

O: CALL A
A: DELEGATECALL B
B: EOFCREATE[2](X)
C: eofcreate_addr(sender=A, code=B, salt=X, idx=2)

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=C?, salt=Y, idx=2)
C: deployed

O: CALL C (deployed above)
C: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=C?, salt=Y, idx=2)
this will generate the same address and collide with D.

@pdobacz
Member

pdobacz commented Oct 15, 2024

this will generate the same address and collide with D.

Is this a problem though? Seems OK to me. We have a deterministic address to deploy D / E at, but the code itself (contents) is not in the witness.

@frangio

frangio commented Nov 4, 2024

Wouldn't this scheme work?

keccak256(sender_address || code_address || salt || during_init || init_subcontainer_idx)

during_init would be a boolean that is true iff EOFCREATE executes during sender init.

The combination (code_address, during_init, init_subcontainer_idx) should uniquely identify a container. If during_init == true, take the init container that created code and look at subcontainer number init_subcontainer_idx. If during_init == false, take the runtime container of code and look at subcontainer number init_subcontainer_idx.

Amending @chfast's examples above:

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)

O: CALL A
A: DELEGATECALL B
B: EOFCREATE[2](X)
C: eofcreate_addr(sender=A, code=B, salt=X, during_init=0, idx=2)

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=C, salt=Y, during_init=1, idx=2)

O: CALL C (deployed above)
C: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=C, salt=Y, during_init=0, idx=2)

E no longer equal to D

Any further nested EOFCREATE would have different sender.
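
The D/E case can be checked mechanically. A small Python sketch, assuming `hashlib.sha3_256` as a stand-in for keccak256 (so these are not real addresses), one byte each for `during_init` and the index, and a made-up `eofcreate_addr` helper where `during_init=None` means "scheme without the flag":

```python
import hashlib

def _h(data: bytes) -> bytes:
    # Stand-in for keccak256; sha3_256 uses different padding.
    return hashlib.sha3_256(data).digest()

def eofcreate_addr(sender, code, salt, idx, during_init=None):
    # during_init=None leaves the flag out entirely (plain sender/code/salt/idx).
    flag = b"" if during_init is None else bytes([int(during_init)])
    return _h(sender + code + salt + flag + bytes([idx]))[12:]

A = b"\xaa" * 20
X, Y = b"\x01" * 32, b"\x02" * 32

# Without the flag: EOFCREATE[2](Y) from C's initcode and from deployed C collide.
C_plain = eofcreate_addr(A, A, X, 1)
D_plain = eofcreate_addr(C_plain, C_plain, Y, 2)
E_plain = eofcreate_addr(C_plain, C_plain, Y, 2)

# With the flag: the two calls hash different inputs.
C = eofcreate_addr(A, A, X, 1, during_init=False)
D = eofcreate_addr(C, C, Y, 2, during_init=True)    # during C's init
E = eofcreate_addr(C, C, Y, 2, during_init=False)   # after C is deployed
```

Here `D_plain == E_plain` while `D != E`, matching the amended traces.
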

@pdobacz
Member

pdobacz commented Nov 5, 2024

EDIT: this post is likely some kind of misunderstanding on my part. We mentioned originally that code address is the address of the outer-most EOFCREATE, but actually, a scheme where code_address changes during each EOFCREATE seems to avoid address conflicts better...

Having revisited @chfast 's example after a while, I think by code_address we mean the address where the outer-most EOFCREATE in a nested chain of EOFCREATEs resides (there is no other address with code in that chain yet!), so:

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=A, salt=Y, idx=2) <----- code not C but A here
C: deployed

This already makes D != E or putting it differently:

  1. When CALL is used to change context in a chain of calls, both sender and code will change
  2. DELEGATECALL - code will change, sender won't
  3. EOFCREATE - sender will change, code won't

However, this only changes the way one runs into the D==E conflict:

O: CALL C (deployed above)
C: DELEGATECALL A
A: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=A, salt=Y, idx=2) == D

but initcodes used for D and E are different (at different nesting depth in A)

@pdobacz
Member

pdobacz commented Nov 5, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

@gumb0
Contributor

gumb0 commented Nov 7, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

@pdobacz
Member

pdobacz commented Nov 7, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

The problem stems from the fact that we're swapping the code executing at the context of C - during init it is the initcode, after it is the initcode's subcontainer (RETURNCONTRACTed). This makes the two instances of EOFCREATE[2](Y) mean different things and is solved by Frangio's proposal. From that PoV it kinda makes sense - it's 2 different codes, but both live at C, so they have a common code_address, so to speak.

code_address name is just a name related to DELEGATECALL (which we are addressing here). Having this in mind, with executing_address it isn't clear to me if it's the code_address or msg.recipient address (the context which "executes" some code)...

@gumb0
Contributor

gumb0 commented Nov 7, 2024

...or wait, maybe we actually do want the code to change on EOFCREATE too, even though there is no code at that address yet. Combined with @frangio's during_init addition seems to work to avoid conflicts.

It doesn't sound right for nested EOFCREATE to have code_address equal to the address that outer EOFCREATE will deploy... Or maybe it should be called differently then, not code_address, but executing_address

The problem stems from the fact that we're swapping the code executing at the context of C - during init it is the initcode, after it is the initcode's subcontainer (RETURNCONTRACTed). This makes the two instances of EOFCREATE[2](Y) mean different things and is solved by Frangio's proposal. From that PoV it kinda makes sense - it's 2 different codes, but both live at C, so they have a common code_address, so to speak.

From my perspective in case of EOFCREATE nested in outer EOFCREATE initcode, the inner EOFCREATE's initcode doesn't "live at C" at all, it has almost nothing to do with C. C is an address that will be deployed (or not) when outer EOFCREATE finishes.
C happens to be msg.recipient when inner EOFCREATE is being executed (this is what I call "executing address")

But this is a bit of bikeshedding. I agree during_init flag seems to solve it.

@pdobacz
Member

pdobacz commented Nov 7, 2024

inner EOFCREATE's initcode doesn't "live at C" at all, it has almost nothing to do with C

Yeah, I see your point here. But actually the bikeshedding is useful. I revisited the option with "code_address doesn't change on EOFCREATE + during_init" with this new perspective and now it seems to me it works too, I must've made a mistake somewhere yesterday, PTAL:

O: CALL A
A: EOFCREATE[1](X)
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)
C: EOFCREATE[2](Y) (initcode execution)
D: eofcreate_addr(sender=C, code=A, salt=Y, during_init=1, idx=2)
C: deployed

O: CALL C (deployed above)
C: EOFCREATE[2](Y)
E: eofcreate_addr(sender=C, code=C, salt=Y, during_init=0, idx=2) != D

O: CALL C (deployed above)
C: DELEGATECALL A
A: EOFCREATE[2](Y)
F: eofcreate_addr(sender=C, code=A, salt=Y, during_init=0, idx=2) != D and != E

In this version code_address matches the expectations - it is where the code lives
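
One way to sanity-check these rules is a tiny model of the execution context. This is only a sketch under stated assumptions: `Frame`, `call`, `delegatecall` and `eofcreate` are hypothetical names, `hashlib.sha3_256` stands in for keccak256, and DELEGATECALL from inside initcode is deliberately left unmodelled because the thread does not cover it.

```python
import hashlib
from dataclasses import dataclass

def _h(data: bytes) -> bytes:
    # Stand-in for keccak256; not the real Ethereum hash.
    return hashlib.sha3_256(data).digest()

@dataclass(frozen=True)
class Frame:
    sender: bytes       # address whose context is executing
    code: bytes         # address where the executing container lives
    during_init: bool   # True while initcode is running

def call(target: bytes) -> Frame:
    # CALL: both sender and code become the target.
    return Frame(sender=target, code=target, during_init=False)

def delegatecall(frame: Frame, target: bytes) -> Frame:
    # DELEGATECALL: code changes, sender does not.
    if frame.during_init:
        raise NotImplementedError("DELEGATECALL from initcode is not modelled")
    return Frame(sender=frame.sender, code=target, during_init=False)

def eofcreate(frame: Frame, salt: bytes, idx: int):
    # EOFCREATE: derive the new address, then run initcode with a new sender
    # but the same code address.
    flag = bytes([int(frame.during_init)])
    new_addr = _h(frame.sender + frame.code + salt + flag + bytes([idx]))[12:]
    return new_addr, Frame(sender=new_addr, code=frame.code, during_init=True)

A = b"\xaa" * 20
X, Y = b"\x01" * 32, b"\x02" * 32

C, init_c = eofcreate(call(A), X, 1)                 # sender=A, code=A, init=0
D, init_d = eofcreate(init_c, Y, 2)                  # sender=C, code=A, init=1
E, _ = eofcreate(call(C), Y, 2)                      # sender=C, code=C, init=0
F, _ = eofcreate(delegatecall(call(C), A), Y, 2)     # sender=C, code=A, init=0
G, _ = eofcreate(init_d, Y, 2)                       # sender=D, code=A, init=1
```

In this model C, D, E, F and G all hash distinct inputs, so none of the traces collide.
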

@gumb0
Contributor

gumb0 commented Nov 7, 2024

In this version code_address matches the expectations - it is where the code lives

Yes, I like this version more. Seems to work and not conflict on deeper nesting levels, too.

O: CALL A
A: EOFCREATE[1]
C: eofcreate_addr(sender=A, code=A, salt=X, during_init=0, idx=1)
C: EOFCREATE[2] (initcode execution)
D: eofcreate_addr(sender=C, code=A, salt=Y, during_init=1, idx=2)
D: EOFCREATE[2] (initcode execution)
G: eofcreate_addr(sender=D, code=A, salt=Y, during_init=1, idx=2)

@frangio

frangio commented Nov 7, 2024

This seems to work. The approach seems equivalent to a list of indices pointing at a deeply nested subcontainer of code.

I find it hard to reason about though.

I think if you see G = eofcreate_addr(sender=D, code=A, salt=Y, during_init=1, idx=2), during_init=1 means that the runtime container at G was deployed by an init container located somewhere in A, in particular subcontainer index 2 of the init container that deployed D. Recursively you arrive at C, where during_init=0 means that the runtime container at C was deployed by the subcontainer A[1].

So the init containers for each of these contracts are:

  • C: A[1]
  • D: A[1][2]
  • G: A[1][2][2]
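
The "list of indices" reading can be sketched as a path lookup in a tree of containers. The `Container` type and the `make`/`resolve` helpers below are hypothetical and only model nesting, not the real EOF container layout.

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    name: str
    subcontainers: list = field(default_factory=list)

def make(name: str, depth: int, width: int = 3) -> Container:
    # Build a toy container with `width` subcontainers per level, `depth` levels deep.
    subs = [make(f"{name}[{i}]", depth - 1, width) for i in range(width)] if depth else []
    return Container(name, subs)

def resolve(root: Container, path: list) -> Container:
    # Follow one subcontainer index per nesting level.
    node = root
    for idx in path:
        node = node.subcontainers[idx]
    return node

A = make("A", depth=3)
init_of_C = resolve(A, [1])          # A[1]
init_of_D = resolve(A, [1, 2])       # A[1][2]
init_of_G = resolve(A, [1, 2, 2])    # A[1][2][2]
```

Each nested EOFCREATE appends one index to the path, which is why the chain of (sender, during_init, idx) values is enough to locate the init container.
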

There seems to be some redundancy here:

the runtime container at G was deployed by an init container located somewhere in A, in particular subcontainer index 2 of the init container that deployed D

A is not really used in the procedure.

So an alternative could be to remove during_init and replace during_init=1 with code_address=0, since the actual code_address is implicit in sender_address in this case.

keccak256(sender_address || code_address || salt || init_subcontainer_idx)

Note that EOFCREATE in a DELEGATECALL context always results in code_address set to the target of DELEGATECALL, because that's where the init container is located, regardless of whether the sender is being deployed.

@frangio

frangio commented Nov 7, 2024

Since it looks like we may have solved this issue I'll resurface my previous comment. With this change we would be losing the ability to make on-chain proofs about the behavior of an account without a trusted factory (although it would be recoverable with a zk-coprocessor). I do think we need to consider whether it's okay to remove that, or if it's a primitive that applications are relying on.

I'm currently weakly leaning towards probably okay to remove.

@pdobacz
Member

pdobacz commented Nov 7, 2024

losing the ability to make on-chain proofs about the behavior of an account without a trusted factory

Can you clarify what kind of a behavior proof? Just that codehash(address) == particular hash? Could this be substituted by address_and_subcontainer_idx_of_a_particular_factory(address) == particular address and idx? That is instead of proving code hash of an address is X, we prove where exactly the code is coming from.

@frangio

frangio commented Nov 7, 2024

Yeah that works if you have a trusted/known factory. This is probably enough.

@gumb0
Contributor

gumb0 commented Nov 8, 2024

So an alternative could be to remove during_init and replace during_init=1 with code_address=0, since the actual code_address is implicit in sender_address in this case.

I like this variant, too. I would reframe it as we'd have 2 different schemes depending on whether EOFCREATE is called inside initcode:

  1. Non-nested EOFCREATE: keccak256(sender_address + code_address + salt + sub-container-index)
  2. Nested EOFCREATE inside initcode: keccak256(sender_address + salt + sub-container-index)
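
A rough sketch of this two-scheme framing, assuming `hashlib.sha3_256` as a stand-in for keccak256, 20-byte addresses, a one-byte index, and a made-up helper where `code=None` selects the nested variant:

```python
import hashlib

def _h(data: bytes) -> bytes:
    # Stand-in for keccak256; outputs are not real addresses.
    return hashlib.sha3_256(data).digest()

def preimage(sender: bytes, salt: bytes, idx: int, code=None) -> bytes:
    # code=None models EOFCREATE nested inside initcode (scheme 2, no code address).
    return sender + (code or b"") + salt + bytes([idx])

def eofcreate_addr(sender: bytes, salt: bytes, idx: int, code=None) -> bytes:
    return _h(preimage(sender, salt, idx, code))[12:]

C = b"\xcc" * 20
Y = b"\x02" * 32

nested_len = len(preimage(C, Y, 2))              # 20 + 32 + 1
non_nested_len = len(preimage(C, Y, 2, code=C))  # 20 + 20 + 32 + 1

D = eofcreate_addr(C, Y, 2)            # EOFCREATE[2](Y) during C's init
E = eofcreate_addr(C, Y, 2, code=C)    # EOFCREATE[2](Y) from deployed C
```

Because the two preimages have different lengths (53 vs 73 bytes under these assumptions), they can never be equal, so D and E are separated without an explicit flag.
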

@charles-cooper
Contributor

how about just removing the witness entirely? i.e. keccak256(sender_address + salt). i think the user is mainly interested that other users cannot trivially produce a collision with some salt they want to deploy to, but the EVM can let them be responsible for making sure they don't produce a collision with themselves.

@charles-cooper
Contributor

one issue with using the initcontainer index in the hash is that it makes counterfactual address calculation potentially impossible on chain. since the EOFCREATE-ing contract cannot introspect the code of the factory address (that it delegates to), it cannot counterfactually produce the target address.

@charles-cooper
Contributor

We identified that the keccak256(init-container) goes against the "code non-observability" because it locks in the contents of the init-container e.g. preventing re-writing it in some future upgrade.

also wanted to point out that using initcontainer index rules out certain types of code rewrites as well. for instance, reordering of initcontainers or fusing them.

@frangio

frangio commented Nov 28, 2024

We've included Foo_hash in the final_salt, so that should be impossible.

It would be possible if the target of DELEGATECALL uses the proposed raw_salt in Solidity or just handwritten EVM code.

If we say that developers should only delegate to code they trust not to do this, we're back in the argument about the degree of responsibility put on them. CREATE2 collisions triggered by DELEGATECALL targets is not something they're responsible for today in legacy code.

@kuzdogan

During the conversation on the Nov 27 call, a new separate metadata section was proposed and generally agreed upon: data that can't be read by the EVM, where any change to it does not affect the code (spec/EIP TBD).

I'd like to point out that, if we are to keep the initcode hash in the EOFCREATE parameters, there's a benefit to leaving out the metadata section in the hash calculation. This is a common headache in CREATE2 contracts today and the reason many teams choose to opt out of the metadata hash in the bytecode.

@pdobacz
Member

pdobacz commented Dec 2, 2024

if we are to keep the initcode hash in the EOFCREATE parameters, there's a benefit to leaving out the metadata section in the hash calculation.

I see it more like the metadata section's being there becomes a strong argument for leaving out the initcode hash entirely - in any of the variants proposed here. Calculating initcode hash from a subset of sections sounds impractical to me.

@pdobacz
Member

pdobacz commented Dec 2, 2024

It would be possible if the target of DELEGATECALL uses the proposed raw_salt in Solidity or just handwritten EVM code.

Oh yes, impossible when final_salt is used. But I assume using raw_salt would be intentional usage, when the CREATE2/EOFCREATE usage pattern must be somehow customized, and then you depart from the "CREATE2 guarantees parity".

If we say that developers should only delegate to code they trust not to do this

Isn't it already the case only trusted code should be delegated to? That is, in broader context than just creation logic. If we have a target function delegatecall_me_i'll_create_a_contract, I'd expect it would document precisely how it will create, if it uses the non-default raw_salt inside, and the caller would decide, if it can delegate to it.

Is there a usage pattern where that wouldn't work well enough @frangio?

@charles-cooper
Contributor

Not sure about this, but I think CREATE2 caters for fully counterfactual deployments (code not on-chain), so it needs to include the initcode hash. EOFCREATE doesn't, so it has the opportunity to improve, fully counterfactual deployments will be up to the future TXCREATE.

speaking of counterfactuality, i think the issue with keccak256(msg.sender + code_address + salt) and the variants including the subcontainer index are that they can't be computed on-chain. i'm not sure how big of an issue this is, but it is already stepping outside of the design goals of the original CREATE2, which allow smart contracts to compute counterfactual deploy addresses.

@frangio

frangio commented Dec 2, 2024

Isn't it already the case only trusted code should be delegated to?

Yes but the way creation salts are constructed is not a property one has to audit of DELEGATECALL targets at the moment. It's not impossible to audit, it's just a new checklist item with respect to legacy.

The way I'm thinking about this is global vs local properties, where global collision avoidance should be taken care of by the EVM and local collision avoidance should be ensured by the contract code, where "contract code" is the developer and their compiler, and the property should not be breakable by an end user (eventually an attacker) under any circumstance (barring Keccak256 breaking) including if the user is able to choose a DELEGATECALL target. I recognize this last part is pretty strong so I'm not attached to it.

@frangio

frangio commented Dec 2, 2024

i think the issue with keccak256(msg.sender + code_address + salt) and the variants including the subcontainer index are that they can't be computed on-chain

Not sure what you mean by this. It can be computed on chain if you know the parameters, which the contract should document and would already be documented anyway, among other things because the salt is often constructed and not explicit in the input.

Can you describe end to end a scenario where you see an issue?

@charles-cooper
Contributor

charles-cooper commented Dec 2, 2024

i think the issue with keccak256(msg.sender + code_address + salt) and the variants including the subcontainer index are that they can't be computed on-chain

Not sure what you mean by this. It can be computed on chain if you know the parameters, which the contract should document and would already be documented anyway, among other things because the salt is often constructed and not explicit in the input.

Can you describe end to end a scenario where you see an issue?

i mean like you need more arguments than are required for just the eofcreate, e.g.

def create_something() -> address:
    salt: bytes32 = self._compute_salt()
    return factory.create(args, salt)  # calls EOFCREATE or delegates to another factory

def counterfactual() -> address:  # compute the address counterfactually of factory.create()
    salt: bytes32 = self._compute_salt()
    return ... # can't, need the final code_address and potentially the initcontainer index depending on the scheme

@frangio

frangio commented Dec 2, 2024

Ok, I was going to suggest the factory should expose a getter for counterfactual addresses because it knows those parameters. But this is not possible if the factory is upgradeable because the code_address parameter becomes time dependent, future values are not known and all past values need to be stored.

@charles-cooper
Contributor

charles-cooper commented Dec 2, 2024

right -- so i think the point is it would break an existing property of CREATE2, which is that you can counterfactually predict the address of invoking CREATE2 from just its inputs.

@pcaversaccio

Maybe this has already been discussed and my comment is completely off-topic, but what I personally care about is having a way to guarantee cross-chain runtime bytecode equivalence. Both proposals here keccak256(msg.sender + salt) and keccak256(msg.sender + code_address + salt) do not guarantee this. I understand that we can't introspect code in EOF and that there is no initcode hash that can be used like in CREATE2 but I just want to raise a point that I deem important.

@frangio

frangio commented Dec 11, 2024

I'd argue you need init code equivalence because the same runtime code initialized differently can have wildly different properties.

Both proposals here keccak256(msg.sender + salt) and keccak256(msg.sender + code_address + salt) do not guarantee this.

Indeed but note that if your goal is to build a generic factory like CreateX you will not be able to do that with EOFCREATE. Generic factories would be enabled by TXCREATE, where the init code hash is available and can be used in the salt to guarantee code equivalence. At the moment it's unclear if this will ship in Osaka. See EOF Implementers Call #63 for more discussion.

@charles-cooper
Contributor

I'd argue you need init code equivalence because the same runtime code initialized differently can have wildly different properties.

That's an interesting point, although any observable difference in chain state can result in runtime code with different properties (examples: reading block.timestamp in the initcode, or calling view functions on some external contract).

@pdobacz
Member

pdobacz commented Dec 11, 2024

One more thing to mention is that so far we weren't considering altering the creation tx hashing scheme. It's currently the same as legacy (sender + sender_nonce), but for EOF it could instead include the initcode hash and a salt in place of the sender_nonce, at the expense of an ugly requirement to append 32 bytes of salt after the init container and before calldata.

Plot twist: instead of after the initcontainer, the salt could be... in the tx initcontainer's EIP-7834 metadata section :P.

This is slightly offtopic, b/c this thread is for EOFCREATE, but if reliable-deterministic-cross-chain addresses are a concern, and we can't afford to wait for TXCREATE, maybe that is some way out?

@pcaversaccio

I'd argue you need init code equivalence because the same runtime code initialized differently can have wildly different properties.

Right, I think it depends on the use case. For CreateX, for example, I care about runtime bytecode equivalence everywhere (CreateX is stateless by design and has an empty constructor) to ensure that the built-in contract creation functions are equivalent everywhere deployed (important as the factory allows for cross-chain frontrun protection for example). But I can see, how init code equivalence would make sense for many other applications.

Indeed but note that if your goal is to build a generic factory like CreateX you will not be able to do that with EOFCREATE. Generic factories would be enabled by TXCREATE, where the init code hash is available and can be used in the salt to guarantee code equivalence. At the moment it's unclear if this will ship in Osaka. See EOF Implementers Call #63 for more discussion.

Interesting - well after skimming through that proposal my first view is: Having a new transaction type to access this feature is an unnecessary overhead IMHO. We should strive for KISS, and not start adding new transaction types due to a bad design in EOFCREATE.

Plot twist: instead of after the initcontainer, the salt could be... in the tx initcontainer's EIP-7834 metadata section :P.

Interesting - I have to read that EIP first again.


I would like to mention that in yesterday's EOF call, it was mentioned that CreateX uses Nick's method. This is not true. I have the private key as backup, and here I elaborate on the decision why. Also, there are many RPCs that don't support pre-EIP-155 transactions even though they would be supported at the network level (the reason being that Geth defaults to non-support of pre-EIP-155 transactions since Berlin; see ethereum/go-ethereum#22339), as well as networks that simply don't support pre-EIP-155 transactions. Furthermore, a note on why Nick's method doesn't scale: I have 3 presigned transactions for CreateX creation available (see here), with one having a 45m gasLimit. The last one couldn't be broadcast on Ethereum due to today's block gasLimit. But even that one wouldn't be enough to e.g. deploy on Filecoin, which requires more than 100m gasLimit. So you see, having a backup key makes CreateX deployable on chains that are not EVM-equivalent but EVM-similar.

Lastly, if you're interested in some stats on CreateX, there is a community-maintained Dune dashboard: https://dune.com/patronumlabs/createx.

@pdobacz
Member

pdobacz commented Jan 8, 2025

We've put together a summary doc with some possible scenarios of revising the hashing schemes/deployment methods for EOF: https://notes.ethereum.org/@ipsilon/SyrzctZSJg. The goal of this is to lay out our options in the clear and discuss them - whether we still can alter the EOF address schemes and if yes - what is the best way to do it.

All feedback very welcome. If the doc is missing a solution/scenario you'd like to make a case for, please let me know. I'm looking forward to discussing this in depth on the next EOF implementers call.

Taking the liberty to tag @pcaversaccio @charles-cooper @cameel @kuzdogan

@frangio

frangio commented Jan 10, 2025

@pdobacz What does it mean for an approach to support AA deployments? What are the AA-specific challenges?

More generally, can you provide a description of each of the items in the comparison table? I think it would be useful to have as a list of desiderata.

These are the ones I can recall:

  • EOFCREATE hashing scheme:
    • Must prevent collisions between deployers
    • Should not include the init container hash (it requires code introspection)
    • May not have the same collision guarantees as CREATE/CREATE2 but should allow languages/libraries to recover them by appropriate construction of a salt
    • Ideally the developer is free to choose what to mix into the hash (input data, init code index or hash, etc.)
    • Contracts should be able to predict addresses as a function of inputs (not including contract state)

Additionally the following have come up:

  1. Same-address multi-chain deployments, where equivalent addresses roughly imply equivalent deployments
  2. Generic factories, which requires something like TXCREATE

IMO (1) is the main requirement, and (2) is just one technique currently used to achieve it. That is because same-address multi-chain deployments are extremely challenging, unless there is a preexisting multi-chain generic factory, so we make a big effort into deploying that factory to enable it. The issue with EOF is that it kills generic factories, but if multi-chain deployments were solved I believe this would not be a significant issue. Since it seems like TXCREATE would be a big source of ACDE friction, perhaps we can focus on solving multi-chain deployments in some other way.

@shemnon
Contributor

shemnon commented Jan 13, 2025

Since EOF is the current "headliner" in Osaka I don't think we will have the same friction getting TXCREATE in, especially since it is being driven by end-user requirements and not driven by the evm devs.

Given that, I think Address+salt for both EOFCREATE and TXCREATE, plus a set of ERC standard contracts (with "standard" deployments) that address the use cases, is what I see working. We may want to add / wrap the hash with a per-opcode value to prevent EOFCREATE and TXCREATE from having same-contract collisions, like hash(0xef0001ec || <address> || <salt> || 0xef0001ec) for EOFCREATE, and use 0xef0001ed as the bumper for TXCREATE.
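
A minimal sketch of the per-opcode wrapping idea above, assuming the bumper is simply prepended and appended to the hash preimage and that the address is the low 20 bytes of the digest (as in CREATE2). Python's standard library has no Keccak-256, so `hashlib.sha3_256` is used purely as a stand-in; the two differ in padding, so these outputs are not real addresses. The function name `derive_address` and the sample sender and salt are hypothetical.

```python
import hashlib

# Per-opcode "bumper" values as suggested in the comment above.
EOFCREATE_BUMPER = bytes.fromhex("ef0001ec")
TXCREATE_BUMPER = bytes.fromhex("ef0001ed")


def derive_address(bumper: bytes, sender: bytes, salt: bytes) -> bytes:
    """Return the low 20 bytes of hash(bumper || sender || salt || bumper)."""
    if len(sender) != 20 or len(salt) != 32:
        raise ValueError("expected a 20-byte address and a 32-byte salt")
    # Stand-in hash: a real scheme would use keccak256, not NIST SHA3-256.
    digest = hashlib.sha3_256(bumper + sender + salt + bumper).digest()
    return digest[-20:]


# Hypothetical inputs: same deployer and same salt for both opcodes.
sender = bytes.fromhex("11" * 20)
salt = (1).to_bytes(32, "big")

eofcreate_addr = derive_address(EOFCREATE_BUMPER, sender, salt)
txcreate_addr = derive_address(TXCREATE_BUMPER, sender, salt)
```

With identical sender and salt, the two opcodes land on different addresses only because the bumper differs, which is the collision the wrapping is meant to rule out.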

@pdobacz
Member

pdobacz commented Jan 14, 2025

@pdobacz What does it mean for an approach to support AA deployments? What are the AA-specific challenges?

"AA deployments" come up in the context of comparing the pieces labelled (D/) and (F/), that is, TXCREATE and the nonce-less creation txs. (D/) TXCREATE allows a smart contract wallet to deploy arbitrary code, while such creation txs do not, which puts the latter at a disadvantage.

Thanks for noting, I think this wording is too much of a mental shortcut. Should read "support deployments by smart contract wallets, as required by AA"

More generally, can you provide a description of each of the items in the comparison table? I think it would be useful to have as a list of desiderata.

Good point. I tried to make this self-explanatory by elaborating in the A, B, C... sections, but this seems to not be clear enough.

These are the ones I can recall:

* EOFCREATE hashing scheme:
  
  * Must prevent collisions between deployers
  * Should not include the init container hash (it requires code introspection)
  * May not have the same collision guarantees as CREATE/CREATE2 but should allow languages/libraries to recover them by appropriate construction of a salt
  * Ideally the developer is free to choose what to mix into the hash (input data, init code index or hash, etc.)
  * Contracts should be able to predict addresses as a function of inputs (not including contract state)

Additionally the following have come up:

1. Same-address multi-chain deployments, where equivalent addresses roughly imply equivalent deployments

2. Generic factories, which requires something like TXCREATE

IMO (1) is the main requirement, and (2) is just one technique currently used to achieve it. That is because same-address multi-chain deployments are extremely challenging, unless there is a preexisting multi-chain generic factory, so we make a big effort into deploying that factory to enable it. The issue with EOF is that it kills generic factories, but if multi-chain deployments were solved I believe this would not be a significant issue. Since it seems like TXCREATE would be a big source of ACDE friction, perhaps we can focus on solving multi-chain deployments in some other way.

Got it, so I think that in the table "Bytecode guarantees" boils down to 1., while "Generic factories" boils down to 2. My question is whether 2. has merit beyond providing 1. I initially thought that a factory which on one hand supports deploying arbitrary code, and on the other runs some fixed logic (like registering the new contract in some registry or whatnot), would be useful. But if we can ascertain that this is not a useful tool, I can take the row "Generic factories" off the comparison table and treat it only as a means of accomplishing 1.

@frangio

frangio commented Jan 14, 2025

support deployments by smart contract wallets, as required by AA

What deployments are needed? Say in the context of ERC-4337, a UserOp can specify a factory, but the factory doesn't need to be generic as far as I can tell.

@pdobacz
Member

pdobacz commented Jan 15, 2025

support deployments by smart contract wallets, as required by AA

What deployments are needed? Say in the context of ERC-4337, a UserOp can specify a factory, but the factory doesn't need to be generic as far as I can tell.

Hm, Okay, maybe I'm missing sth, but let's take 4337's: (EDIT: I was missing something indeed, see below)

Create the account if it does not yet exist, using the initcode provided in the UserOperation...

I understand that this step can use a generic factory (that is, not an EOFCREATE one). In EOF with (D/), instead of deploying initcode provided in data, it would use a TXCREATE pointing to an initcontainer included in the tx's initcodes field (c.f. TXCREATE old spec). This isn't possible with (F/).

I think the same challenge applies to 7702 wallets deploying contracts.

EDIT: as noted by Frangio, I didn't understand the main usecase behind that initcode in ERC-4337, see comments that follow. The main usecase behind such initcode in ERC-4337 sense seems to be one which EOFCREATE does support.

@frangio

frangio commented Jan 15, 2025

initCode is a misleading name so we should clarify that it's defined in ERC-4337 as:

concatenation of factory address and factoryData (or empty)

So initCode is not necessarily bytecode for a generic factory, it could very well be the address of a specialized factory (that can use EOFCREATE) and an encoded function call to request creation of an account from that factory. I'd expect that to be the most common way since it's cheaper. But it could technically also be bytecode for a generic factory, are there scenarios where this becomes necessary?
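
To make the layout concrete, here is a minimal sketch of splitting such an `initCode` into its two parts, assuming the factory address occupies the first 20 bytes and everything after it is `factoryData`. The helper name `split_init_code` and the sample bytes are hypothetical and not taken from any ERC-4337 implementation.

```python
from typing import Optional, Tuple

ADDRESS_LENGTH = 20  # assumption: the factory address is a plain 20-byte prefix


def split_init_code(init_code: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Split initCode into (factory address, factoryData).

    Returns None for empty initCode, i.e. no factory call is requested.
    """
    if len(init_code) == 0:
        return None
    if len(init_code) < ADDRESS_LENGTH:
        raise ValueError("initCode is shorter than a 20-byte factory address")
    return init_code[:ADDRESS_LENGTH], init_code[ADDRESS_LENGTH:]


# Hypothetical values: a specialized factory and an encoded call to it.
factory = bytes.fromhex("ab" * 20)
factory_data = bytes.fromhex("deadbeef")
parsed = split_init_code(factory + factory_data)
```

Nothing in this layout requires the factory to accept arbitrary bytecode: `factoryData` is just calldata for whatever factory sits at that address.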

@pdobacz
Member

pdobacz commented Jan 15, 2025

initCode is a misleading name so we should clarify that it's defined in ERC-4337 as:

OK, so it definitely misled me, thank you for bearing with me.

But it could technically also be bytecode for a generic factory, are there scenarios where this becomes necessary?

Yes, so this is my question as well, or more generally: how, if at all, can ERC-4337 SC Wallets deploy new contracts? In legacy EVM this is possible using CREATE2, but I don't know if it is a pattern that is actually used or intended.

@frangio

frangio commented Jan 15, 2025

how, if at all, can ERC-4337 SC Wallets deploy new contracts

Oh! I see what you mean now.

I just looked through Safe and Coinbase Smart Wallet as two examples, and couldn't find any way to directly deploy a contract from either. In the case of Safe it could be done through DELEGATECALL into a factory. I'll try asking other AA teams, but I don't see why this feature would be needed tbh.

@pdobacz
Member

pdobacz commented Jan 29, 2025

On the last EOF implementers call 65 we arrived at some consensus to aim for pushing the Scenario 1b from https://notes.ethereum.org/@ipsilon/SyrzctZSJg, being the addition of TXCREATE, InitcodeTransaction and a predeployed Creator Contract(s) to serve as toe-hold contracts. Also keccak256(sender_address + salt) is proposed as the hashing scheme for both EOFCREATE and TXCREATE. Legacy-like creation transactions for EOF (EIP-7698) would at the same time be removed from EOF. Refer to the call notes/recording for details.

The change would be proposed in a new EIP (currently in the making), but it can be previewed in the PR to the EOF Megaspec document. The spec there is equivalent to that of the EIP being prepared.

Please take a look @pcaversaccio @charles-cooper @cameel @kuzdogan and provide feedback. I'm looking forward mainly to receiving confirmation that this approach satisfies the required deployment methods in use today. Or of course, if they don't, please let us know why and how we should fix it. We can also move that preliminary discussion to that EOF Megaspec document PR, before we have the EIP draft and a corresponding EthMag thread.

(Last minute heads-up: meanwhile we've identified what might be an issue. Quick gist: the EIP-7834 metadata section is at odds with the approach for a factory to include initcontainer_hash in the TXCREATE's salt to obtain bytecode guarantees. Hopefully, it's not a show-stopper. I'll write the details down later.)

@pcaversaccio

On the last EOF implementers call 65 we arrived at some consensus to aim for pushing the Scenario 1b from https://notes.ethereum.org/@ipsilon/SyrzctZSJg, being the addition of TXCREATE, InitcodeTransaction and a predeployed Creator Contract(s) to serve as toe-hold contracts. Also keccak256(sender_address + salt) is proposed as the hashing scheme for both EOFCREATE and TXCREATE. Legacy-like creation transactions for EOF (EIP-7698) would at the same time be removed from EOF. Refer to the call notes/recording for details.

The change would be proposed in a new EIP (currently in the making), but it can be previewed in the PR to the EOF Megaspec document. The spec there is equivalent to that of the EIP being prepared.

Please take a look @pcaversaccio @charles-cooper @cameel @kuzdogan and provide feedback. I'm looking forward mainly to receiving confirmation that this approach satisfies the required deployment methods in use today. Or of course, if they don't, please let us know why and how we should fix it. We can also move that preliminary discussion to that EOF Megaspec document PR, before we have the EIP draft and a corresponding EthMag thread.

(Last minute heads-up: meanwhile we've identified what might be an issue. Quick gist: the EIP-7834 metadata section is at odds with the approach for a factory to include initcontainer_hash in the TXCREATE's salt to obtain bytecode guarantees. Hopefully, it's not a show-stopper. I'll write the details down later.)

I don't want to sidetrack the discussion but this is interesting from EOF Megaspec document:

Image

If I'm not mistaken, this would be the first predeploy (!= precompile) on Ethereum. Is there some discussion (e.g. an EIP) for this? I have seen such an approach for many L2s, but not yet for Ethereum.

So scenario 1b) states:

C/ EOFCREATE hashes with keccak256(sender + salt)
D/ TXCREATE hashes with keccak256(sender + salt)
E/ a predeployed TXCREATE factory contract to bootstrap EOF

I'm sorry if I missed the conversation around it, but you claim this scenario has "Bytecode guarantees" in here:

Image

How can you exactly guarantee the bytecode for a counterfactual contract address without using the init code in EOFCREATE and TXCREATE?

@kuzdogan

Thanks @pdobacz I think I understood most of it but as a less versed person on the spec and terminology I'd want to summarize my takeaways and maybe you can correct me:

  • There will be no creation tx's, i.e. tx's with to=null won't be able to deploy EOF code, only legacy code
  • There will be a new tx type InitcodeTransaction with a new initcodes field containing the initcodes to be deployed. This tx has to be sent to a contract that utilizes the TXCREATE opcode.
  • There will be a TXCREATE factory contract deployed whose sole job will be to deploy each of the codes in the initcodes field's array. Say this contract will be at 0xabcd.
  • One can't deploy contracts from an EOA by to=null tx's but can send an InitcodeTransaction type transaction to the 0xabcd contract. Does that mean all the EVM tooling for this specific chain needs to know the address of this contract 0xabcd ahead of time?

(Last minute heads-up: meanwhile we've identified what might be an issue. Quick gist: the EIP-7834 metadata section is at odds with the approach for a factory to include initcontainer_hash in the TXCREATE's salt to obtain bytecode guarantees. Hopefully, it's not a show-stopper. I'll write the details down later.)

Where is the initcontainer_hash in this whole picture? I also see some initcode hash being mentioned in the TXCREATE specs:

    - pops one more value from the stack (first argument): `tx_initcode_hash`
    - loads the initcode EOF container from the transaction `initcodes` array which hashes to `tx_initcode_hash`

But I don't see it in the 1b option. It's just keccak(sender + salt):

Image

@pdobacz
Member

pdobacz commented Jan 29, 2025

Thank you for taking a look, this is much appreciated! Let me try to clarify everything.

Actually, I'm sorry, but the Creator Contract source code mentioned in the PR had an error (now fixed): TXCREATE looks up the initcontainer by its hash, not its index. TXCREATE's description was correct, but maybe the incorrect Creator Contract source code misled you both.

Having fixed that, to answer your specific questions:

How can you exactly guarantee the bytecode for a counterfactual contract address without using the init code in EOFCREATE and TXCREATE?

@pcaversaccio Consider a factory pseudocode (I omitted value and input for brevity):

function createWithInitcodeWitness(initcode_hash, salt) {
    final_salt = keccak256(initcode_hash || salt)
    return txcreate(initcode_hash, final_salt)
}

If such factory is deployed at guarantees_factory and gets CALLed, it will deploy using the initcode entry from the transaction which corresponds to initcode_hash (by the rules of TXCREATE). At the same time initcode_hash is included in the final_salt passed on to TXCREATE, so new_address = keccak256(0xff || guarantees_factory_address || keccak256(initcode_hash || salt)).
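
The effect can be checked with a toy model. This is only a sketch under stated assumptions: `hashlib.sha3_256` stands in for keccak256 (the two differ in padding, so these are not real addresses), the transaction's initcodes are modelled as a dict keyed by hash, and `txcreate` here is a hypothetical Python function that mimics only the lookup and the address derivation, not an actual opcode binding.

```python
import hashlib
from typing import Dict


def h(data: bytes) -> bytes:
    # Stand-in for keccak256; NIST SHA3-256 is NOT the same function.
    return hashlib.sha3_256(data).digest()


def txcreate(factory: bytes, initcodes: Dict[bytes, bytes],
             initcode_hash: bytes, salt: bytes) -> bytes:
    """Toy TXCREATE: find the initcode by hash, derive the address from sender + salt."""
    if initcode_hash not in initcodes:
        raise KeyError("no initcode in the transaction hashes to this value")
    # The address depends only on the caller (the factory) and the salt.
    return h(b"\xff" + factory + salt)[-20:]


def create_with_initcode_witness(factory: bytes, initcodes: Dict[bytes, bytes],
                                 initcode_hash: bytes, salt: bytes) -> bytes:
    # Fold the initcode hash into the salt so the address commits to the code.
    final_salt = h(initcode_hash + salt)
    return txcreate(factory, initcodes, initcode_hash, final_salt)


# Hypothetical factory address and two placeholder initcontainers.
factory = bytes.fromhex("22" * 20)
code_a, code_b = b"container A", b"container B"
initcodes = {h(code_a): code_a, h(code_b): code_b}
salt = bytes(32)

addr_a = create_with_initcode_witness(factory, initcodes, h(code_a), salt)
addr_b = create_with_initcode_witness(factory, initcodes, h(code_b), salt)
plain_a = txcreate(factory, initcodes, h(code_a), salt)
plain_b = txcreate(factory, initcodes, h(code_b), salt)
```

In this model `plain_a` and `plain_b` are equal, because a bare sender + salt scheme ignores the code, while `addr_a` and `addr_b` differ, because the factory mixed the initcode hash into the salt.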

I'd want to summarize my takeaways and maybe you can correct me:

@kuzdogan looks all correct. A minor remark is that the intention is for the 0xabcd to be the same for all chains adopting the EIP (and EOF), likely an address from the precompile addresses range. I'm not 100% sure if this is something we can expect to hold?

But I don't see it in the 1b option. It's just keccak(sender + salt):

Maybe the above answer to pcaversaccio helps? It would be up to the specific factory to include it.

I have now realized that the "bootstrap" Creator Contract should work more similarly to the createWithInitcodeWitness function, i.e. it should include initcode_hash in the address. This would ensure that all "derived" TXCREATE factories land at the same addresses iff they have the same code.
