-
Notifications
You must be signed in to change notification settings - Fork 49
feat: OPCM v2 design doc #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR adds the design document for OPCM v2.
protocol/opcm-v2.md
Outdated
| struct GameConfig { | ||
| GameType gameType; | ||
| address implementation; | ||
| bytes32 prestate; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since proofs contracts are now MCP-compatible, won't there be a 1:1 mapping between game type and implementation address, so we can remove the implementation arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct
| - **Add game types**: Implicit in `upgrade()` with modified games array | ||
| - **Update prestates**: Implicit in `upgrade()` with modified games array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about preserving addGameType and updatePrestates as convenience functions? We often want to do just those actions (e.g. changing prestate for a hard fork) and it feels error prone and hard to review to pass a full Config into upgrade for that. These convenience functions would just gather the other inputs before calling upgrade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upgrade() already takes very minimal inputs, so this is already considered
| **Mitigation**: | ||
| - Comprehensive security audit with focus on the new flexible interfaces | ||
| - Defensive coding practices throughout implementation | ||
| - State diff review for the first OPCM v2 upgrade to verify correctness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who needs to do state diff review—all signers, or just the devs creating the superchain-ops task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just the devs creating the superchain ops task, the first time?
| **Mitigation**: | ||
| - Ensure all existing OPCM v1 tests pass with v2 implementation | ||
| - Comprehensive test coverage mapping v1 functionality to v2 | ||
| - Review session with all customer teams to validate their use cases are covered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like an easy way to mitigate this is by not changing any external function signatures until the very end
| - Solidity function signatures/ABIs are the source of truth | ||
| - Go code for constructing function calls is automatically generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we plan to do this generation? We've been trying to move away from the Geth codegen tool because it creates some pretty awful code and is quite limiting in how you can actually make calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've also seen issues where the struct definition is updated but the new fields aren't actually populated. We have some static analysis for this in op-deployer now but worth thinking about how we can ensure we get compiler errors if things are out of sync (including verifying the code was regenerated when required).
protocol/opcm-v2.md
Outdated
| - **Upgrades**: Only a subset of configuration is provided in the upgrade input; the remaining configuration is gathered automatically from the existing deployed contracts | ||
|
|
||
| This asymmetry allows upgrades to specify only what changes, while deployments must provide complete configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this expressed in code? How do you know what you need to provide when calling upgrade? The ABIs above pass the same Config object to both deploy and upgrade so it seems unclear how to know if an upgrade requires a new setting be specified or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to fix the ABIs here to be more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, can you take another look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what's confusing me is lines 83 and 84 have deploy(Config) and upgrade(Config) which makes it look like they take the same inputs. In which case, how do you know which parts of that Config are required to be specified for the upgrade being done?
But maybe I'm looking at the wrong place for the ABI you updated.
| - **Deploy new chains**: `deploy()` function | ||
| - **Upgrade existing chains**: `upgrade()` function | ||
| - **Dynamic game types**: GameConfig array in Config struct | ||
| - **Add game types**: Implicit in `upgrade()` with modified games array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there isn't a way to iterate through the list of set dispute game implementations (it's a map). Apart from migrating to interop, OPCM doesn't support removing game types, but it's worth noting that upgrade won't remove existing game types if they aren't in the games array. That feels a little more surprising with this ABI but I don't think it's unreasonable.
You can remove games by directly calling DisputeGameFactory.setImplementation and setting a 0x00 implementation address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the same thing applies to any maps set as part of initialize methods. That could open up some foot guns because map entries get left around after an upgrade and we don't have a way to specify that they should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From discussion: For game types, we have a fixed list that is supported so they can just be removed upfront and then re-add what is specified to make this work.
For the generic case we should ensure that initialisers only set simple fields and don't write to maps so they are safe to call repeatedly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there isn't a way to iterate through the list of set dispute game implementations (it's a map).
We should consider changing that mapping into an enumerable set so that it becomes iterable, we've found this structure to be quite useful elsewhere.
protocol/opcm-v2.md
Outdated
| bytes32 prestate; | ||
| } | ||
|
|
||
| struct Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add a new field/change an existing field on the Config struct, won't we still have to update OP Deployer to plumb that field through the intent and into the OPCM? Is the idea that the codegen would make this process easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we would still have to update op-deployer no matter what. Idea is to codegen this, yep.
| Required reviewers: | ||
| - **Proofs team representative**: Primary stakeholder for interface flexibility requirements | ||
| - **Protocol team lead**: For overall architectural alignment | ||
| - **EVM Safety team representative**: For security and safety validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need somebody from platforms as well to make sure it'll integrate well with deployer.
protocol/opcm-v2.md
Outdated
| ### Architecture Overview | ||
|
|
||
| OPCM v2 consolidates the 5 separate functions of v1 into 2 primary functions: | ||
| - `deploy(Config memory config)`: Deploys a new OP Stack chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include deploying blueprints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that Blueprints will still be deployed in DeployImplementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect that Blueprints will still be deployed in DeployImplementations.
Correct
|
|
||
| **Phase 1: Chain World Assembly (Load-or-Deploy Pattern)** | ||
|
|
||
| Both functions begin by gathering the complete set of contracts for the chain—the "chain world"—which includes SystemConfig, OptimismPortal, AnchorStateRegistry, and all other chain contracts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include the Superchain contracts? How do we upgrade those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good callout. The doc mentions 5 functions (deploy(), upgrade(), addGameType(), updatePrestate(), interopMigrate()), but there is now also upgradeSuperchainConfig().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good callout, we are maintaining a separate upgradeSuperchain function too
|
One other thing - what's the rollout plan here? Would love for us to validate that this solves the Proofs team's problems as early as possible. |
|
|
||
| ### Implementation Flow | ||
|
|
||
| Both `deploy()` and `upgrade()` follow a unified three-phase flow: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does upgrade() now operate on a single chain or do they still accept an array as input?
IIRC from the prototype it's a single chain, which makes sense in order to make deploy() and upgrade() consistent. We can always do the loop using multicall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upgrade() operates on a single chain, you can loop via multicall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for this change? @maurelian do you recall the original rationale for accepting an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured there wasn't any real benefit to an input with an array vs just calling the function multiple times in the same txn. If someone has an obvious reason why an array is valuable, happy to consider.
|
|
||
| Generate the Go glue code between OPCM and op-deployer from Solidity interfaces: | ||
| - Solidity function signatures/ABIs are the source of truth | ||
| - Go code for constructing function calls is automatically generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this enable us to get rid of the Solidity scripts in scripts/deploy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of them, yes
|
|
||
| Both functions begin by gathering the complete set of contracts for the chain—the "chain world"—which includes SystemConfig, OptimismPortal, AnchorStateRegistry, and all other chain contracts. | ||
|
|
||
| The load-or-deploy pattern attempts to load each contract address from an expected source (e.g., a known deployment address for upgrades, or a predictable location). If the contract cannot be found at the expected location, it is deployed instead. This unified pattern means the same chain world loading function serves both deploy and upgrade operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the contract cannot be found at the expected location, it is deployed instead.
It feels odd to just casually say "if the contract cannot be found". ie. If a system is already deployed, then all the contracts should exist, and having one not exist would be a very strange thing. Is this meant to imply a situation where a net-new contract is being added during an upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, meant to imply a net-new contract being added during an upgrade
|
|
||
| ### Constraints | ||
|
|
||
| - Must integrate with existing op-deployer Go tooling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we aiming to maintain the existing op-deployer cli interface here? If we are planning to break anything here, we should consult with users to see what hardcoded invocations they might have in scripts and how easily they can update them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are aiming to maintain the existing interface for now, but then deprecate that interface in favor of a simpler one
| **Risk**: We might inadvertently miss some existing functionality that OPCM v1 provides when transitioning to the new 2-function interface. | ||
|
|
||
| **Mitigation**: | ||
| - Ensure all existing OPCM v1 tests pass with v2 implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning to use the 'strangler pattern' here, so that both OPCMs exist and are tested in parallel during development of V2? This was used successfully when migrating the deploy scripts to the new pattern without the I/O contracts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we are doing that
|
|
||
| 1. **Alternative designs**: Are there alternative architectural approaches we may have missed that would better solve these problems? | ||
|
|
||
| 2. **Shipping timeline**: When should we ship this? Is U18 (upgrade 18) the target, or should it be later? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
U18 is going to audit with the enabling changes, but it seems unlikely to me that OPCM v2 itself will be ready at the same time as other keystone features that justify releasing U18.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, we will probably aim for U19
|
|
||
| **Rejection rationale**: The core OPCM design is good. We don't want to replace it; we want to improve specific aspects. A complete redesign would be throwing out the baby with the bathwater. The v2 approach preserves what works while fixing what doesn't. | ||
|
|
||
| ## Risks and Uncertainties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per-tx gas usage is called out above, but might want to add a risk/uncertainty for gas usage in upgrade() and deploy() txs unless we've already measured the new versions of those functions. For example, would be good to determine if there is a limit on number of dispute games that can be deployed in a single upgrade() or deploy() tx before it reaches the 16.77Mgas/tx fusaka limit. Any other config/operations that should be limited to ensure fusaka-compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a test in place to make sure we remain below the Fusaka limit!
This PR adds the design document for OPCM v2.