From 6b775938206d8ccb3845457fdae6afed14e7479e Mon Sep 17 00:00:00 2001 From: Mark Tyneway Date: Tue, 18 Mar 2025 22:15:40 -0600 Subject: [PATCH 1/7] interop: design doc for topology Design doc that is meant to clearly define the topology for a cloud deployment of an OP Stack cluster. This is not meant to define the only way to deploy a cluster, but is the architecture that we will do our security review against to ensure that it is not possible to attack a cluster. --- protocol/interop-topology.md | 145 +++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 protocol/interop-topology.md diff --git a/protocol/interop-topology.md b/protocol/interop-topology.md new file mode 100644 index 00000000..6e220442 --- /dev/null +++ b/protocol/interop-topology.md @@ -0,0 +1,145 @@ +# Purpose + + + + + +This document exists to drive consensus and act as a reference for the preferred topology +of a cloud deployment of an interop enabled OP Stack cluster. + +# Summary + + + +The production topology of an interop cluster checks the validity of cross chain transactions at: +- cloud ingress (`proxyd`) +- sentry node mempool ingress +- sentry node mempool on interval +- sequencer node mempool on interval + +The validity of cross chain transactions is not checked at: +- sequencer node mempool ingress +- block builder + +It is safe to not check at block building time because even if the check passes at ingress +time and passes again at block building time, it still does not exhaustively cover all cases: +it is possible that the remote reorg happens **after** the local block is sealed. +In practice, an unsafe reorg on the remote chain is far more likely to happen **after** the local block +is sealed than in the small window between the mempool check and block building. Therefore we should not +add a remote RPC request as part of the block building happy path, but we should still implement and benchmark it.
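To make the checkpoint split above concrete, the following sketch models where a cross chain transaction is and is not validated. This is an illustrative Python model, not the production Go services, and the stage names are labels taken from this document rather than real configuration keys:

```python
# Illustrative model of the checkpoints above. True means a supervisor
# validity check happens at that stage; False means the tx passes
# through that stage unchecked.
VALIDATING_STAGES = {
    "proxyd_ingress": True,
    "sentry_mempool_ingress": True,
    "sentry_mempool_interval": True,
    "sequencer_mempool_ingress": False,  # deliberately unchecked
    "sequencer_mempool_interval": True,
    "block_builder": False,              # deliberately unchecked (hot path)
}

def passes(stage, supervisor_verdict):
    """A tx survives a stage if the stage does not validate, or if the
    supervisor would have confirmed its messages at that moment."""
    return (not VALIDATING_STAGES[stage]) or supervisor_verdict

def survives_pipeline(verdicts):
    """verdicts maps stage -> what the supervisor would answer there.
    Unlisted stages default to a valid verdict. Returns True if the tx
    would make it into a sealed block."""
    return all(passes(s, verdicts.get(s, True)) for s in VALIDATING_STAGES)
```

Note that a message that becomes invalid only at block building time still reaches a sealed block; that is exactly the race this document argues is rare enough to accept.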
+ +TODO: include information on recommendation around standard mode/multi node/running multiple supervisors. +Should include information on the value props of each so we can easily align on roadmap. + +# Problem Statement + Context + + + +It is a goal to remove as many async, blocking operations from the hot path of +the block builder as possible. Validating an interop cross chain transaction +requires a remote RPC request to the supervisor. Having this as part of the hot +path introduces a denial of service risk. Specifically, we do not want to have so +many interop transactions in a block that it takes longer than the blocktime +to build a block. + +To prevent this sort of issue, we want to move the validation on interop +transactions to as early as possible in the process, so the hot path of the block builder +only needs to focus on including transactions. + +For context, a cross chain transaction is defined in the [specs](https://github.com/ethereum-optimism/specs/blob/85966e9b809e195d9c22002478222be9c1d3f562/specs/interop/overview.md#interop). Any reference +to the supervisor means [op-supervisor](https://github.com/ethereum-optimism/design-docs/blob/d732352c2b3e86e0c2110d345ce11a20a49d5966/protocol/supervisor-dataflow.md). + +# Proposed Solution + + + +## Solution + +### `op-supervisor` alternative backend + +We add a backend mode to `op-supervisor` that operates specifically by using dynamic calls to `eth_getLogs` +to validate cross chain messages rather than its local index. This could be accomplished by adding +new RPC endpoints that do this or could be done with runtime config. When `op-supervisor` runs in this +mode, it is a "light mode" that only supports `supervisor_validateMessagesV2` and `supervisor_validateAccessList` +(potentially a subset of their behavior). This would give us a form of "client diversity" with respect +to validating cross chain messages. This is a low lift way to reduce the likelihood of a forged initiating +message. 
A forged initiating message would trick the caller into believing that an initiating +message exists when it actually doesn't, meaning that it could be possible for an invalid executing +message to finalize. + +TODO: feedback to understand what capabilities are possible + +### `proxyd` + +We update `proxyd` to validate interop messages on cloud ingress. It should check against both the indexed +backend of `op-supervisor` as well as the alternative backend. + +### Sentry Node Mempool Ingress + +We update the EL clients to validate interop transactions on ingress to the mempool. This should be a different +instance of `op-supervisor` than the one that is used by `proxyd` to reduce the likelihood of a nondeterministic +bug within `op-supervisor`. + +### Sentry Node + Sequencer Mempool on Interval + +We update the EL clients to validate interop transactions on an interval in the mempool. Generally the mempool +will revalidate all transactions on each new block, but for an L2 that has a 1-2s blocktime, that is quite often. +It may be the case that we could get away with a batch RPC request every 1-2s, but generally we should not do +`n` RPC requests on each block, where `n` is the number of transactions that include statically declared executing +messages. + +### Block Building + +Let's say that it takes 100ms for a transaction to be checked at `proxyd`, checked at the mempool of the sentry node, +forwarded to the sequencer, and pulled into the block builder. The chances of the status of an initiating message +going from existing to not existing during that timeframe are extremely small. Even if we did check at the block builder, +it doesn't capture the case of a future unsafe chain reorg happening that causes the message to become invalid. +Because it is most likely that the remote unsafe reorg comes after the local block is sealed, there is no real +reason to block the hot path of the chain with the remote lookups.
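The interval check described above can be sketched as a single revalidation pass that would run on a 1-2s ticker. This is an illustrative Python sketch; the real change would live in the Go EL clients, and `validate_batch` is a hypothetical stand-in for one batched supervisor query:

```python
def revalidate_once(mempool, validate_batch):
    """One pass of the interval check: issue a single batched validity
    query for every pending interop tx and evict those no longer valid.

    mempool:        dict of tx_hash -> tx, mutated in place
    validate_batch: callable taking a list of txs and returning a list
                    of booleans (one supervisor round-trip, not n)
    Returns the number of evicted transactions.
    """
    items = list(mempool.items())
    if not items:
        return 0
    verdicts = validate_batch([tx for _, tx in items])
    evicted = 0
    for (tx_hash, _), ok in zip(items, verdicts):
        if not ok:
            del mempool[tx_hash]
            evicted += 1
    return evicted
```

One batched call per tick replaces `n` single-transaction RPC requests per block, which is the property we want from the interval approach.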
+ +## Resource Usage + + + +Doing a remote RPC request is always going to be an order of magnitude slower than doing a local lookup. +Therefore we want to ensure that we can parallelize our remote lookups as much as possible. Block building +is inherently a single threaded process given that the ordering of the transactions is very important. + +## Single Point of Failure and Multi Client Considerations + +There is a single point of failure with the `op-supervisor`. Adding an alternative backend that dynamically +fetches data using `eth_getLogs` rather than looking up its local database helps to approximate a second implementation. + + + +# Alternatives Considered + + + +## Block Building + +The main alternative to not validating transactions at the block builder is validating transactions +at the block builder. We would like to have this feature implemented because it can work for simple networks, +as well as act as an ultimate fallback to keep interop messaging live, but we do not want to run it as +part of the happy path. + +# Risks & Uncertainties + + + +We really need to measure everything to validate our hypothesis on the ideal architecture. +To validate the ideal architecture, we need to measure it and then try to break it. \ No newline at end of file From 427caac3a4d59cc53119bd457c91bd895e2e5571 Mon Sep 17 00:00:00 2001 From: Mark Tyneway Date: Wed, 19 Mar 2025 15:58:22 -0600 Subject: [PATCH 2/7] typo: fix Co-authored-by: Axel Kingsley --- protocol/interop-topology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/protocol/interop-topology.md b/protocol/interop-topology.md index 6e220442..860df68c 100644 --- a/protocol/interop-topology.md +++ b/protocol/interop-topology.md @@ -42,7 +42,7 @@ as information needed to understand the problem and design space. If more information is needed on the costs of the problem, this is a good place to that information. 
--> -It is a goal to remove as many async, blocking operations from the hot path of +It is a goal to remove as many sync, blocking operations from the hot path of the block builder as possible. Validating an interop cross chain transaction requires a remote RPC request to the supervisor. Having this as part of the hot path introduces a denial of service risk. Specifically, we do not want to have so From 3cb9bd5268d4cf4a612b344993d62c51272464d1 Mon Sep 17 00:00:00 2001 From: axelKingsley Date: Thu, 20 Mar 2025 13:38:03 -0500 Subject: [PATCH 3/7] Add Host Redundancy Topic --- protocol/interop-topology.md | 59 ++++++++++++++++++++++++++++++------ 1 file changed, 50 insertions(+), 9 deletions(-) diff --git a/protocol/interop-topology.md b/protocol/interop-topology.md index 860df68c..3cb4430b 100644 --- a/protocol/interop-topology.md +++ b/protocol/interop-topology.md @@ -36,12 +36,9 @@ TODO: include information on recommendation around standard mode/multi node/runn Should include information on the value props of each so we can easily align on roadmap. # Problem Statement + Context +There are two sets of problems to solve in our topology and software arrangement: - - +## Design Tx Ingress Flow for Optimal Latency, Correctness It is a goal to remove as many sync, blocking operations from the hot path of the block builder as possible. Validating an interop cross chain transaction requires a remote RPC request to the supervisor. Having this as part of the hot @@ -56,7 +53,11 @@ only needs to focus on including transactions. For context, a cross chain transaction is defined in the [specs](https://github.com/ethereum-optimism/specs/blob/85966e9b809e195d9c22002478222be9c1d3f562/specs/interop/overview.md#interop). Any reference to the supervisor means [op-supervisor](https://github.com/ethereum-optimism/design-docs/blob/d732352c2b3e86e0c2110d345ce11a20a49d5966/protocol/supervisor-dataflow.md). 
-# Proposed Solution +## Arrange Infrastructure to Maximize Redundancy +It is a goal to ensure that there are no single points of failure in the network infrastructure that runs an interop network. +To that end, we need to organize hosts such that sequencers and supervisors may go down without an interruption. + +# Proposed Solutions -## Solution +## TX Ingress Flow +There are lots of gates and checks we can establish for inflowing Tx to prevent excess work (a DOS vector) from reaching the Supervisor. ### `op-supervisor` alternative backend @@ -108,7 +110,7 @@ it doesn't capture the case of a future unsafe chain reorg happening that causes Because it is most likely that the remote unsafe reorg comes after the local block is sealed, there is no real reason to block the hot path of the chain with the remote lookups. -## Resource Usage +### Resource Usage @@ -117,13 +119,36 @@ Doing a remote RPC request is always going to be an order of magnitude slower th Therefore we want to ensure that we can parallelize our remote lookups as much as possible. Block building is inherently a single threaded process given that the ordering of the transactions is very important. -## Single Point of Failure and Multi Client Considerations +### Single Point of Failure and Multi Client Considerations There is a single point of failure with the `op-supervisor`. Adding an alternative backend that dynamically fetches data using `eth_getLogs` rather than looking up its local database helps to approximate a second implementation. +## Host Topology / Arrangement + +In order to fully validate a Superchain, a Supervisor must be hooked up to one Node per chain (with one Executing Engine behind each). +We can call this group a "full validation stack" because it contains all the executing parts to validate a Superchain. + +In order to have redundancy, we will need multiple Nodes, and also *multiple Supervisors*. +We should use Conductor to ensure the Sequencers have redundancy as well. 
Therefore, we should arrange the nodes like so: + +| | Chain A | Chain B | Chain C | +|------------|---------|---------|---------| +| Supervisor 1 | A1 | B1 | C1 | +| Supervisor 2 | A2 | B2 | C2 | +| Supervisor 3 | A3 | B3 | C3 | + +In this model, each chain has one Conductor, which joins all the Sequencers for a given network, and each heterogeneous group of Sequencers is joined by a Supervisor. +This model gives us redundancy for both Sequencers *and* Supervisors. If an entire Supervisor were to go down, +there would still be two full validation stacks processing the chain correctly. + +The Conductor may need to weigh additional considerations in order to determine failover, +but these are not well defined yet. For example, if the Supervisor of the active Sequencer went down, +it may be prudent to switch the active Sequencer to one with a functional Supervisor. + # Alternatives Considered +The creation of Interop transactions opens Optimism networks to new forms of undesirable activity. +Specifically, including an interop transaction carries two distinct risks: +- If an interop transaction is included which is *invalid*, the block which contains it is invalid too, +and must be replaced, causing a reorg. +- If the block building and publishing system spends too much time validating an interop transaction, +callers may exploit this effort to create DOS conditions on the network, where the chain is stalled or slowed. -The production topology of an interop cluster checks the validity of cross chain transactions at: -- cloud ingress (`proxyd`) -- sentry node mempool ingress -- sentry node mempool on interval -- sequencer node mempool on interval +The new component `op-supervisor` serves to efficiently compute and index cross-safety information across all chains +in a dependency set.
However, we still need to decide on the particular arrangement of components, +and the desired flow for a Tx to satisfy a high degree of correctness without risking networks stalls. -The validity of cross chain transactions are not checked at: -- sequencer node mempool ingress -- block builder - -It is safe to not check at block building time because if the check passes at ingress -time and passes again at block building time, it does not exhaustively cover all cases. -It is possible that the remote reorg happens **after** the local block is sealed. -In practice, it is far more likely to have an unsafe reorg on the remote chain that is not -caught **before** the local block is sealed because the time between checking at the mempool -and checking at block building is so small. Therefore we should not add a remote RPC request -as part of the block building happy path, but we should still implement and benchmark it. - -TODO: include information on recommendation around standard mode/multi node/running multiple supervisors. -Should include information on the value props of each so we can easily align on roadmap. +In this document we will propose the desired locations and schedule for validating transactions for high correctness +and low impact. We will also propose a desired arrangement of hosts to maximize redundancy in the event that +some component *does* fail. # Problem Statement + Context -There are two sets of problems to solve in our topology and software arrangement: -## Design Tx Ingress Flow for Optimal Latency, Correctness +Breaking the problem into two smaller parts: + +## TX Flow - Design for Correctness, Latency It is a goal to remove as many sync, blocking operations from the hot path of the block builder as possible. Validating an interop cross chain transaction requires a remote RPC request to the supervisor. Having this as part of the hot @@ -53,64 +44,61 @@ only needs to focus on including transactions. 
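The goal can be sketched as a hot path that reduces to pure selection over transactions whose interop validity was already settled off-path. This is an illustrative Python sketch, not the Go block builder, and `valid_interop` is a hypothetical flag maintained by the earlier mempool checks rather than a real field:

```python
def build_block(pending, gas_limit):
    """Hot path of the block builder: pure selection, no remote RPC.

    pending: list of (tx, gas, valid_interop) tuples, where
             valid_interop was settled earlier by the async mempool
             checks (ingress + interval); non-interop txs carry True.
    """
    block, gas_used = [], 0
    for tx, gas, valid_interop in pending:
        if not valid_interop:
            continue  # already flagged by an earlier, off-path check
        if gas_used + gas > gas_limit:
            continue  # skip txs that do not fit the remaining gas
        block.append(tx)
        gas_used += gas
    return block
```

Because no branch here waits on the supervisor, block building time stays bounded by local work regardless of how many interop transactions are pending.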
For context, a cross chain transaction is defined in the [specs](https://github.com/ethereum-optimism/specs/blob/85966e9b809e195d9c22002478222be9c1d3f562/specs/interop/overview.md#interop). Any reference to the supervisor means [op-supervisor](https://github.com/ethereum-optimism/design-docs/blob/d732352c2b3e86e0c2110d345ce11a20a49d5966/protocol/supervisor-dataflow.md). -## Arrange Infrastructure to Maximize Redundancy +## Redundancy - Design for Maximum Redundancy It is a goal to ensure that there are no single points of failure in the network infrastructure that runs an interop network. To that end, we need to organize hosts such that sequencers and supervisors may go down without an interruption. -# Proposed Solutions +This should include both Sequencers, arranged with Conductors, as well as redundancy on the Supervisors themselves. - +# Proposed Solutions ## TX Ingress Flow -There are lots of gates and checks we can establish for inflowing Tx to prevent excess work (a DOS vector) from reaching the Supervisor. - -### `op-supervisor` alternative backend - -We add a backend mode to `op-supervisor` that operates specifically by using dynamic calls to `eth_getLogs` -to validate cross chain messages rather than its local index. This could be accomplished by adding -new RPC endpoints that do this or could be done with runtime config. When `op-supervisor` runs in this -mode, it is a "light mode" that only supports `supervisor_validateMessagesV2` and `supervisor_validateAccessList` -(potentially a subset of their behavior). This would give us a form of "client diversity" with respect -to validating cross chain messages. This is a low lift way to reduce the likelihood of a forged initiating -message. A forged initiating message would be tricking the caller into believing that an initiating -message exists when it actually doesn't, meaning that it could be possible for an invalid executing -message to finalize. 
- -TODO: feedback to understand what capabilities are possible +There are multiple checks we can establish for inflowing Tx to prevent excess work (a DOS vector) from reaching the Supervisor. ### `proxyd` -We update `proxyd` to validate interop messages on cloud ingress. It should check against both the indexed +We can update `proxyd` to validate interop messages on cloud ingress. It should check against both the indexed backend of `op-supervisor` as well as the alternative backend. +Because interop transactions are defined by their Access List, +`proxyd` does not have to execute any transactions to make this request. +This filter will eliminate all interop transactions made in bad faith, as they will be obviously invalid. + +It may be prudent for `proxyd` to wait and re-test a transaction after a short timeout (`1s` for example) +to allow through transactions that are valid against the bleeding edge of chain content. `proxyd` can have its own +`op-supervisor` and `op-node` cluster specifically to provide cross safety queries without putting any load on other +parts of the network. ### Sentry Node Mempool Ingress -We update the EL clients to validate interop transactions on ingress to the mempool. This should be a different +We can update the EL clients to validate interop transactions on ingress to the mempool. This should be a different instance of `op-supervisor` than the one that is used by `proxyd` to reduce the likelihood of a nondeterministic -bug within `op-supervisor`. +bug within `op-supervisor`. See "Host Topology" below for a description of how to arrange this. -### Sentry Node + Sequencer Mempool on Interval +### All Nodes Mempool on Interval -We update the EL clients to validate interop transactions on an interval in the mempool. Generally the mempool -will revalidate all transactions on each new block, but for an L2 that has 1-2s blocktime, that is quite often. 
-It may be the case that we could get away with a batch RPC request every 1-2s, but generally we should not do -`n` RPC requests on each block where `n` is the number of transactions that include statically declared executing -messages. +We can update the EL clients to validate interop transactions on an interval in the mempool. Generally the mempool +will revalidate all transactions on each new block, but for an L2 that has 1-2s blocktime, that could be frequent if the +RPC round-trip of an `op-supervisor` query is too costly. -### Block Building +Instead, the Sequencer (and all other nodes) should validate only on a low frequency interval after ingress. +The *reasoning* for this is: Lets say that it takes 100ms for the transaction to be checked at `proxyd`, checked at the mempool of the sentry node, forwarded to the sequencer and pulled into the block builder. The chances of the status of an initiating message going from existing to not existing during that timeframe is extremely small. Even if we did check at the block builder, it doesn't capture the case of a future unsafe chain reorg happening that causes the message to become invalid. Because it is most likely that the remote unsafe reorg comes after the local block is sealed, there is no real -reason to block the hot path of the chain with the remote lookups. +reason to block the hot path of the chain with the remote lookups. If anything, we would want to coordinate these checks +with the *remote block builders*, but of course we have no way to actually do this. -### Resource Usage +### Batching Supervisor Calls + +During ingress, transactions are independent and must be checked independently. However, once they've reached the Sequencer +mempool, transactions can be grouped and batched by presumed block. 
Depending on the rate of the check, the Sequencer +can collect all the transactions in the mempool it believes will be in a block soon, and can perform a batch RPC call +to more effectively filter out transactions. This would allow the call to happen more often without increasing RPC overhead. + +### Note on Resource Usage @@ -119,13 +107,6 @@ Doing a remote RPC request is always going to be an order of magnitude slower th Therefore we want to ensure that we can parallelize our remote lookups as much as possible. Block building is inherently a single threaded process given that the ordering of the transactions is very important. -### Single Point of Failure and Multi Client Considerations - -There is a single point of failure with the `op-supervisor`. Adding an alternative backend that dynamically -fetches data using `eth_getLogs` rather than looking up its local database helps to approximate a second implementation. - - - ## Host Topology / Arrangement In order to fully validate a Superchain, a Supervisor must be hooked up to one Node per chain (with one Executing Engine behind each). @@ -149,6 +130,56 @@ There may need to be additional considerations the Conductor makes in order to d but these are not well defined yet. For example, if the Supervisor of the active Sequencer went down, it may be prudent to switch the active Sequencer to one with a functional Supervisor. +## Solution Side-Ideas + +Although they aren't strictly related to TX Flow or Redundancy, here are additional ideas to increase the stability +of a network. These ideas won't be brought forward into the Solution Summary. + +### `op-supervisor` alternative backend + +We add a backend mode to `op-supervisor` that operates specifically by using dynamic calls to `eth_getLogs` +to validate cross chain messages rather than its local index. This could be accomplished by adding +new RPC endpoints that do this or could be done with runtime config. 
When `op-supervisor` runs in this +mode, it is a "light mode" that only supports `supervisor_validateMessagesV2` and `supervisor_validateAccessList` +(potentially a subset of their behavior). This would give us a form of "client diversity" with respect +to validating cross chain messages. This is a low lift way to reduce the likelihood of a forged initiating +message. A forged initiating message would trick the caller into believing that an initiating +message exists when it actually doesn't, meaning that it could be possible for an invalid executing +message to finalize. + +TODO: feedback to understand what capabilities are possible + + +# Solution Summary + +We should establish `op-supervisor` checks of transactions at the following points: +- On cloud ingress to `proxyd` +- On ingress to sentry node mempools +- On a regular interval on all mempools + +Additionally, the Sequencer should batch these calls. + +The production topology of an interop cluster checks the validity of cross chain transactions at: +- cloud ingress (`proxyd`) +- sentry node mempool ingress +- sentry node mempool on interval +- sequencer node mempool on interval + +The validity of cross chain transactions is not checked at: +- sequencer node mempool ingress +- block builder + +It is safe to not check at block building time because even if the check passes at ingress +time and passes again at block building time, it still does not exhaustively cover all cases: +it is possible that the remote reorg happens **after** the local block is sealed. +In practice, an unsafe reorg on the remote chain is far more likely to happen **after** the local block +is sealed than in the small window between the mempool check and block building. Therefore we should not +add a remote RPC request as part of the block building happy path, but we should still implement and benchmark it.
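Each checkpoint in this summary relies on the fact that interop transactions statically declare their executing messages in the transaction access list, so no execution is needed to know what to ask the supervisor. A minimal sketch of that pre-filter follows (illustrative Python; the inbox address is assumed here to be the `CrossL2Inbox` predeploy and is shown for illustration only):

```python
# Assumed predeploy address for illustration (the CrossL2Inbox).
CROSS_L2_INBOX = "0x4200000000000000000000000000000000000022"

def declared_executing_messages(access_list):
    """Extract statically declared executing messages from an
    EIP-2930-style access list, given as (address, [storage_key, ...])
    pairs. The returned storage keys are what a checkpoint would forward
    to the supervisor; an empty result means no supervisor call is
    needed because the tx declares no executing messages.
    """
    keys = []
    for address, storage_keys in access_list:
        if address.lower() == CROSS_L2_INBOX:
            keys.extend(storage_keys)
    return keys
```

Because this is a cheap, static inspection, every checkpoint (including `proxyd`) can decide whether a supervisor query is needed at all before spending any RPC budget.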
+ +TODO: include information on recommendation around standard mode/multi node/running multiple supervisors. +Should include information on the value props of each so we can easily align on roadmap. + # Alternatives Considered - -## Block Building +## Checking at Block Building Time (Tx Flow Solution) The main alternative to not validating transactions at the block builder is validating transactions at the block builder. We would like to have this feature implemented because it can work for simple networks, as well as act as an ultimate fallback to keep interop messaging live, but we do not want to run it as part of the happy path. -## Multi-Node (redundancy solution) +## Multi-Node (Host Redundancy Solution) One request that has been made previously is to have "Multi-Node" support. In this model, multiple Nodes for a single chain are connected to the same Supervisor. To be clear, the Supervisor software @@ -210,8 +195,17 @@ than 1:1:1 Node:Chain:Supervisor. # Risks & Uncertainties - - We really need to measure everything to validate our hypothesis on the ideal architecture. -To validate the ideal architecture, we need to measure it and then try to break it. \ No newline at end of file +To validate the ideal architecture, we need to measure it and then try to break it. + +Incorrect assumptions, or unexpected emergent behaviors in the network, could result in validation not happening at the right times, +causing excessive replacement blocks. Conversely, we could also fail to reduce load on the block builder, still leading to slow +block building or stalls. + +Ultimately, this design represents a hypothesis which needs real testing before it can be challenged and updated. 
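To start validating the latency hypothesis, a first measurement can be as simple as timing the two lookup paths against each other (an illustrative Python harness with a made-up interface; real measurements should come from the production RPC metrics):

```python
import time

def mean_call_time(check, payloads, repeats=100):
    """Average seconds per call of a validity-check function over the
    given payloads. Intended to compare a local index lookup against a
    remote supervisor round-trip before deciding where checks are
    affordable."""
    start = time.perf_counter()
    for _ in range(repeats):
        for payload in payloads:
            check(payload)
    return (time.perf_counter() - start) / (repeats * len(payloads))
```

Running this against both the indexed backend and the `eth_getLogs`-style backend would give a concrete number for the "order of magnitude slower" claim above.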
+ +# Action Items from this Document +- Put the `proxyd` check in place +- Put an interval check in place in the mempool +- Remove build-time checks +- Test RPC performance (data collectable via Grafana) \ No newline at end of file From 2f9aaf179fdeb6a28d995496e143c70c7c1afc3e Mon Sep 17 00:00:00 2001 From: axelKingsley Date: Wed, 26 Mar 2025 09:34:43 -0500 Subject: [PATCH 6/7] Add Conductor Action Item --- protocol/interop-topology.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/protocol/interop-topology.md b/protocol/interop-topology.md index 46646094..5e1e6106 100644 --- a/protocol/interop-topology.md +++ b/protocol/interop-topology.md @@ -208,4 +208,5 @@ Ultimately, this design represents a hypothesis which needs real testing before - Put the `proxyd` check in place - Put an interval check in place in the mempool - Remove build-time checks -- Test RPC performance (data collectable via Grafana) \ No newline at end of file +- Test RPC performance (data collectable via Grafana) +- Consider and add Supervisor-health as a trigger for Conductor Leadership Transfer \ No newline at end of file From 57303e5a070fd78b6fa06f23900c53244a42e2f9 Mon Sep 17 00:00:00 2001 From: axelKingsley Date: Wed, 26 Mar 2025 09:51:39 -0500 Subject: [PATCH 7/7] Explicit op-geth mention --- protocol/interop-topology.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/protocol/interop-topology.md b/protocol/interop-topology.md index 5e1e6106..2638984c 100644 --- a/protocol/interop-topology.md +++ b/protocol/interop-topology.md @@ -65,7 +65,7 @@ This filter will eliminate all interop transactions made in bad faith, as they w It may be prudent for `proxyd` to wait and re-test a transaction after a short timeout (`1s` for example) to allow through transactions that are valid against the bleeding edge of chain content. 
`proxyd` can have its own -`op-supervisor` and `op-node` cluster specifically to provide cross safety queries without putting any load on other +`op-supervisor` and `op-node` cluster (and implicitly, an `op-geth` per `op-node`), specifically to provide cross safety queries without putting any load on other parts of the network. ### Sentry Node Mempool Ingress