Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

Conversation

samlaf
Copy link
Collaborator

@samlaf samlaf commented Jan 29, 2025

Closes DAINT-122

Description

When altda->ethda failover happens, the op-node's L2 finalized head completely stalls.

This PR consists of 2 commits:

  1. test to show the behavior (fails on this commit)
  2. fix damgr so that finalizedHead can keep advancing even after failover

Tests

First commit contains test that shows the buggy behavior. To run:

git checkout <COMMIT>
go test -run ^TestAltDA_FinalizationAfterEthDAFailover$ github.com/layr-labs/optimism/op-e2e/actions/altda

Fix

The problem with the damgr as it is is that

d.finalizedHead = d.state.lastPrunedCommitment

is always run no matter what. This means that finalization is always driven by altda commitments, but after failover, damgr stops seeing the commitments (because they arent altda commitments anymore). The fix in the second commit is to let the finalization be driven by L1 finalization when there are no altda commitments managed by the damgr.

NOTE: I don't fully understand the subtleties of the damgr and derivation pipeline interactions, so might be doing something wrong here... but seems sound to me, especially since it passes the test.

Additional Context
The test is a bit ugly currently (see TODO comment below). Someone with better understanding of op-node and event processing order would surely be able to help me clean this up. I use this patch for debugging event processing, in case this is of use to anyone:

diff --git a/op-e2e/actions/altda/altda_test.go b/op-e2e/actions/altda/altda_test.go
index 9d60f2fee..c6ef361d7 100644
--- a/op-e2e/actions/altda/altda_test.go
+++ b/op-e2e/actions/altda/altda_test.go
@@ -194,7 +194,7 @@ func (a *L2AltDA) ActNewL2Tx(t helpers.Testing) {
 //
 // 17 makes sense because challengeWindow=16 and we create 1 extra block before that,
 // and 204 L2blocks = 17 L1blocks * 12 L2blocks/L1block (L1blocktime=12s, L2blocktime=1s)
-func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
+func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing, logEvents ...bool) {
 	// Include a new l2 batcher transaction, submitting an input commitment to the l1.
 	a.ActNewL2Tx(t)
 	// Create ChallengeWindow empty blocks so the above batcher blocks can finalize (can't be challenged anymore)
@@ -203,7 +203,20 @@ func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
 	// TODO: understand why we need to drain the pipeline before AND after actL1Finalized
 	a.sequencer.ActL2PipelineFull(t)
 	a.ActL1Finalized(t)
-	a.sequencer.ActL2PipelineFull(t)
+	if logEvents == nil {
+		a.sequencer.ActL2PipelineFull(t)
+	} else {
+		// Log all events until the end of the pipeline
+		count := 0
+		a.sequencer.ActL2EventsUntil(t, func(ev event.Event) bool {
+			count++
+			a.log.Info("new event", "event", ev, "count", count)
+			if count == 100 {
+				return true
+			}
+			return false
+		}, 100, false)
+	}
 
 	// Uncomment the below code to observe the behavior described in the TODO above
 	// syncStatus := a.sequencer.SyncStatus()
@@ -680,5 +693,12 @@ func TestAltDA_FinalizationAfterEthDAFailover(gt *testing.T) {
 		ssAfter := harness.sequencer.SyncStatus()
 		// Even after failover, the finalized head should continue advancing normally
 		require.Equal(t, ssBefore.FinalizedL2.Number+diffL2Blocks, ssAfter.FinalizedL2.Number)
+
+		harness.log.Info("Sync status before",
+			"unsafeL1", ssBefore.HeadL1.Number, "safeL1", ssBefore.SafeL1.Number, "finalizedL1", ssBefore.FinalizedL1.Number,
+			"unsafeL2", ssBefore.UnsafeL2.Number, "safeL2", ssBefore.SafeL2.Number, "finalizedL2", ssBefore.FinalizedL2.Number)
+		harness.log.Info("Sync status after",
+			"unsafeL1", ssAfter.HeadL1.Number, "safeL1", ssAfter.SafeL1.Number, "finalizedL1", ssAfter.FinalizedL1.Number,
+			"unsafeL2", ssAfter.UnsafeL2.Number, "safeL2", ssAfter.SafeL2.Number, "finalizedL2", ssAfter.FinalizedL2.Number)
 	}
 }

@samlaf
Copy link
Collaborator Author

samlaf commented Jan 29, 2025

Tests are currently failing, but waiting for discussions in ethereum-optimism#14050 to get resolved. That PR will fix these tests and then we can just rebase on top of them.

samlaf added 6 commits March 5, 2025 13:34
… after failover to ethda

Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing
Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.
@samlaf samlaf force-pushed the samlaf/fix--altda-failover-to-ethda-should-keep-finalizing-l2-chain branch from 11eb3d1 to 6cf7f03 Compare March 5, 2025 18:45
@samlaf samlaf requested review from litt3 and ethenotethan March 5, 2025 21:22
…verToEthDACalldata

This makes the test takes 4-7 minutes now, which is still acceptable.
@samlaf samlaf requested a review from litt3 March 6, 2025 17:46
@samlaf samlaf merged commit 4c75164 into eigenda-develop Mar 6, 2025
8 checks passed
@samlaf samlaf deleted the samlaf/fix--altda-failover-to-ethda-should-keep-finalizing-l2-chain branch March 6, 2025 20:51
samlaf added a commit that referenced this pull request Mar 7, 2025
… finalizing (#23)

* test(altda): add a test to make sure altda node keeps finalizing even after failover to ethda

Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing

* fix(damgr): ethda failover finalization stall bug

Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.

* fix: damgr_test doesn't compile

* ci: add op-alt-da tests to ci

* chore: add more logs to damgr and altda_data_source

* docs(altda_test): fix typo

* test(kt-devnet): add node-keeps-finalizing test to kurtosis TestFailoverToEthDACalldata

This makes the test takes 4-7 minutes now, which is still acceptable.

* style: rename log -> logger

* docs(kt-tests): document the EnclaveServiceClients

* style: breakup long log line
samlaf added a commit that referenced this pull request Mar 7, 2025
… finalizing (#23)

* test(altda): add a test to make sure altda node keeps finalizing even after failover to ethda

Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing

* fix(damgr): ethda failover finalization stall bug

Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.

* fix: damgr_test doesn't compile

* ci: add op-alt-da tests to ci

* chore: add more logs to damgr and altda_data_source

* docs(altda_test): fix typo

* test(kt-devnet): add node-keeps-finalizing test to kurtosis TestFailoverToEthDACalldata

This makes the test takes 4-7 minutes now, which is still acceptable.

* style: rename log -> logger

* docs(kt-tests): document the EnclaveServiceClients

* style: breakup long log line
samlaf added a commit that referenced this pull request Mar 7, 2025
… finalizing (#23)

* test(altda): add a test to make sure altda node keeps finalizing even after failover to ethda

Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing

* fix(damgr): ethda failover finalization stall bug

Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.

* fix: damgr_test doesn't compile

* ci: add op-alt-da tests to ci

* chore: add more logs to damgr and altda_data_source

* docs(altda_test): fix typo

* test(kt-devnet): add node-keeps-finalizing test to kurtosis TestFailoverToEthDACalldata

This makes the test takes 4-7 minutes now, which is still acceptable.

* style: rename log -> logger

* docs(kt-tests): document the EnclaveServiceClients

* style: breakup long log line
samlaf added a commit that referenced this pull request Mar 7, 2025
… finalizing (#23)

* test(altda): add a test to make sure altda node keeps finalizing even after failover to ethda

Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing

* fix(damgr): ethda failover finalization stall bug

Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.

* fix: damgr_test doesn't compile

* ci: add op-alt-da tests to ci

* chore: add more logs to damgr and altda_data_source

* docs(altda_test): fix typo

* test(kt-devnet): add node-keeps-finalizing test to kurtosis TestFailoverToEthDACalldata

This makes the test takes 4-7 minutes now, which is still acceptable.

* style: rename log -> logger

* docs(kt-tests): document the EnclaveServiceClients

* style: breakup long log line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants