Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(f3): resolve finality for eth APIs according to F3 #12762

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Dec 5, 2024

From CHANGELOG:

Ethereum APIs meet F3! When F3 is enabled and running, Ethereum APIs that accept block descriptors "finalized" and "safe" will use F3 to determine the block to select instead of the default 30 block delay for "safe" and 900 block delay for "finalized".

  • Introduced an lf3.F3API interface that I can mock and replace the default lf3.F3
  • Added an IsEnabled() to the F3 interface, lf3.F3 returns true
  • Added a lf3.DisabledF3 implementation that gets loaded when F3 isn't enabled, it returns false for IsEnabled(). All methods now have an error return and they'll return api.ErrF3Disabled; so this simplifies node/impl/full/f3.go which just has to pass through the calls instead of checking whether fff is nil.
  • Added a MockF3API that can be used as a replacement to modify some of its behaviour (focused on the changes in here fore now).
  • Added a TipSetResolver, which for now, just exposes ResolveEthBlockSelector for the Ethereum APIs. But this will be the extension point for the additional non-eth API work (I'll rebase feat(api): v2 API integrating with F3 #12719 ontop of this). This replaces the eth utility function getTipsetByBlockNumber and integrates with F3.

So, now when you resolve "safe" or "finalized", we first query F3, then:

  1. If F3 is disabled or not running, fall back to EC behaviour ("safe" = head-30-1, "finalized" = head-900-1) (we always -1 for the Eth APIs because the queries are transaction-centric and care about transaction execution).
  2. If F3 reports its latest epoch is later than EC finality, we fall back to EC behaviour
  3. If F3 is closer to head than the default EC variant (31, or 901), use the F3 epoch (-1) instead

@Kubuxu @masih PTAL

@Stebalien I'm pinging you because I wouldn't mind you additional sanity checking on the eth tipset-1 behaviour in here.

itests/eth_api_test.go Outdated Show resolved Hide resolved
Copy link
Member

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers. I recommend getting error construction to be consistent. Left some comments on what might be surprising from DX/maintainer perspective.

chain/lf3/f3.go Outdated Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Outdated Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Outdated Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Outdated Show resolved Hide resolved
chain/tsresolver/tipset_resolver.go Outdated Show resolved Hide resolved
}
if running, _ := tsr.f3.IsRunning(); !running {
return nil, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have an additional check here for "is F3 running with finalization/static manifest".

Copy link
Member

@masih masih Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg, here is my take on this comment:

is F3 running with finalization

This one means if "finalization" is enabled in F3 manifest, which in Lotus lingo translates to "Should F3 checkpoint the finalized tipsets?". It is specified here.

If finalisation in disabled in F3, then the API should be using EC finality regardless of whether F3 itself is enabled or not.

/static manifest

The F3 manifest integration in lotus is dynamic by default but up to a point, beyond which it will revert exclusively to a static manifest that "activates" F3 as the consensus mechanism for the network with uniform parameters across all nodes and no longer controllable by the F3 engineering team.

In the static manifest, finalisation is enabled. If we are in a state where F3 is running, and we are using static manifest (i.e. F3 is active) then use F3 to resolve a tipset's finality.

@Kubuxu please correct me if I have missed anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks! I the static/dynamic thing has more nuance than I was aware of, I probably should go read the spec to understand that a bit more deeply.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, went digging to understand a bit more of the dynamic/static situation and ended up having to dig into code. From what I can tell, there's no clear way of telling that we're not operating under a dynamic manifest, so the only check we can do here (the only check we need?) is to ask whether manifest.EC.Finalize is true, and if it's not, then ignore F3.

Does that sound right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is all handled now but I wouldn't mind a sanity check from both of you (@masih & @Kubuxu) in this function in particular.

It's now called maybeF3FinalizedTipSet to indicate that you may not get something useful from it. If F3 can't help us then we just move on to the defaults.

  1. We're not going to error here unless it's from dealing with the chain store, an error from F3 is worthy of a WARN log, and only if the error isn't ErrF3Disabled.
  2. If !manifest.EC.Finalize, we don't bother continuing
  3. Some basic defensive checks on the certificate
  4. All good? load and return that tipset, it's the latest that F3 has to offer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, sadly this doesn't work:

runtime error: invalid memory address or nil pointer dereference
github.com/filecoin-project/go-jsonrpc.doCall.func1
	/home/runner/go/pkg/mod/github.com/filecoin-project/[email protected]/handler.go:237
runtime.gopanic
	/home/runner/work/_tool/go/1.22.7/x64/src/runtime/panic.go:770
runtime.panicmem
	/home/runner/work/_tool/go/1.22.7/x64/src/runtime/panic.go:261
runtime.sigpanic
	/home/runner/work/_tool/go/1.22.7/x64/src/runtime/signal_unix.go:881
github.com/filecoin-project/lotus/chain/lf3.(*F3).GetManifest
	/home/runner/work/lotus/lotus/chain/lf3/f3.go:205
github.com/filecoin-project/lotus/chain/tsresolver.(*tipSetResolver).maybeF3FinalizedTipSet
	/home/runner/work/lotus/lotus/chain/tsresolver/tipset_resolver.go:131

So GetManifest as it exists today isn't safe to call when it hasn't got a manifest. This is in an itest, letting it load the default F3 module. I think maybe it needs to have a check in here for nil, but what should it return in that case?

lotus/chain/lf3/f3.go

Lines 203 to 206 in a46832f

func (fff *F3) GetManifest(ctx context.Context) (*manifest.Manifest, error) {
m := fff.inner.Manifest()
if m.InitialPowerTable.Defined() {
return m, nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're querying it before the manifest provider can give it a manifest, I think F3#GetManifest should either return nil, nil or an error I could errors.Is on to deal with this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetManifest should always fall back on returning the static manifest ( which is present at compile time ). If it does not then we should fix that. It seems like a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it initialises with its manifest atomic.Pointer empty, and it only gets set after Start() is called and manifestProvider.ManifestUpdates() successfully sends it a new manifest. https://github.com/filecoin-project/go-f3/blob/6c3d2deca27e361b68680156eef99271d82c29a6/f3.go#L170

itests/eth_api_test.go Outdated Show resolved Hide resolved
@rvagg rvagg force-pushed the rvagg/eth-f3-finalized branch 5 times, most recently from f55cd57 to 40bc77c Compare December 11, 2024 06:36
@rvagg
Copy link
Member Author

rvagg commented Dec 12, 2024

FYI this is getting more of a refactor. It's been suggested that (for now at least), we should only expose the differences on a /v2/ API so /v1/ users don't experience any change even when F3 goes live so that it's an opt-in thing. I'll need to do some work to introduce the /v2/ in this PR and try and keep the code duplication minimal, which will be a challenge.

@rvagg rvagg force-pushed the rvagg/eth-f3-finalized branch from 40bc77c to b73c42c Compare December 16, 2024 07:38
@BigLep
Copy link
Member

BigLep commented Jan 14, 2025

@rvagg : given #12762 (comment) and that I don't think this is active for you currently, I"m going to move this back to draft so it's clear that it's not ready for review. Please correct me if that's wrong.

@BigLep BigLep marked this pull request as draft January 14, 2025 00:57
@rvagg
Copy link
Member Author

rvagg commented Jan 14, 2025

uh. yeah, and I moved a ton of refactoring into a PR that has since landed, so this one is now out of date and isn't even close to being landable! I'm going to have to close this and start again I think but I don't want to lose reference to it so I'll leave it open until I pick it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ⌨️ In Progress
Development

Successfully merging this pull request may close these issues.

4 participants