
Implement v2 client GET functionality #972

Open · wants to merge 41 commits into master
Conversation

Contributor

@litt3 litt3 commented Dec 9, 2024

Why are these changes needed?

This PR creates a new EigenDAClientV2, and implements GET functionality for the new client. Additional client functionality will be implemented in upcoming PRs.

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

Signed-off-by: litt3 <[email protected]>
@litt3 litt3 self-assigned this Dec 9, 2024
Contributor

@samlaf samlaf left a comment

First pass on the client code. Logic LGTM. Added a bunch of nit comments to make code cleaner.
Have not looked at tests yet. Can you try to clean up the linter errors? They make it hard to read on GitHub.

@litt3 litt3 mentioned this pull request Dec 10, 2024
// EigenDAClientV2 provides the ability to get blobs from the relay subsystem, and to send new blobs to the disperser.
type EigenDAClientV2 struct {
	log logging.Logger
	// doesn't need to be cryptographically secure, as it's only used to distribute load across relays


out of scope for this PR but curious if we'd ever wanna let users define their own retrieval policies when communicating with relays

Contributor Author

Certainly something to consider, at the latest if/when users have to pay relays for retrieval

Comment on lines 93 to 97
// if GetBlob returned an error, try calling a different relay
if err != nil {
	c.log.Warn("blob couldn't be retrieved from relay", "blobKey", blobKey, "relayKey", relayKey, "error", err)
	continue
}


what about the circumstance where the error is transient and the # of relay keys == 1?

Contributor Author

Are you suggesting that we have an additional timeout, during which the client repeatedly retries all relays?

I could implement this if it's the way we want to go, but I don't see how the case where relay keys == 1 is unique?

Contributor

Agree, I would prefer we let the outer layer implement the retry behavior it wants. In this case this means either the proxy, or even the batcher.

	continue
}

// An honest relay should never send a blob which cannot be decoded


To expand these invariants: an honest relay should never send a blob which doesn't respect its polynomial commitments. The thing is, though, this check would get caught upstream (i.e., within proxy directly) and probably cause the request to fail. The proxy client would trigger a retry, which would probably route to another relay.

This isn't a big problem right now; we can just document it somewhere to circle back to in the future.

Contributor Author

Is there any reason not to check this invariant here, included in this PR? Seems like it wouldn't be hard to add

Contributor Author

@litt3 litt3 Jan 17, 2025

Commitments are being checked in the most recent iteration.


// GetBlob iteratively attempts to retrieve a given blob with key blobKey from the relays listed in the blobCertificate.
//
// The relays are attempted in random order.
Contributor

The disperser implementation itself shuffles the relay keys, so attempting each relay in order is actually fine. But it's good that we're not assuming the relay keys arrive in any particular order.

Aka do we need to hit all the relayKeys to get all the blobs, or any ONE should do?

Any one should do!

}

// GetCodec returns the codec the client uses for encoding and decoding blobs
func (c *EigenDAClientV2) GetCodec() codecs.BlobCodec {
Contributor

why do we need this getter?

Contributor Author

@litt3 litt3 Dec 11, 2024

I carried it forward from the v1 client; will remove it, as it may not be necessary for v2 (never mind, see comment from Sam below)

Contributor

I think we'll still need it. We use it in proxy to get the codec and IFFT blobs when computing commitments.

Contributor Author

I left this method for now, but I don't think that the proxy will end up needing it. This client will be responsible for computing commitments when dispersing a payload

Contributor

We use it in the verifier as well though afaik. Unless you're suggesting verification will also be done by the client when retrieving?

Contributor Author

@litt3 litt3 Jan 21, 2025

Unless you're suggesting verification will also be done by the client when retrieving

That's what is being done with this PR: this client fetches and verifies everything. I don't think that the v2 version of the proxy will have anything like the current proxy "verifier"; everything should be verified while getting and putting

Contributor

Gotcha. Let's remove this method, then?

@litt3 litt3 requested review from EthenNotEthan and samlaf January 17, 2025 21:08
@litt3 litt3 marked this pull request as ready for review January 17, 2025 21:08
@litt3 litt3 requested a review from bxue-l2 January 17, 2025 21:09
@litt3 litt3 requested a review from cody-littley January 21, 2025 14:39
Comment on lines +25 to +26
// doesn't need to be cryptographically secure, as it's only used to distribute load across relays
random *rand.Rand


Small dumb question: does the use of a non-deterministic key potentially impact retrieval latencies across a subnetwork of verifier nodes?

Contributor Author

I don't think it's possible to provide any guarantee of highly similar latencies across a network of verifier nodes. Even if all nodes were to talk to a single relay, that relay could have high latency variability in responding to the different verifier nodes.

I think the best solution would be to implement a tool which prioritizes the best relay partner on a node-by-node basis, as mentioned in the TODO comment, so that every verifier node gets a response as quickly as possible

	return nil, fmt.Errorf("new relay client: %w", err)
}

ethClient, err := geth.NewClient(ethConfig, gethcommon.Address{}, 0, log)


Probably out of scope, but does the use of geth for the eth package imply that the node being used has to be a go-ethereum one, or do other execution client nodes (e.g., reth, besu) also work?

Contributor Author

@litt3 litt3 Jan 21, 2025

does the use of geth for the eth package imply that the node being used has to be a go-ethereum

I don't think that's the implication. The bindings method I'm using requires an EthClient input parameter, and the implementation happens to be in a package called geth. But I don't see why the target node would be required to be geth

@0x0aa0 can you weigh in here?

Contributor

Yeah, no, it doesn't; it's just a geth-provided library. It's just an implementation of the eth RPC methods, which all execution clients implement.

}

payload, err := c.codec.DecodeBlob(blob)
if err != nil {


why is this error a terminal one but others warrant retrying? Is the idea that if a blob passes verification then the contents would always be the same and therefore the codec decoding would yield the same result irrespective of relay?


E.g., couldn't just one relay lie about the length of the blob, causing the initial varuint decoding and length invariant to fail?

Contributor Author

@litt3 litt3 Jan 21, 2025

if a blob passes verification then the contents would always be the same and therefore the codec decoding would yield the same result irrespective of relay

Correct. If the blob contents verified against the cert we have, that means the relay delivered the blob as it was dispersed. If we asked another relay, either it would:

  1. return the same blob bytes, and we end up at the same place
  2. return different blob bytes. if these different bytes are decodable, that means they are necessarily different from the bytes we currently have, so the cert can't possibly verify

If a non-parseable blob verifies against the commitments, time to panic. Either it's a bug, or worse

@litt3 litt3 requested review from EthenNotEthan and samlaf January 21, 2025 22:17
Contributor

@ian-shim ian-shim left a comment

lgtm

@litt3 litt3 requested a review from anupsv January 22, 2025 21:12
Comment on lines +152 to +153
It should not be possible for an honestly generated certificate to verify
for an invalid blob!`,
Contributor

I think this is not 100% accurate: the cert only talks about the blob. Here it's the decoding of the blob to the payload that is failing. Our disperser client would only construct correctly encoded blobs, but the blob might have been constructed by some other library (let's say our Rust library), and there might be some incompatibility between the two, for example.

Contributor Author

To me, "honestly generated certificate" includes the dispersing client verifying that the input bytes have a valid encoding for the intended use. By "should not be possible", I'm trying to convey that this should never happen in the normal course of business. If it does happen, it indicates one of the following:

  • a bug somewhere in the system
  • a malicious dispersing client
  • broken cryptography

Do you have any suggestions for rewording here?

Contributor

Not sure because I haven't wrapped my head around this whole length proof verification stuff, and the kind of attacks possible. But your "a malicious dispersing client" is PRECISELY the kind of thing EigenDA is meant to protect against (otherwise... just use S3). So we absolutely need good safeguards and guarantees around these attacks.

Contributor

Let's chat during our Integrations Food for Thought this afternoon. I'm fine in the meantime with merging this PR as is.

Contributor Author

But your "a malicious dispersing client" is PRECISELY the kind of thing EigenDA is meant to protect against

I don't think it is. EigenDA is meant to guarantee that data is immutable and available. If the dispersing client wants to fill a blob with total gobbledygook that means nothing to anyone, DA isn't in a position to protect against that. All DA can do is guarantee that the gobbledygook is faithfully preserved.

Contributor

Sorry for catching this so late, but can we rename this file, the struct, the mock, everything literally, to cert_verifier instead? We can't change the contract name (although I will put a LOT of pressure to get this to change), but I don't think it means our entire codebase should be riddled with this mistake.

Contributor Author

I agree with this name change.

I'm going to do a rename PR after this PR merges, since we know the contained EigenDAClient needs to become PayloadRetriever.

Since this blob_verifier is pre-existing, I'd prefer to postpone this suggestion until that rename PR.

type EigenDAClient struct {
	log logging.Logger
	// doesn't need to be cryptographically secure, as it's only used to distribute load across relays
	random *rand.Rand
Contributor

we need a mutex around rand

Contributor Author

I don't think we do.

The only random method we're using is Perm, which calls Intn under the hood.

There exists a static function in math/rand, which calls Intn on the global rand singleton, without any synchronization.

func Intn(n int) int { return globalRand().Intn(n) }

This indicates to me that it must be safe to call without synchronization.

Contributor

Documentation says

Random numbers are generated by a Source, usually wrapped in a Rand. Both types should be used by a single goroutine at a time: sharing among multiple goroutines requires some kind of synchronization.

I think the whole package is not goroutine-safe. I think (but can't find a link atm) that everything in Go is assumed to NOT be goroutine-safe unless explicitly stated in the documentation comment. For example, Go maps are not goroutine-safe: there's https://pkg.go.dev/golang.org/x/sync/syncmap for that.

Contributor Author

@litt3 litt3 Jan 23, 2025

The package itself is comfortable using a global instance of Rand for some functionality, without explicit synchronization. I think it would be reasonable to copy the same usage patterns, without explicit synchronization

If you're not comfortable with this, though, I think I would lean toward sacrificing test determinism, and just use the static methods from rand instead of maintaining a local source of randomness. Having mutexes around random is very ugly.

Thoughts on this?

Comment on lines 114 to 120
blobCommitmentProto := contractEigenDABlobVerifier.BlobCommitmentBindingToProto(
	&eigenDACert.BlobVerificationProof.BlobCertificate.BlobHeader.Commitment)
blobCommitment, err := encoding.BlobCommitmentsFromProtobuf(blobCommitmentProto)

if err != nil {
	return nil, fmt.Errorf("blob commitments from protobuf: %w", err)
}
Contributor

Feel like this should be a single util function that does both steps. Also, can we rename blobCommitmentProto to something like blobCommitmentInCert?

Contributor Author

I added a utility function and sidestepped the name blobCommitmentProto: 7b66df6a

LMK what you think about the chosen name for the util function; it's awkward, but nothing better came to mind.

Contributor

LGTM, thanks! I don't like, though, that we use commitmentS (with an s) in one place and not the other.

Can we make those the same (prob remove the s?)

Contributor Author

This is a larger renaming discussion.

Considering the struct itself contains 2 commitments, in addition to other elements, if anything BlobCommitments, plural, is the better name.

Comment on lines +173 to +178
func (c *EigenDAClient) verifyBlobAgainstCert(
	blobKey core.BlobKey,
	relayKey core.RelayKey,
	blob []byte,
	kzgCommitment *encoding.G1Commitment,
	blobLength uint) error {
Contributor

shouldn't this method take a blob and a cert? That's what I would have intuited from the name. Is it really worth extracting all of these fields from the cert? Would we ever want to call it without actually having a cert in hand?

Contributor Author

@litt3 litt3 Jan 23, 2025

The problem with passing in the entire EigenDACert is that verifyBlobAgainstCert is called in a loop, and the data contained in the cert must be processed before it can be used (the binding type must be converted to the internal type). And we wouldn't want to do this conversion multiple times.

One option would be to pass in an encoding.BlobCommitments type, in place of the separate G1Commitment and uint length. But since these 2 fields are only a subset of the data contained in the encoding.BlobCommitments struct, my personal preference is to only pass in the things we need.

Contributor

Gotcha, that makes sense at least. Might be nice to have an internal type for Cert (containing all the data a cert has, but as internal types instead of contract types). But I'm fine with merging without this. This PR has been alive for too long already, haha.

Contributor Author

I think we need a larger team discussion about when to create and use internal datatypes.

Should we always have internal types? Only when we need to define methods for autogenerated types? Only when necessary (i.e., use proto types wherever possible)?

Comment on lines +201 to +202
// here: it isn't necessary to verify the length proof itself, since this will have been done by DA nodes prior to
// signing for availability.
Contributor

Is this really true? I feel like we should still check the proof?
Otherwise, wouldn't the same argument also apply to the actual commitment above? And possibly some other checks we do in the CertVerification contract call.

Contributor

Also still having trouble understanding why we check <= :( :(

Contributor Author

Summoning the experts @bxue-l2 @anupsv

But here's my attempt at an explanation:
The reason why we need to verify the kzg commitment is because without doing that, we don't know for sure that the relay sent us the same bytes that the DA nodes received. Once we have verified the commitment, this is guaranteed, so we can begin to rely on our trust of the DA nodes. We assume that a given threshold of DA nodes must be honest, and the only way the length proof could fail verification is if greater than the assumed threshold is malicious.

Now, as to whether the length check is needed at all, let me repeat here my comment I recently made on the notion doc:

I’m also not 100% convinced it is necessary in the retrieval path. The only thing I can think of that this is protecting against is a relay sending tons of extra padded 0s? But we could also protect against that by simply forbidding relays from sending trailing zeros, and checking for that.
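The trailing-zeros guard suggested here could be as simple as the following sketch; `checkNoTrailingZeros` is a hypothetical helper, not part of the PR:

```go
package main

import "errors"

// checkNoTrailingZeros rejects a blob that ends in a zero byte, one
// possible guard against a relay inflating a blob with zero padding.
// Whether this is an acceptable encoding constraint is a design
// question, not settled in this thread.
func checkNoTrailingZeros(blob []byte) error {
	if len(blob) > 0 && blob[len(blob)-1] == 0 {
		return errors.New("blob has trailing zero padding")
	}
	return nil
}
```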

// create a randomized array of indices, so that it isn't always the first relay in the list which gets hit
indices := c.random.Perm(relayKeyCount)

// TODO (litt3): consider creating a utility which can deprioritize relays that fail to respond (or respond maliciously)
Contributor

Nit: the utility should also prioritize relays with lower latencies (although perhaps it should still reach out to lower-priority relays with small but non-zero probability).

Contributor Author

Expanded TODO to mention prioritizing low latency relays
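A latency-based prioritization like the one suggested could start from something as simple as this sketch; `prioritize` and its latency map are hypothetical, and a real utility would also need the occasional-exploration behavior mentioned above:

```go
package main

import "sort"

// prioritize orders relay keys by observed latency, lowest first, so the
// fastest-responding relays are tried before slower ones. Relays with no
// recorded latency sort first (zero value), which gives new relays a
// chance to be measured.
func prioritize(relayKeys []uint32, latencyMs map[uint32]float64) []uint32 {
	ordered := append([]uint32(nil), relayKeys...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return latencyMs[ordered[i]] < latencyMs[ordered[j]]
	})
	return ordered
}
```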

litt3 added 2 commits January 23, 2025 11:02
Signed-off-by: litt3 <[email protected]>
Signed-off-by: litt3 <[email protected]>