Zero copy message decoding #42

Open
thedodd opened this issue Jan 22, 2024 · 1 comment

Comments

@thedodd
Contributor

thedodd commented Jan 22, 2024

This would add a next level of performance to parsing incoming Kafka requests. The main idea:

  • Request payloads would be parsed / validated in much the same way as is currently done in this crate as of 0.8.x.
  • Instead of allocating new collections to produce owned copies of the decoded messages, we would produce messages which borrow data from the backing Bytes buffer.
  • This pattern of always expecting a backing Bytes buffer will be quite nice, because the type signatures for the zero-copy types will not need to be generic over lifetimes; they will simply embed the Bytes buffer.
  • There are a few more difficult patterns which we will have to tackle (indexmaps, vectors, things of that nature); however, a lot of the work could likely be amortized:
    • The zero-copy message types could embed state where needed: offsets into the buffer, version info, things of that nature.
    • Amortizing lookups and offsets will be much less expensive than copying data and allocating storage.
  • BONUS: support direct mutation of data without having to copy. This would be particularly helpful in cases where record offsets need to be updated, and things of that nature.
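
To make the "embed offsets" idea a bit more concrete, here is a rough sketch of what a view over an array of strings might look like. Everything named here (`StringArrayView`, `DecodeError`, `parse`, `get`) is hypothetical and not part of this crate. To keep the sketch dependency-free, `Arc<[u8]>` stands in for `Bytes`, and the wire layout assumed is the non-flexible Kafka encoding (INT32 element count, then an INT16 length followed by UTF-8 bytes per element, no nulls).

```rust
use std::sync::Arc;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    UnexpectedEof,
    NegativeLength,
    InvalidUtf8,
}

/// Hypothetical zero-copy view over an array of strings.
/// `Arc<[u8]>` is a stand-in for `Bytes` so the sketch needs no dependencies.
pub struct StringArrayView {
    buf: Arc<[u8]>,
    /// (start, end) byte range of each string inside `buf`.
    ranges: Vec<(usize, usize)>,
}

fn read_i16(buf: &[u8], pos: usize) -> Result<i16, DecodeError> {
    let b = buf.get(pos..pos + 2).ok_or(DecodeError::UnexpectedEof)?;
    Ok(i16::from_be_bytes([b[0], b[1]]))
}

fn read_i32(buf: &[u8], pos: usize) -> Result<i32, DecodeError> {
    let b = buf.get(pos..pos + 4).ok_or(DecodeError::UnexpectedEof)?;
    Ok(i32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

impl StringArrayView {
    /// Validate the payload once and record where each string lives.
    /// No string data is copied; only the offsets are stored.
    pub fn parse(buf: Arc<[u8]>) -> Result<Self, DecodeError> {
        let count = read_i32(&buf, 0)?;
        if count < 0 {
            return Err(DecodeError::NegativeLength);
        }
        let mut pos = 4usize;
        // Deliberately not pre-sized from `count`, which is untrusted input.
        let mut ranges = Vec::new();
        for _ in 0..count {
            let len = read_i16(&buf, pos)?;
            if len < 0 {
                return Err(DecodeError::NegativeLength);
            }
            let start = pos + 2;
            let end = start + len as usize;
            let bytes = buf.get(start..end).ok_or(DecodeError::UnexpectedEof)?;
            std::str::from_utf8(bytes).map_err(|_| DecodeError::InvalidUtf8)?;
            ranges.push((start, end));
            pos = end;
        }
        Ok(Self { buf, ranges })
    }

    pub fn len(&self) -> usize {
        self.ranges.len()
    }

    /// Borrow the i-th string straight out of the backing buffer.
    /// UTF-8 is re-checked here only to stay in safe Rust.
    pub fn get(&self, i: usize) -> Option<&str> {
        let (start, end) = *self.ranges.get(i)?;
        std::str::from_utf8(&self.buf[start..end]).ok()
    }
}
```

The only allocation is the vector of offsets, which is small compared to copying every string. Note the struct has no lifetime parameter, because it owns a handle to the buffer.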

Other projects which have explored this space:

One thing that could help bypass a lot of the difficulty with alignment and the like: just use accessors to access data. Don't attempt to build structs which are backed by the buffer. Instead, access fields of data via methods on a struct which simply embeds the Bytes buffer. Definitely still edge cases and things to work through; however, that alone will bypass a large portion of alignment issues.
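
A minimal sketch of the accessor idea, using the fixed-size prefix of a request header as the example. `RequestHeaderView` and its methods are hypothetical names, `Arc<[u8]>` again stands in for `Bytes` to avoid dependencies, and the layout assumed is the standard request header prefix (INT16 api key, INT16 api version, INT32 correlation id, all big-endian). Because each accessor assembles the value from individual bytes, nothing is ever reinterpreted in place, so alignment of the buffer does not matter.

```rust
use std::sync::Arc;

/// Hypothetical accessor-style view over the fixed prefix of a request header.
/// It embeds the buffer and decodes fields on demand instead of building
/// a struct whose fields are backed by the buffer.
pub struct RequestHeaderView {
    buf: Arc<[u8]>, // stand-in for `Bytes` in this sketch
}

impl RequestHeaderView {
    /// api_key (2) + api_version (2) + correlation_id (4).
    const FIXED_LEN: usize = 8;

    /// Length is checked once here so the accessors below cannot go out of bounds.
    pub fn new(buf: Arc<[u8]>) -> Option<Self> {
        if buf.len() < Self::FIXED_LEN {
            return None;
        }
        Some(Self { buf })
    }

    pub fn api_key(&self) -> i16 {
        i16::from_be_bytes([self.buf[0], self.buf[1]])
    }

    pub fn api_version(&self) -> i16 {
        i16::from_be_bytes([self.buf[2], self.buf[3]])
    }

    pub fn correlation_id(&self) -> i32 {
        i32::from_be_bytes([self.buf[4], self.buf[5], self.buf[6], self.buf[7]])
    }
}
```

Variable-length fields (client id, tagged fields) would need the offset bookkeeping described in the list above; this sketch stops at the fixed-size part.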

Thoughts?

@tychedelia
Owner

At one point, I was working on a proxy that could benefit from this approach, but ultimately decided I just needed to parse the header, which is pretty straightforward. I'd be curious to know how much overhead we currently have parsing and constructing messages, what the use cases of our users are, and whether they'd benefit from such an API. I think I'd want such a project to be driven by a production user to make sure any improvements were worth the complexity cost.
