RFC for async #238

Kriskras99 · 2025-07-21T18:31:08Z

A start at #136

I've decided to do the RFC as a PR so it's easier to give inline commentary.

I strongly recommend reading the blog post mentioned in the references and watch the video about rc-zip (becomes relevant after ~10 minutes).

Feel free to edit/expand the RFC.

martin-g · 2025-07-21T20:22:55Z

@Xuanwo @xxchan @Fokko Do you have experience with these matters in your libraries ?

martin-g

I just watched the video about Sans IO and honestly I still have no idea how the implementation will look like ...
But looking at https://github.com/bearcove/rc-zip I see that we will need to add more crates for each runtime - tokio, async-std/smol, monoio, sync, etc. Tokio is pretty much the standard for async but the idea of having to add a crate for every runtime looks scary to me.

For now maybe_async looks most promising to me!

(This is not a final decision! We are discussing!)

rfc.md

Kriskras99 · 2025-07-22T13:24:49Z

The annoying thing about maybe_async is that it will break builds. If a crate is depending on two libraries who both depend on avro, but one enabes sync and the other async. The crate will not build.

For sans-io, we don't need to add a crate for every runtime. We can provide a sync and a tokio crate, and if someone wants to use a different runtime they can wrap the core crate themselves. The wrapper can be really simple, namely just a loop over the parser. The reason that rc-zip-* wrappers are more complex is because they provide a lot of extra trait implementations, and have implementations for the entries.

I plan on experimenting with writing a basic Sans I/O parser for Avro, just to see how it would look.

martin-g · 2025-07-22T13:53:49Z

The annoying thing about maybe_async is that it will break builds. If a crate is depending on two libraries who both depend on avro, but one enabes sync and the other async. The crate will not build.

Didn't they solve this by introducing two features (is_async and is_sync) and the macro generates separate module/function for each ?

E.g.

#[maybe_async]
async fn foo() {...}

would generate fn foo_sync() when is_sync is enabled and async fn foo_async() when is_async is enabled.

Kriskras99 · 2025-07-22T14:06:33Z

Looking at the docs (specifically the last two codeblocks), it doesn't seem to rename the functions.

martin-g · 2025-07-23T13:15:49Z

https://crates.io/crates/synca - one more contender. I haven't explored it too much.

Kriskras99 · 2025-07-23T14:24:24Z

That looks a lot better than maybe_async. On the state machine front, I also found this article about using async functions to generate state machines which are usable from sync code. It's code is a bit messy because they seem to want to use it for embedded and thus don't want to use the heap.

rfc.md

Kriskras99 · 2025-07-25T17:53:45Z

I've pushed a (non-functional) prototype for parsing with state machines.

The most interesting thing about this prototype is the use of a "command" tape and a "output" tape.
I created the "command" tape as a replacement for a Schema in the state machine, as it's really annoying to keep track of where you are in the Schema (lots of if let Schema::).
The "output" tape replaces the Value while the state machine is still parsing the input. This has the same reason as not using the Schema. As without the schema, the field names are not known (or they could be known but would take a lot of space in the "command" tape).

The implementation is far from perfect and there is a lot that could be improved. For one, the ObjectStateMachine has a lot of duplication going on in the parse function. I'm pretty sure that duplication can be removed quite easily, making it a lot more readable. This implementation also cuts a lot of corners, like EOF handling or compression.

I do think it gives a good idea of what it would be like to have the core be state machines, wrapped in async and sync wrappers.

I currently only have a sync wrapper, but I'll start on an async wrapper just using the futures crate which should be compatible with almost all async runtimes (either through adapters or directly).

Note: I've not implemented converting the schema to a "command" tape. Implementing it is easy, implementing it so that the tape is compact is a bit more complicated. I've also not implemented decoding the "output" tape to a Value or T: Deserialize, this should actually be easy with almost no complexity. There are also more todos scattered around.

Kriskras99 · 2025-07-25T18:16:58Z

I've added the async implementation. ObjectContainerFileReader should really be a thin wrapper around an ObjectContainerFileStateMachine as a lot of logic is shared between the sync and async versions of next_object.
So the iterator/stream implementation are a more accurate view of how it would look to share the core.

Kriskras99 · 2025-07-26T13:27:12Z

I did not check clippy before pushing, as it doesn't function anyway (too many todo!()).
It does pass cargo check.

Kriskras99 · 2025-07-27T13:02:07Z

I've done some small refactoring in the state machine POC. The logic of the Object Container file format is now moved to a state machine, making the sync and async readers really simple. I've also added proper EoF handling.

Kriskras99 · 2025-07-27T13:41:55Z

I also did an attempt at making a SyncA POC, but don't yet known what to do about the iterators

avro/src/state_machines/reading/mod.rs

Kriskras99 · 2025-07-28T13:02:59Z

@martin-g if you like the statemachine approach, I can start work on making it a proper PR (including stuff like tests and handling edgecases). If you'd like to see a POC with SyncA first, that's possible too.

martin-g · 2025-07-28T14:34:06Z

Thanks for all your work on this topic!
I start understanding how SansIO is supposed to work!

Yes, I would to try with SyncA too! Do you mind if I work on it in parallel to you?

At the end either your work or mine will be thrown away (or both) but I'm fine with that since I'll learn something new.

Kriskras99 · 2025-07-28T14:47:27Z

That sounds like a good plan, lets do it in different branches than this rfc one.

Kriskras99 · 2025-08-16T22:22:04Z

I've completed the reader part!

There are a few issues/notes:

snap does not have a streaming decoder, it's easy to implement but getting the PR upstreamed will take some work as the maintainer isn't active on this project anymore
I've disabled the reader::test_from_avro_datum_with_union_to_struct test. In my opinion the provided schema is not compatible with the bytes. The user should instead provide the right schema and use a reader schema for the schema evolution. If we keep this, we should inform users as at least one does depend on it
I don't support having Schema::Null as the root schema in combination with the from_avro_datum* functions. They see an empty byteslice as an EoF. It's possible to add support for this, but I don't think anybody should depend on it.
I've moved all the tests that were in decode.rs to state_machines/reading/mod.rs, so they're not gone!
The current code organization is temporary and can be improved. I can imagine making decode a module and putting the state machines in there, making reader also a module with sync and async submodules. The same goes for the errors.
I did not yet implement direct from tape deserialisation from Serde types. I plan on implementing this, as it's more efficient than going through Value first.
There are a bunch of TODO scattered all over the place, also in places I did not modify. They are potential improvement points I ran into while implementing all of this.
I've temporarily added the rustfmt.toml back, so I can sort imports more easily.

avro/Cargo.toml

avro/tests/io.rs

Kriskras99 · 2025-08-25T09:42:33Z

I think implementing the writer is will be really simple. encode.rs can be kept, as it just writes to the temporary buffer in the writer. So most of the logic is not file/stream related.

Co-authored-by: Martin Grigorov <[email protected]>

Kriskras99 force-pushed the rfc branch from e13e0d7 to 10513f9 Compare July 22, 2025 10:49

martin-g reviewed Jul 22, 2025

View reviewed changes

rfc.md Outdated Show resolved Hide resolved

Kriskras99 commented Jul 23, 2025

View reviewed changes

rfc.md Outdated Show resolved Hide resolved

Kriskras99 commented Jul 23, 2025

View reviewed changes

rfc.md Show resolved Hide resolved

martin-g reviewed Jul 28, 2025

View reviewed changes

avro/src/state_machines/reading/mod.rs Outdated Show resolved Hide resolved

Kriskras99 force-pushed the rfc branch 2 times, most recently from 13b6c32 to 4d803ed Compare August 16, 2025 22:09

martin-g reviewed Aug 25, 2025

View reviewed changes

avro/Cargo.toml Outdated Show resolved Hide resolved

avro/tests/io.rs Outdated Show resolved Hide resolved

avro/tests/io.rs Outdated Show resolved Hide resolved

Kriskras99 and others added 5 commits August 27, 2025 09:11

rfc

f13b44d

Update rfc.md

0ab189e

Co-authored-by: Martin Grigorov <[email protected]>

Update rfc.md

f9f1b6f

Update rfc.md

a0215e7

State machine example

94259ab

Kriskras99 force-pushed the rfc branch from 4d803ed to 94259ab Compare August 27, 2025 09:12

RFC for async #238

Are you sure you want to change the base?

RFC for async #238

Uh oh!

Conversation

Kriskras99 commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

martin-g commented Jul 21, 2025

Uh oh!

martin-g left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Kriskras99 commented Jul 22, 2025

Uh oh!

martin-g commented Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kriskras99 commented Jul 22, 2025

Uh oh!

martin-g commented Jul 23, 2025

Uh oh!

Kriskras99 commented Jul 23, 2025

Uh oh!

Uh oh!

Uh oh!

Kriskras99 commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kriskras99 commented Jul 25, 2025

Uh oh!

Kriskras99 commented Jul 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kriskras99 commented Jul 27, 2025

Uh oh!

Kriskras99 commented Jul 27, 2025

Uh oh!

Uh oh!

Kriskras99 commented Jul 28, 2025

Uh oh!

martin-g commented Jul 28, 2025

Uh oh!

Kriskras99 commented Jul 28, 2025

Uh oh!

Kriskras99 commented Aug 16, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Kriskras99 commented Aug 25, 2025

Uh oh!

Uh oh!

Kriskras99 commented Jul 21, 2025 •

edited

Loading

martin-g commented Jul 22, 2025 •

edited

Loading

Kriskras99 commented Jul 25, 2025 •

edited

Loading

Kriskras99 commented Jul 26, 2025 •

edited

Loading