Skip to content

Conversation

@ableegoldman
Copy link
Contributor

This PR is mainly a POC to show that it can be done -- we'll need to wait for 4.0 to be released before merging this. It also includes the changes we'll need to make for that upgrade (eg removal of deprecated APIs)

Until we can merge and release this new feature, it has to be run on top of my WIP KIP-1112 branch (here).

Remaining work beyond this POC:
-AK: converting all of the other stateful DSL operators to implement the ProcessorSupplier#stores method
-Responsive: fill in the async store builders with all the store types (eg versioned stores, window and session stores)

@ableegoldman ableegoldman requested a review from agavra November 19, 2024 07:25
id("responsive.docker")
}

repositories {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to put this into all the build.gradle files so I could compile from my local AK branch. I can remove all of these before merging if anyone feels strongly about it, but if it doesn't do any harm then I'd rather keep these for the next time I'm working off of a dev branch


builder.stream(ORDERS, Consumed.with(Serdes.String(), RegressionSchema.orderSerde()))
.transform(BatchTransformer::new, "grouped-orders-store")
.process(BatchTransformer::new, "grouped-orders-store")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of async -- related to the 4.0 upgrade

}

final Object o = configs.originals().get(InternalConfig.INTERNAL_TASK_ASSIGNOR_CLASS);
final Object o = configs.originals().get(StreamsConfig.TASK_ASSIGNOR_CLASS_CONFIG);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of async -- related to the 4.0 upgrade

final Set<StoreBuilder<?>> userStoreBuilders
final ProcessorSupplier<KIn, VIn, KOut, VOut> userProcessorSupplier
) {
if (userStoreBuilders == null || userStoreBuilders.isEmpty()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to KIP-1112 -- this check was supposed to be deleted already but I must have missed it before (the corresponding check in FixedKeyProcessorSupplier was correctly removed at that time)


private void maybeInitializeAsyncStoreBuilders() {
if (asyncStoreBuilders == null) {
asyncStoreBuilders = initializeAsyncBuilders(userProcessorSupplier.stores());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to delay the initialization of the async store builders because the call to the inner processor's #stores method might return different things at different points in the build process (for example, if a downstream operator forces materialization, the #stores will return null/empty until that downstream operator is processed)

final String processorName,
final ProcessorSupplier<KIn, VIn, KOut, VOut> processorSupplier
) {
return AsyncProcessorSupplier.createAsyncProcessorSupplier(processorSupplier);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we really want to wrap every processor as async -- WDYT about only wrapping stateful operators by default?

(We can of course still allow users to inject their own custom async processors if they have some heavier processing without state like an RPC)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we'd leverage configure() to allow us to plug into exactly this consideration. Not sure how we want to expose that as an API, but we can figure that out later. For now we can do just stateful ones. Like you said, if it's something stateless they want to make async it's pretty easy to write a processor for it manually.

) {
super(records);
super(records, Collections.emptyMap());
// TODO(sophie): need to pass in the actual next offsets here?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the upgrade -- a recent KIP added offset tracking to the consumer records, haven't looked into it much yet but I suspect Streams may be using this in some way and so we might need to actually implement this

import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.TimestampedBytesStore;

public class DelayedAsyncStoreBuilder<K, V, T extends StateStore>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is basically where all the nasty hacky stuff is confined to. For now there are two main culprits:

  1. checking and unwrapping the FactoryWrappingStoreBuilder, an internal AK class I had to add while converting DSL operators to use #stores, so that the DSLStoreSuppliers are resolved correctly
  2. some "light" reflection to grab the storeSupplier from a storeBuilder instance

@ableegoldman ableegoldman changed the title DSL w/ async: e2e implementation for stream-table join [DRAFT] Automatic async wrapping for stateful processors Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants