Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEVEX-222] Add built-in auto-serialization #329

Merged
merged 49 commits into from
Feb 27, 2025

Conversation

oskardudycz
Copy link
Collaborator

@oskardudycz oskardudycz commented Jan 23, 2025

Motivation

KurrentDB users should be able to quickly feel efficient in using basic operations like appending new messages, reading them or subscribing to notifications about them. The current API is not fulfilling it as it doesn't provide safe defaults for the most common scenarios like:

  • building the state from events,
  • appending new events as an outcome of the business logic,
  • subscribing to stream etc.

Currently, they need to consider immediately in which order to read the stream, what the deadline is, and what max count to provide. And most importantly, how to serialize data and which serializer to use, as there's none. All of those customisations are needed in the end but shouldn't be thrown immediately on users if they just want to use the default recommended way. In most development environments, you can find the default, mature choices for serialization.

Such code may not be complex; once you get it right, you don't need to change it too often. But if it's stacked with multiple other things you need to keep in mind, then it's easy to miss something. Most importantly, we shouldn't require our users to build their own wrappers around KurrentDB, but we should provide an accessible API to benefit users from using KurrentDB from the get-go.

Solution

The code adds automatic serialization of message payloads. It uses JSON as a default but allows users to implement the binary format themselves. Thanks to that, user can:

  • append plain messages ,
  • use a simplified Message wrapper that takes data, and optionally metadata and message-id, this hides the unwanted complexity of EventData,
  • read messages deserialised automatically. Thanks to that, the user doesn't need to think about which resolved event is correct (event, original event, they can just use Message or DeserializedData fields).
  • subscribe to messages deserialised automatically.

To introduce those capabilities, it was necessary to introduce new methods not to make breaking changes. Having that, I benefited from it and reshaped their APIs.

Now, they just take mandatory data as parameters (e.g. stream name when appending or reading from stream) and pack the rest of the optional parameters into the options parameter. It also wraps safe defaults. That's aligned with how Java and Node.js clients are doing.

Thanks to that, users can use such simple APIs as:

var streamName = $"users-{Guid.NewGuid()}";

// Appending
var messages = [new UserRegistered(email, name)];
client.AppendToStreamAsync(streamName, messages);

// Appending with options
var messages = [new UserRegistered(email, name)];
await client.AppendToStreamAsync(
    streamName
    messages, 
    new AppendToStreamOptions{ ExpectedStreamState = StreamState.NoStream }
);

// Reading
List<UserRegistered> readMessages = 
    client.ReadStreamAsync(streamName)
    .Select(e => e.DeserializedData).ToListAsync();

// Reading with options
List<UserRegistered> readMessages = 
    client.ReadStreamAsync(streamName, new ReadStreamOptions { MaxCount = 3 })
    .Select(e => e.DeserializedData).ToListAsync();

// Subscribing
await using var subscription = client.SubscribeToStream(streamName);

// Subscribing with options
await using var subscription = client
    .SubscribeToStream(streamName, new SubscribeToStreamOptions{ Start = FromStream.End})

Next steps

After getting approval for changes in the .NET client, that should be applied accordingly to other clients (starting from Java and Node.js).

This change is foundational for enabling scenarios like getting state from events, running business logic, and appending new events as an outcome.

Details

0. No intentional breaking changes were made.

1. This PR introduces the default System.Text.Json serializer and uses it both for JSON and Binary serialization. Users can change the default serialization and also override it through each operation option.

2. Automatic serialization is only enabled for new methods. Thanks to that, people won't get additional overhead upon migration and will be motivated to migrate to the new methods. I didn't make old methods obsolete, but I'd suggest marking them as [Obsolete("Use the new method x. This method may be removed in future releases", false)]. Such registration won't be marked as a warning but as information, so we won't have friction and confusion. This would also give a hint for existing users that they can use new methods.

3. A basic client-side schema registry was introduced. Currently, it's a simple implementation that is responsible for finding the proper serializer based on the content type and mapping between the message type name and CLR type. Default convention is: {categoryName}-{CLR Type Full Name}.

The CLR name will be of course, specific to .NET, but if customers are also using other clients (e.g. Java, Node.js) then they can override the convention by injecting a custom implementation of the IMessageTypeNamingStrategy interface into settings. They can also define different, more universal conventions that are not based on C# or Java types, but then they'll also need to register those types up front.

Users can register message type mapping upfront or benefit from the automated registration. The automated registration will try to load type from assemblies available in AppDomain. It won't load assemblies. No assembly loading code is provided out of the box or registration by attribute. This can be added to the follow-up PR if needed. We could also use Assembly Qualified Name; then, we'd get assembly loading out of the box.

The Mappings are cached in the type registry to improve performance.

The automatic registration was added to allow automatic use of KurrentDB and diverge convention when needed.

4. General serialization settings can be set through client settings, which include:

  • Changing the default content type to octet stream,
  • Changing the default System.Text.Json options. That should help to adjust to settings that users are currently using (if they're different from the recommended ones),
  • Overriding serializer implementations. User can implement the new ISerializer interface for a specific content type. For now, I haven't provided a serializer for Protobuf and Avro, as the PR is already big. This can (and should) be added later.
  • Changing type name mapping convention,
  • Register type mappings manually. This can help register types manually upfront or those that are not following the default conventions.
  • Register CLR message types for the stream category. They will be mapped using the default (or registered) naming conventions.
  • Registering the custom metadata type. For now default serializer either takes TraceMetadata or same metadata type for all messages. Serialization will take data as it is. The more nuanced default handling can be provided by the user in the custom IMessageTypeNamingStrategy implementation. We can also expand it in the future.

All of that can also be customized for each operation.

5. The central piece of serialization is MessageSerializer class that orchestrates the serialization code. It does that by:

  • resolving message type using the registered in schema registry mapping strategy,
  • taking the serializer from the schema registry based on the intended content type,
  • (de)serializing message data using it,
  • (de)serializing message metadata as JSON

6. Old methods were made of wrappers for new ones. Thanks to that, the old test suite also covers new methods, and there was no need to repeat it. It'll also be easier to drop them in the future. The integration tests were added, covering additional scenarios.

TODO:

  • ensuring that the minimum set of types are expected, and the rest are made internally,
  • detailed unit tests after getting an agreement on the design.

…ate method

This should enable easier customisation like changing serializer settings.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 3 times, most recently from 748955e to 64d583d Compare January 23, 2025 13:46
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 2 times, most recently from 59666ee to d12c429 Compare January 23, 2025 15:15
…erialization

It's not fully ready, as it has hardcoded schema serializer. Main question will be how much from schema registry I need to move here.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from d12c429 to 001f1a0 Compare January 23, 2025 15:24
…rom ResolvedEvent

Per @RagingKore suggestion it'll be better not to keep the reference to serializer in ResolvedEvent to keep the clear intention behind it.

Now, the serializer is resolved from schema registry based on the auto-serialization settings.

Made possible to register more than one serializer per content type. Currently it doesn't support more schema definition types. To be discussed how to include them later on.
That will make resolution easier and give users ability to inject their own serializers implementation
…rialization Context

Previously we had DeserializationContext, but after consideration, it'll be better to merge those concepts to make easier customizations per operations (e.g. append, subscribe, etc.).
Refactored the code accordingly. It takes the full object instead of the limited number of parameters, as you may be using metadata to get parameters about clrtype.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from 6677a46 to 64c79e7 Compare January 29, 2025 10:00
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from 4e88a61 to 786caf3 Compare January 29, 2025 11:04
…pper to make clearer responsibilities

Actually, it's just wrapping message serializer based on the client settings.
Removed also obsolete SerializationSettings
… IMessageSerializer

There's no need to force all code to know about wrapper existence, it's okay to just create it during the setup.
…ied syntax

Now they don't require all the fancy, but redundant in most cases Stream Positions, Directions, EventData etc.
…ad of parameters

Thanks to that safe defaults are already provided
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 3 times, most recently from 0cdb254 to 869e503 Compare February 25, 2025 13:11
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch 7 times, most recently from 8569d78 to 32461a5 Compare February 26, 2025 08:30
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from 32461a5 to e938afe Compare February 26, 2025 08:33
Used also two external assemlies
- one that's loaded,
- one that's never loaded.

To double-check behaviour of loading types from different assemblies.
@oskardudycz oskardudycz force-pushed the DEVEX-222-AutoSerialization branch from a490f31 to 1af1def Compare February 26, 2025 11:27
@w1am w1am merged commit 44607ea into DEVEX-185-Rebranding Feb 27, 2025
45 checks passed
w1am added a commit that referenced this pull request Feb 27, 2025
w1am added a commit that referenced this pull request Feb 27, 2025
@VaughnVernon
Copy link

Suggestion for refactoring from standard defaults to custom serialization, etc.

(This is probably the wrong place to suggest but I thought I should offer while I read reading @oskardudycz spec.)

@alexeyzimarev alexeyzimarev deleted the DEVEX-222-AutoSerialization branch March 10, 2025 15:47
@alexeyzimarev
Copy link
Member

Hey @VaughnVernon, we'll be updating the docs closer to the new client package release date. There's a lot of work planned for both clients and related docs. We realise that our client docs are between average and bad depending on who's looking and we aim to make clients docs substantially better.

@VaughnVernon
Copy link

@alexeyzimarev Sure. I didn't intend to sound critical about the docs in general. I was just thinking about possible improvements to help with the necessary steps to optimize serialization, etc., that @oskardudycz mentioned in the above description and/or comments. With a sort of checklist, developers can follow the "cookbook" so they don't forget an essential step and not understand why it's not working.

@oskardudycz
Copy link
Collaborator Author

@VaughnVernon appologies for the delayed answer. I think that cookbook is a good suggestion. We'll need to think how to include that in the documentation. I think that those changes can make actually easier to explain gradually how KurrentDB works, as now what's default is the recommended way (e.g. autoserialization, deadlines, etc.). So they can just start appending and reading messages. Then we can explain why and when may consider using additional options. For that checklists make a good sense imho.

@VaughnVernon
Copy link

@oskardudycz Yes, the default, opinionated way is essential. Up-and-running makes it easier on the supported and the supporters. Then when it's time to get serious about optimizations, the advanced support will help the supported thread 5 to 7 needles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants