Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit polymorphism discriminator #2903

Open
Tmpod opened this issue Jan 19, 2025 · 8 comments
Open

Explicit polymorphism discriminator #2903

Tmpod opened this issue Jan 19, 2025 · 8 comments

Comments

@Tmpod
Copy link

Tmpod commented Jan 19, 2025

What is your use-case and why do you need this feature?

I've been experimenting with MongoDB through KMongo and KtMongo, specifically around polymorphic documents. Both kbson (used by KMongo) and the official Kotlin Mongo driver (used by KtMongo) support polymorphic serialization, but always through a virtual field (whose name is dependent on the format).

This isn't ideal. You have to resort to clunky workarounds to use the type-safe query API, you have use SerialName everywhere and in general it isn't intuitive at all. Implicit fields make code harder to reason about, and often require diving into library sources to really understand what's happening.

Here's an example with current implicit fields
enum class FooType { A, B }

@Serializable
sealed interface Foo {
    val common: Int
    val type: FooType
}

@Serializable
@SerialName("A")
class FooA(
    override val common: Int,
    val bar: Long,
) : Foo {
    // Can't be concrete otherwise you get weird behaviour between
    // the type field generated by kx.ser (with value from @SerialName)
    // and the actual concrete field
    override val type get() = FooType.A
}

@Serializable
@SerialName("B")
class FooB(
    override val common: Int,
    val baz: String,
) : Foo {
    override val type get() = FooType.B
}

suspend fun someQueries(db: Database) {
    db.getCollection<FooA>().insertOne(FooA(common = 1, bar = 2))
    db.getCollection<FooB>().insertOne(FooB(common = 1, baz = "3"))
    
    // Only works if the format-specific discriminator is called "type".
    // If it is something else you either have to change the property name
    // or use @SerialName
    val fooA = db.getCollection<FooA>().findOne { Foo::type eq FooType.A }
    val fooB = db.getCollection<FooB>().findOne { Foo::type eq FooType.B }
    val foo = db.getCollection<Foo>().findOne()
    println(fooA.bar)
    println(fooB.baz)
    println(foo.type)
}

Describe the solution you'd like

Let's consider a new @PolymorphicDiscriminator annotation that tells kx.ser and implementing formats which field should be used for polymorphic (de)serialization. One could write the following instead:

The same example but using this suggested annotation
enum class FooType { A, B }

@Serializable
sealed class Foo(
    @PolymorphicDiscriminator
    // Could also use @SerialName here if wanted
    val type: FooType,
    val common: Int,
)

@Serializable
class FooA(
    common: Int,
    val bar: Long,
) : Foo(FooType.A, common)
// EDIT: I'm aware these constructors are invalid with @Serializable,
// you'd have to override instead, like in the previous example.

@Serializable
class FooB(
    common: Int,
    val baz: String,
) : Foo(FooType.B, common)

suspend fun someQueries(db: Database) {
    db.getCollection<FooA>().insertOne(FooA(common = 1, bar = 2))
    db.getCollection<FooB>().insertOne(FooB(common = 1, baz = "3"))

    // We are now sure the type-safe API will resolve to the correct discriminator name,
    // because that's defined by the user, not the format developer
    val fooA = db.getCollection<FooA>().findOne { Foo::type eq FooType.A }
    val fooB = db.getCollection<FooB>().findOne { Foo::type eq FooType.B }
    val foo = db.getCollection<Foo>().findOne()
    println(fooA.bar)
    println(fooB.baz)
    println(foo.type)
}

I'm not well versed in compiler plugins or annotation processors, so I'm not entirely sure on the implementation details, but all the semantic information required for such an annotation exist. When you're processing a class that is part of a sealed hierarchy, you check if the value it defines is always known at compile-time and if it is unique (e.g. you can't have two implementors use the same discriminator value). I'd even say the value could be anything directly matchable in when expressions.
If the values are all consistent, you can generate the polymorphic deserialization function which selects the correct implementor serializer based on the decoded discriminator, akin to what already happens on the current implementation based on an implicit field.


Please correct me if anything I've said is wrong or if my suggestion is unattainable at the moment. Thank you!

@Tmpod Tmpod added the feature label Jan 19, 2025
@CLOVIS-AI
Copy link

Hey, KtMongo author here. I'll give a bit more context.

KtMongo relies on KotlinX.Serialization to convert Kotlin types to and from MongoDB BSON. Through this, we support sealed class and sealed interface simply because KotlinX.Serialization supports them, using an imaginary field type which contains a discriminant.

KtMongo provides an extensive DSL over MongoDB operators. This DSL is implemented through extension functions and generics, allowing us to refer to fields when writing requests:

@Serializable
class User(
    val name: String,
    val age: Int,
)

users.find {
    User::age gt 18
}

So far, so good. The problem comes because of the imaginary field KotlinX.Serialization creates on sealed types:

@Serializable
sealed class Foo

@Serializable
class FooA(val a: Int) : Foo()

@Serializable
class FooB(val b: String) : Foo()

A instance of FooB("foo") will be serialized as:

{
    "type": "FooB",
    "b": "foo"
}

We would like to be able to write a query for specifically instances of FooB, but this requires referring to the imaginary "field" type:

users.find {
    User::foo / Foo::type eq "foo"    // ⚠ Foo::type does not exist
}

This could be worked around by creating an extension function and hard-coding the name type, but KotlinX.Serialization will actually name the discriminant something else if there is already a field named type, so there is no good typesafe way of knowing how the field is called.

By using explicit enums, this issue removes the magic around the discriminant, so it can be accessed through regular Kotlin code.

@pdvrieze
Copy link
Contributor

@Tmpod The information you are looking for is where @SerialName is applied to the type of the serialized object (defaulting to the FQN of the type). You can give it any desired value (but it should be unique within the context of a single format instance).

@CLOVIS-AI You can get this information as text from the descriptor: serializer<Foo>.descriptor.serialName

@Tmpod
Copy link
Author

Tmpod commented Jan 20, 2025

@CLOVIS-AI Thank you for adding more context!

This could be worked around by creating an extension function and hard-coding the name type, but KotlinX.Serialization will actually name the discriminant something else if there is already a field named type, so there is no good typesafe way of knowing how the field is called.

What I'm proposing would avoid all that hardcoding. By explicitly marking a field as the discriminant, you remove the unintuitive behaviour (imo) of virtual fields. The serialization library knows which field it should encode/decode first and how to map values to serializers. I just don't know how feasible it is to get the discriminant value from each implementor in the context of an annotation processor or compiler plugin. To me, it should be possible, the value is static, but I really am not familiar with the capabilities exposed to processors/plugins (though I've already noticed limitations, such as annotations being incapable of taking values I'd consider compile-time constants, such as enum variants).

The information you are looking for is where @SerialName is applied to the type of the serialized object (defaulting to the FQN of the type). You can give it any desired value (but it should be unique within the context of a single format instance).

I'm not sure I follow. I used @SerialName in my first example, but the point is that, because it is useful to access the type of documents both for querying Mongo and other situations (without having to resort to when matches), kx.ser should allow explicit marking of discriminants in closed polymorphic hierarchies. It would also have the added benefit of making the code clearer to developers, since there's less "magic" involved — you know exactly which field is used to decide how to deserialize.

@sandwwraith
Copy link
Member

I guess what you want here is #1664, so it's a duplicate.

I also would like to point out that kotlinx.serialization by default always performs polymorphic serialization in array format: [type, object]. Insertion of type keyword is a Json-specific code path.

@pdvrieze
Copy link
Contributor

@CLOVIS-AI Thank you for adding more context!
What I'm proposing would avoid all that hardcoding. By explicitly marking a field as the discriminant, you remove the unintuitive behaviour (imo) of virtual fields.
This is inconsistent with the design of the library (and thus would be very hard to introduce). The way the type is exposed is in many ways format specific, and doesn't need to take the shape of a synthetic attribute. In all cases it must be constant, so having it as an attribute is poor design. Note that the structure exposed to formats is (in pseudo-json): "myPolyValue"= [ "ConcreteTypeSerialName", { /* some struct value */ }]. Many formats (including json) collapse this into a single struct (or for json as an array), but that is a format specific transformation.

The serialization library knows which field it should encode/decode first and how to map values to serializers. I just don't know how feasible it is to get the discriminant value from each implementor in the context of an annotation processor or compiler plugin. To me, it should be possible, the value is static, but I really am not familiar with the capabilities exposed to processors/plugins (though I've already noticed limitations, such as annotations being incapable of taking values I'd consider compile-time constants, such as enum variants).

I'm not sure I follow. I used @SerialName in my first example, but the point is that, because it is useful to access the type of documents both for querying Mongo and other situations (without having to resort to when matches), kx.ser should allow explicit marking of discriminants in closed polymorphic hierarchies. It would also have the added benefit of making the code clearer to developers, since there's less "magic" involved — you know exactly which field is used to decide how to deserialize.

Do you mean you want to read the value of the type without using runtime type information (using the fact that the instance is of the specific type). As to what "field" is used to decide how to deserialize, the name is always the same (except when explicitly overridden). In the specific case of Json this can be provided either globally in the format configuration, or per type (not attribute) using @JsonClassDiscriminator, this annotation is then inherited.

As to "not having to resort to when matches" you can use a calculated property:

@Serializable
abstract class Base {
    abstract val type: String
}

@Serializable
class Child(val myProp:String) {
    override val type: String get() = Child.serializer().descriptor.serialName // or a hardcoded/cached version
}

@Tmpod
Copy link
Author

Tmpod commented Jan 20, 2025

I guess what you want here is #1664, so it's a duplicate.

I suppose it is related, yes, but what I was suggesting was more versatile, as I see it. The referenced issue is also in part about nested hierarchies and

I also would like to point out that kotlinx.serialization by default always performs polymorphic serialization in array format: [type, object]. Insertion of type keyword is a Json-specific code path.

I had read about that, but I'm not grasping the reasoning behind why kx.ser doesn't support field-based discriminants at the core library level. Why does it need to be a format-specific thing? Either way, an annotation that could tell the format implementors which field should be used for discriminating would be nice, instead of having each format re-implement the same annotations (kbson has a similar BsonClassDiscriminator) annotation.

@CLOVIS-AI
Copy link

As to "not having to resort to when matches" you can use a calculated property:

@serializable
abstract class Base {
abstract val type: String
}

@serializable
class Child(val myProp:String) {
override val type: String get() = Child.serializer().descriptor.serialName // or a hardcoded/cached version
}

Can you clarify what the value will return? "type" or "Child"? KtMongo needs to know both.

Also, that is a bit verbose to add to each class. Could there be a way to automatically generate this somehow?

@Tmpod
Copy link
Author

Tmpod commented Jan 20, 2025

Do you mean you want to read the value of the type without using runtime type information (using the fact that the instance is of the specific type).

Yeah. Since that field exists in the serialized document, why not have it in the deserialized object too? Makes some things easier, and in the case of KMongo/KtMongo's and similar DSLs, it's important to have it.

As to what "field" is used to decide how to deserialize, the name is always the same (except when explicitly overridden). In the specific case of Json this can be provided either globally in the format configuration, or per type (not attribute) using @JsonClassDiscriminator, this annotation is then inherited.

I am aware of that annotation, however it's format specific (and string-based, but that's a minor nitpick and a limitation of annotations). I think an annotation on the field itself would be clearer to developers — this field will be used to resolve polymorphic serializers; no virtual fields at play, no using @SerialName in classes (unintuitive, at least to me). It would also work across formats, even if they can only implement array-based polymorphism.

Again, not entirely sure if this is truly implementable, but perhaps this could work: generate and expose a getter for the discriminant field (its SerialDescriptor, maybe?), that the implementing formats and serializers could use to know which field to encode first and how to match the first decoded value to select the deserializer. It sounds to me like it could even be implemented a separate thing, though it would be more useful to have that built into the core library itself.

If this sounds plausible, I could try implement it to help illustrate what I mean by all this 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants