Skip to content

Conversation

@dtolnay
Copy link
Member

@dtolnay dtolnay commented Sep 15, 2025

No description provided.

@oli-obk
Copy link
Member

oli-obk commented Sep 15, 2025

As a defense against casual users of the private types? Because the same scheme can be used to keep using the private stuff in a downstream crate.

Not sure this kind of extra defense is worth it

@dtolnay
Copy link
Member Author

dtolnay commented Sep 15, 2025

I think it is low cost and decent benefit. If there is any logic in there that someone wants access to, they need to find someone to maintain it officially so that their users do not experience repeated breakage from serde updates.

@dtolnay dtolnay merged commit 5b37e8a into serde-rs:master Sep 15, 2025
14 checks passed
@dtolnay dtolnay deleted the private branch September 15, 2025 20:20
PartiallyUntyped pushed a commit to awslabs/aws-lambda-rust-runtime that referenced this pull request Sep 17, 2025
serde broke these in <serde-rs/serde#2980>

smithy depends on lambda-http 0.8 until the HTTP 1 migration can be
done, so this is needed.
PartiallyUntyped added a commit to awslabs/aws-lambda-rust-runtime that referenced this pull request Sep 17, 2025
serde broke these in <serde-rs/serde#2980>

smithy depends on lambda-http 0.8 until the HTTP 1 migration can be
done, so this is needed.

Co-authored-by: Ariel Ben-Yehuda <[email protected]>
@juntyr
Copy link

juntyr commented Sep 20, 2025

As a maintainer of RON, this feels not good. We have been hoping and wishing and praying for #2912. As a format that isn't JSON but does want to support serde attributes, looking at serde internals (the type name for serde Content, TagOrContent, and the automatically derived expected error for structs deserialized as maps) is the only way to provide these features currently. I don't like the situation, clearly you don't like the situation. But just breaking it entirely without a heads-up or alternative seems ... not good.

Please consider releasing #2912, which would already be a major step forward and eliminate RON's primary dependency on serde internals. After that we could work together to see how information that RON (and other formats) may need can be exposed the proper way.

Thank you for your help

@dtolnay
Copy link
Member Author

dtolnay commented Sep 20, 2025

What specifically is broken entirely without an available alternative @juntyr? Is it basically just the fn is_serde_content<T>() -> bool that is mentioned in the linked issue?

@juntyr
Copy link

juntyr commented Sep 20, 2025

RON currently requires hacks for the following:

  • untagged enums -> detect serde content (and then switch to a JSON-like deserialisation mode that is a hack by itself); properly detecting serde content would be a mid-term fix, long term requires custom content
  • internally tagged enums -> detect TagOrContent
  • flattened structs -> structs look like maps (which in RON is two distinct concepts); here we currently check if the expected message of the visitor starts with "struct " even though RON is given a map to convert it back to a struct

Basically, in an ideal world, serde would just provide the attributes and then call special methods during serialisation and deserialisation to handle them. By default, these methods would do the same as now. With such special methods, it might even be possible to avoid custom content types entirely if the deserialiser is just told the end purpose, e.g. "please try to deserialise this enum type, but in untagged mode"

@dtolnay
Copy link
Member Author

dtolnay commented Sep 20, 2025

I understand that you are wishing for something along the lines of #2912 in the future, along with other things, but that is unrelated to this PR. I am interested in understanding why ron-rs/ron#583 (comment) says this PR makes RON nearly impossible.

@juntyr
Copy link

juntyr commented Sep 20, 2025

Yes of course - well this PR is specifically to prevent users relying on serde internals (a good goal in general). RON had relied on detecting the content type to support untagged enums, something our users clearly make use of. We used the type name for this detection, a hack in many respects (since Rust discourages relying on the format as well, and that actually broke a month earlier when Rust started including lifetimes in the type name). Moving the type of course also broke the type name. But both of those could have been fixed with a match over all type names from the versions RON is compatible with. Now, we could of course try to overfit on the current format by allowing the version number in the path name, though I have a feeling that this is not the way you want RON to go.

So we need a way that works for serde and RON. That is either some way for RON to detect content (and tag or content) that serde can guarantee to remain available so long as those types remain in use. Or, preferably, it would be a way for RON to not have to worry about content anymore, e.g. by allowing RON to provide its own content type (which would also remove some other hacks).

@dtolnay
Copy link
Member Author

dtolnay commented Sep 20, 2025

I recommend doing something like the following, which is written using the public API of serde. A Deserializer has always been able to tell what type is being deserialized from it. None of type_name or use of serde internal data structure paths is needed for this.

// [dependencies]
// serde = { version = "1", default-features = false, features = ["alloc"] }
// serde_derive = "1"
// typeid = "1"
use serde::de::{Deserialize, Deserializer, Visitor};
use serde_derive::Deserialize;
use std::any::TypeId;
use std::fmt::{self, Display};

fn type_id_of_untagged_enum_default_buffer() -> TypeId {
    #[derive(Deserialize)]
    enum A {}
    type B = A;

    #[derive(Deserialize)]
    #[serde(untagged)]
    enum UntaggedEnum {
        A(A),
        B(B),
    }

    struct TypeIdDeserializer;

    impl<'de> Deserializer<'de> for TypeIdDeserializer {
        type Error = TypeIdError;
        fn deserialize_any<V>(self, _visitor: V) -> Result<V::Value, Self::Error>
        where
            V: Visitor<'de>,
        {
            Err(TypeIdError(typeid::of::<V::Value>()))
        }
        serde::forward_to_deserialize_any! {
            bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
            bytes byte_buf option unit unit_struct newtype_struct seq tuple
            tuple_struct map struct enum identifier ignored_any
        }
    }

    #[derive(Debug)]
    struct TypeIdError(TypeId);

    impl Display for TypeIdError {
        fn fmt(&self, _formatter: &mut fmt::Formatter) -> fmt::Result {
            Ok(())
        }
    }

    impl serde::de::Error for TypeIdError {
        fn custom<T: Display>(_message: T) -> Self {
            unreachable!()
        }
    }

    impl serde::de::StdError for TypeIdError {}

    match Deserialize::deserialize(TypeIdDeserializer) {
        Ok(UntaggedEnum::A(void) | UntaggedEnum::B(void)) => match void {},
        Err(TypeIdError(typeid)) => typeid,
    }
}

fn is_serde_content<T>() -> bool {
    typeid::of::<T>() == type_id_of_untagged_enum_default_buffer()
}

fn main() {
    println!("{}", is_serde_content::<serde::__private225::de::Content>());
}

@juntyr
Copy link

juntyr commented Sep 21, 2025

Thank you, that works! I was able to use this approach to detect the default buffer and the tag or buffer types.

We use another hack to detect flattened structs (see https://github.com/ron-rs/ron/blob/cdd9fccd76e4c2f515706453f7c0e4351b8f3784/src/de/mod.rs#L679). Do you have an idea for how to do that one better?

We have one test and our main fuzzer that also depended on serde internal to fake specific deserialisation code paths (effectively fuzz over any Rust data type and serde attribute that would be possible). Obviously we could just pin a specific version of serde to know the private module path or implement the same logic (since this is just a test and so nothing would break for users if internals change). However, it would be better of course to just do things right. What would be the correct way to exercise these code paths? see https://github.com/ron-rs/ron/blob/master/tests/non_string_tag.rs and https://github.com/ron-rs/ron/blob/master/fuzz/fuzz_targets/bench/lib.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

serde::__private::de::Content::deserialize not found in Content<'_> in serde_derive v1.0.224

3 participants