How should pairs be dealt with?
#36
Replies: 4 comments 6 replies
-
Possible, but inefficient either way.
It is worth noting that a future version may let users serialize multiple values into one buffer using the internal functions, so storing pairs at the beginning or end of the buffer could become more complex. What if we store them as we go? We can say:
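A minimal sketch of the "store as we go" idea, in Python for illustration. Each distinct value is written in full the first time it appears and gets the next id; later occurrences write only that id, so no pair table has to sit at the beginning or end of the buffer. The tag values, the `(tag, payload)` tuples standing in for raw bytes, and the `serialize`/`deserialize` names are all hypothetical, not the library's actual format.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical tags for this sketch only; they are not the library's format.
TAG_DEFINE = 0x01  # first occurrence: the value is written in full
TAG_REF = 0x02     # later occurrences: only the id assigned on first use

Token = Tuple[int, Any]  # (tag, payload); stands in for raw bytes


def serialize(values: List[str]) -> List[Token]:
    """Define each distinct value inline on first use, then refer to it by id."""
    ids: Dict[str, int] = {}
    out: List[Token] = []
    for value in values:
        if value in ids:
            out.append((TAG_REF, ids[value]))
        else:
            ids[value] = len(ids)  # ids follow the order of first appearance
            out.append((TAG_DEFINE, value))
    return out


def deserialize(tokens: List[Token]) -> List[str]:
    """Rebuild the id table in the same order the writer filled it."""
    table: List[str] = []
    out: List[str] = []
    for tag, payload in tokens:
        if tag == TAG_DEFINE:
            table.append(payload)
            out.append(payload)
        elif tag == TAG_REF:
            out.append(table[payload])
        else:
            raise ValueError(f"unknown tag {tag!r}")
    return out
```

For example, `serialize(["red", "blue", "red"])` gives `[(1, "red"), (1, "blue"), (2, 0)]`. The reader rebuilds the table in the same order the writer filled it, so no separate header is needed; the cost is one define entry per distinct value.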
-
Could we use the values as they are documented, but instead of having users define them, take the first N values and make them extremely cheap to encode? One issue is that we would rely on the formatting discussed in the previous comment, whenever that gets resolved. We could allocate them for special cases, like Vector2s, or research how other formats use their values and borrow some of the ideas, though I don't think there are many I would be willing to implement. The table could be tuned per case, but that makes serialization much slower.
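A rough sketch of reserving the first N tag values for common constants, so each of them costs a single byte with no payload. The choice of constants in `RESERVED`, the tag numbering, and the int/string encodings that follow the reserved range are illustrative assumptions, not the project's format.

```python
# Hypothetical: tags 0..N-1 are reserved constants; one byte each, no payload.
RESERVED = [None, True, False, "", 0]  # illustrative choices only
TAG_INT = len(RESERVED)      # first tag after the reserved range
TAG_STR = len(RESERVED) + 1


def encode(value) -> bytes:
    for tag, constant in enumerate(RESERVED):
        # Compare types too, since in Python 0 == False and 1 == True.
        if type(value) is type(constant) and value == constant:
            return bytes([tag])
    if type(value) is int:
        # 4-byte signed little-endian payload
        return bytes([TAG_INT]) + value.to_bytes(4, "little", signed=True)
    if type(value) is str:
        data = value.encode("utf-8")
        # 2-byte length prefix, then UTF-8 bytes
        return bytes([TAG_STR]) + len(data).to_bytes(2, "little") + data
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf: bytes):
    tag = buf[0]
    if tag < len(RESERVED):
        return RESERVED[tag]
    if tag == TAG_INT:
        return int.from_bytes(buf[1:5], "little", signed=True)
    if tag == TAG_STR:
        length = int.from_bytes(buf[1:3], "little")
        return buf[3:3 + length].decode("utf-8")
    raise ValueError(f"unknown tag {tag}")
```

The tradeoff is that every reserved slot shrinks the tag space left for other types, so the reserved set would need to be chosen carefully and frozen early.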
-
I just thought of this: what happens when a user re-pairs an id to another value of the same type? When serializing, value maps to id and otherValue also maps to id. However, deserializing id always yields value, so every instance of otherValue is corrupted: it deserializes without error, but as the wrong data. I only thought of this because I wondered, "What if I just reuse pair ids to do unions?" The fix would be to detect when an id has been reused by a different value and "re-assign" it to another id when needed. Nothing needs to change for deserialization, only for serialization. We either error, do the above, or do nothing and just warn users.
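A sketch of detecting id reuse at pairing time, with the three options above as a policy. `PairTable`, `pair`, `id_for`, and the `on_conflict` values are hypothetical names for illustration. The `"reassign"` policy assumes both sides build the table from the same definitions in the same order, so the deterministic reassignment matches without changing deserialization.

```python
import itertools
import warnings
from typing import Any, Dict, Optional


class PairTable:
    """Hypothetical pair table that catches one id paired with two different values."""

    def __init__(self, on_conflict: str = "error") -> None:
        if on_conflict not in ("error", "warn", "reassign"):
            raise ValueError(f"unknown policy: {on_conflict}")
        self.on_conflict = on_conflict
        self.value_by_id: Dict[int, Any] = {}
        self.id_by_value: Dict[Any, int] = {}

    def pair(self, pair_id: int, value: Any) -> Optional[int]:
        """Pair value with pair_id; return the id actually used, or None if left unpaired."""
        if pair_id in self.value_by_id and self.value_by_id[pair_id] != value:
            msg = f"id {pair_id} already paired with {self.value_by_id[pair_id]!r}, not {value!r}"
            if self.on_conflict == "error":
                raise ValueError(msg)
            if self.on_conflict == "warn":
                warnings.warn(msg)
                return None  # the value stays unpaired and is written out in full
            # "reassign": deterministically move the value to the lowest unused id
            pair_id = next(i for i in itertools.count() if i not in self.value_by_id)
        self.value_by_id[pair_id] = value
        self.id_by_value[value] = pair_id
        return pair_id

    def id_for(self, value: Any) -> Optional[int]:
        """Serialization lookup: the id for value, or None if it is not paired."""
        return self.id_by_value.get(value)
```

Only the serializing side consults `id_for`, which matches the point above that the fix lives entirely in serialization.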
-
Ha... I have been thinking about this. So what exactly would removing pairs entail, and what benefits would we see? From a code standpoint, the code would be much more readable and maintainable without pairs. From a user standpoint, it removes an odd, seldom-used quirk that hinders performance, and it reduces what users need to consider. From a performance perspective, it lets us easily remove parts we knew were less performant in the general case, like the constants table. A redesign could borrow from the stringpack reformat and extend it to use all of the string pair bytes. I don't personally like that approach, and redesigning at all could be the wrong approach. We could instead just retire those bytes and leave large gaps to keep compatibility. I think we could even keep deserialization the same and remove pairs only from serialization; it won't harm most users and is a safer way to migrate. Why am I reconsidering this? Because what already exists takes the brunt of reducing output size, shaving off a few more bytes is not worth it. In benchmarks, the difference between pairs and pairless was incredibly underwhelming.
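A minimal sketch of "keep deserialization the same but remove pairs from serialization": the new serializer only ever writes full values, while the deserializer still accepts the legacy pair tag from older data. The tag numbers, the 1-byte length and pair-id fields, and the function names are hypothetical, chosen only for the sketch.

```python
from typing import List, Optional

# Hypothetical tag layout for this sketch only.
TAG_STRING = 0x10       # tag, 1-byte length, UTF-8 bytes
TAG_PAIR_LEGACY = 0x20  # tag, 1-byte pair id (read, but no longer written)


def serialize(value: str) -> bytes:
    """New serializer: always writes the full value, never a pair id."""
    data = value.encode("utf-8")
    if len(data) > 255:
        raise ValueError("sketch only supports strings up to 255 bytes")
    return bytes([TAG_STRING, len(data)]) + data


def deserialize(buf: bytes, pairs: Optional[List[str]] = None) -> str:
    """Unchanged deserializer: still understands pair ids written by older versions."""
    tag = buf[0]
    if tag == TAG_STRING:
        return buf[2:2 + buf[1]].decode("utf-8")
    if tag == TAG_PAIR_LEGACY:
        if pairs is None:
            raise ValueError("legacy pair id found but no pair list was provided")
        return pairs[buf[1]]
    raise ValueError(f"unknown tag {tag:#x}")
```

Assuming the full-value tag is the same one older versions already read, new data stays readable by old deserializers, and old data with pair ids keeps working as long as the pair list is still supplied.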
-
Mentioned in #34:
What issues were brought up?
Pairs introduce migration pain, mainly that adding or adjusting a pair between dataset versions can break compatibility.
Deserializing the data without the list of pairs produces malformed data or, depending on the version, an error.
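A small sketch of why mismatched pair lists corrupt data: keys are written as their index in a version's pair list, so when versions disagree the same index names a different key. `PAIRS_V1`, `PAIRS_V2`, and the `strict` flag (error vs. silently skip unknown ids) are hypothetical, chosen to mirror the "malformed data or an error" behaviour described above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical pair lists: keys are serialized as their index in the list.
PAIRS_V1 = ["A", "C"]  # old version
PAIRS_V2 = ["C"]       # new version: "A" was removed, so "C" moved to id 0


def serialize(record: Dict[str, Any], pairs: List[str]) -> List[Tuple[int, Any]]:
    return [(pairs.index(key), value) for key, value in record.items()]


def deserialize(tokens: List[Tuple[int, Any]], pairs: List[str], strict: bool = True) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    for pair_id, value in tokens:
        if pair_id >= len(pairs):
            if strict:
                raise ValueError(f"pair id {pair_id} is not in this version's pair list")
            continue  # lenient: silently drop ids this version does not know
        result[pairs[pair_id]] = value  # may be the wrong key if the lists differ
    return result
```

Old data `{"A": 1, "C": ""}` written with `PAIRS_V1` becomes `[(0, 1), (1, "")]`; read leniently with `PAIRS_V2` it comes back as `{"C": 1}`, and read strictly it raises.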
Example of an issue:
Sending old data to a new server: { A = 1, C = "" } -> { C = 1 } or { C = "" } (could be either)
Sending new data to an old server: { A = 1, C = "" } -> { A = 1 } or { A = "" } (could be either)
Potential solutions?