How should `pairs` be dealt with? #36

nwinn-student · 2025-10-21T22:14:52Z

nwinn-student
Oct 21, 2025
Maintainer

Mentioned in #34:

What issues were brought up?

Pairs introduce migration pains, mainly that adding or adjusting a pair between dataset versions could break compatibility.
- As such, using pairs forces careful consideration of the past, current, and future dataset formats.
Deserializing data without the list of pairs yields malformed data or an error, depending on the version.

Example of an issue:

local Sample = { A = 1, C = "" }

Version 1: pair(1, "A")
Version 2: pair(1, "C")
V1 -> V2: { A = 1, C = "" } -> { C = 1 } | { C = "" } (could be either) [Sending old data to a new server]
V2 -> V1: { A = 1, C = "" } -> { A = 1 } | { A = "" } (could be either) [Sending new data to an old server]
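The V1 -> V2 case can be reproduced with a toy model. This is a minimal sketch, not the library's format: `encode` and `decode` are hypothetical helpers, the "wire" is a plain list of entries rather than a buffer, and key order is passed in explicitly so the outcome is deterministic.

```lua
-- Toy model only: `encode` and `decode` are hypothetical stand-ins for the
-- real serializer. A paired key is written as { pairId = id } instead of the string.
local function encode(tbl, idOf, keyOrder)
    local entries = {}
    for _, key in ipairs(keyOrder) do
        local id = idOf[key]
        entries[#entries + 1] = {
            key = id and { pairId = id } or key,
            value = tbl[key],
        }
    end
    return entries
end

local function decode(entries, valueOf)
    local result = {}
    for _, entry in ipairs(entries) do
        local key = entry.key
        if type(key) == "table" then
            key = valueOf[key.pairId] -- resolved with the receiver's pairs
        end
        result[key] = entry.value -- a later entry silently overwrites an earlier one
    end
    return result
end

local sample = { A = 1, C = "" }
local v1IdOf = { A = 1 }        -- Version 1: pair(1, "A")
local v2ValueOf = { [1] = "C" } -- Version 2: pair(1, "C")

-- Old data sent to a new server; the surviving value depends on key order.
local decodedAC = decode(encode(sample, v1IdOf, { "A", "C" }), v2ValueOf)
local decodedCA = decode(encode(sample, v1IdOf, { "C", "A" }), v2ValueOf)
```

In both runs the key "A" is gone and only "C" remains. Which of the two values it holds depends on the order the keys were written, which is the "could be either" above.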

Potential Solutions?

Remove pairs
- Complete rework of the underlying binary format since 50%+ of the values are set aside for pairs
Forbid altering existing pairs. Also advise caution when adding pairs, since that can break compatibility too.
Store the used pairs at the beginning or end of the serialized buffer, similar to CBOR storing the schema (I think?).

nwinn-student · 2025-10-21T22:52:17Z

nwinn-student
Oct 21, 2025
Maintainer Author

Store the used pairs at the beginning or end of the serialized buffer

Possible, but inefficient either way.

For serializing, we would need to shift the entire buffer over for possibly a few bytes. (Beginning)
For deserialization, we would need to somehow know where the end of the buffer is. (End)
For serializing, adding the pairs to the end works. For deserialization, adding the pairs to the beginning works.

Note that a future version may let users serialize multiple values into one buffer using the internal functions, so storing the pairs at the beginning or end could become more complex.
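One way to picture the "end" option is a trailer followed by a fixed-size length footer. This is a sketch under assumptions, not the library's layout: `appendTrailer` and `splitTrailer` are hypothetical names, buffers are modelled as Lua strings, and the trailer is capped at 255 bytes so that its length fits in a single byte.

```lua
-- Hypothetical layout: [ body ][ pairs block ][ 1 byte: length of pairs block ]
local function appendTrailer(body, pairsBlock)
    assert(#pairsBlock < 256, "sketch only supports pairs blocks under 256 bytes")
    -- Appending does not require shifting the body that was already written.
    return body .. pairsBlock .. string.char(#pairsBlock)
end

local function splitTrailer(buf)
    -- The reader must know where the buffer ends to find the footer byte.
    local trailerLength = string.byte(buf, #buf)
    local bodyEnd = #buf - 1 - trailerLength
    local body = string.sub(buf, 1, bodyEnd)
    local pairsBlock = string.sub(buf, bodyEnd + 1, #buf - 1)
    return body, pairsBlock
end

local packed = appendTrailer("payload", "\1A")
local body, pairsBlock = splitTrailer(packed)
```

The sketch still depends on the reader knowing the total length of the buffer, which is the part that gets harder once several values share one buffer.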

What if we store as we go?

We can say:

If the id has not yet been cached in constants or existing, we store the associated value immediately after it.

Serialization impacts
- Possibly restricts future improvements.
- Degrades performance, since we now need to check whether each value is a paired constant.
Deserialization impacts
- How do we handle this? Currently, deserialization checks the pairs table and returns the value. Reformat stringpack #18 introduces erroring when we fail to understand a value. Since we are moving towards erroring on failure, how would the deserializer know that a value follows the id?
  - Should we swap the order and write the value followed by the id when serializing? That has serialization issues. We would need to somehow bypass... actually, both orders have that problem. How do we even store the value after the id if the id takes over immediately?
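The "store as we go" idea can be sketched with a toy token stream. Everything here is hypothetical: tokens are small tables rather than bytes, and the sketch sidesteps the open question above by using separate "def" and "ref" tags, which a real binary format would have to encode somehow.

```lua
-- Toy model: "def" = first use of a pair id, carrying its value;
-- "ref" = a later use of that id; "raw" = an unpaired value.
local function encode(values, idOf)
    local tokens, defined = {}, {}
    for _, value in ipairs(values) do
        local id = idOf[value]
        if id == nil then
            tokens[#tokens + 1] = { tag = "raw", value = value }
        elseif defined[id] then
            tokens[#tokens + 1] = { tag = "ref", id = id }
        else
            defined[id] = true
            tokens[#tokens + 1] = { tag = "def", id = id, value = value }
        end
    end
    return tokens
end

-- The decoder is never given the pairs; it learns them from the stream.
local function decode(tokens)
    local values, learned = {}, {}
    for _, token in ipairs(tokens) do
        if token.tag == "raw" then
            values[#values + 1] = token.value
        elseif token.tag == "def" then
            learned[token.id] = token.value
            values[#values + 1] = token.value
        else
            local value = learned[token.id]
            if value == nil then
                error("pair id " .. tostring(token.id) .. " used before it was defined")
            end
            values[#values + 1] = value
        end
    end
    return values
end

local tokens = encode({ "A", "x", "A" }, { A = 1 })
local roundTripped = decode(tokens)
```

Because the first use carries the value, a receiver with a different or missing pairs list still decodes the data, at the cost of one extra check per value while serializing.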

3 replies

nwinn-student Oct 21, 2025
Maintainer Author

Could we modify the functions to have an optional argument that skips the id part?

Would that cause harm?

nwinn-student Oct 30, 2025
Maintainer Author

In the end, I decided to go with the topmost approach to see whether my assumptions about the implementation pains were correct. Happily, they were not.

The performance implications were sizable, but worth the improved user experience.

nwinn-student Oct 30, 2025
Maintainer Author

Could we modify the functions to have an optional argument that skips the id part?

Would that cause harm?

Internally we now do basically this, so it won't cause harm. Users who wish to use the internals should look at table.luau to see how it is done: mostly it wraps the desired function and checks whether the value is in link.values; if so, it handles the value as a pair, otherwise it does the normal serialization.

So the code did not change that much, but removing pairs support from non-table types ensures that the aforementioned migration pains won't arise.
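For readers who do not want to open table.luau, the wrapping described above might look roughly like this. It is a hypothetical sketch, not the actual implementation: `wrapWithPairs` and `serializeString` are made-up names, the output is a list of readable string chunks rather than a buffer, and `link.values` is assumed to map a paired value to its id.

```lua
-- Hypothetical sketch; see table.luau for the real code.
-- Assumes link.values maps a paired value to its id.
local function wrapWithPairs(serializeValue, link)
    return function(out, value)
        local id = link.values[value]
        if id ~= nil then
            out[#out + 1] = "<pair:" .. id .. ">" -- paired: write only the id
        else
            serializeValue(out, value) -- unpaired: normal serialization
        end
    end
end

-- Stand-in for one of the internal serialization functions.
local function serializeString(out, value)
    out[#out + 1] = "<str:" .. value .. ">"
end

local link = { values = { A = 1 } }
local serialize = wrapWithPairs(serializeString, link)

local out = {}
serialize(out, "A")
serialize(out, "B")
local result = table.concat(out)
```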

nwinn-student · 2025-10-21T22:58:35Z

nwinn-student
Oct 21, 2025
Maintainer Author

Remove pairs

Could we keep the values as they are documented but, instead of having users define them, take the first N values and make those incredibly cheap? One issue is that this relies on the formatting discussed in the prior comment, whenever that gets resolved.

Could we allocate them for special cases, like Vector2s? Or research how other formats use their values and borrow some of the ideas? I don't think there are many I would be willing to implement, though. Table could be improved based on the case, but that makes serialization much slower.

1 reply

nwinn-student Feb 12, 2026
Maintainer Author

There is no guaranteed improvement from the current approach of taking the first N. It is also not reproducible: assuming many duplicates, one run could produce a wildly different size than another.

nwinn-student · 2026-02-01T21:28:25Z

nwinn-student
Feb 1, 2026
Maintainer Author

I just thought of this, but what about when a user re-pairs an id to another value of the same type?

When serializing, it should write id-value for the first value and id-otherValue for the other. However, that will corrupt all instances of the other value when serializing. So the data deserializes "correctly", but as corrupted data.

Only thought of this since I was like "What if I just reuse pair ids to do unions".

The fix would be to somehow know when an id has been reused by a different value and be able to "re-assign" it to another value when needed. Nothing needs to change for deserialization, only for serialization. We either error, do the above, or do nothing/warn users.
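The silent corruption can be shown with a toy registry. This is a sketch under assumptions: `newRegistry` and `pair` are hypothetical stand-ins, and the registry is assumed to keep one map per direction (value -> id for writing, id -> value for reading). Which value ends up corrupted depends on how the real maps are stored; under this assumption it is the first one.

```lua
-- Toy registry; not the library's API.
local function newRegistry()
    return { idOf = {}, valueOf = {} }
end

local function pair(registry, id, value)
    registry.idOf[value] = id    -- the entry for the previous value stays behind
    registry.valueOf[id] = value -- the previous value is overwritten
end

local registry = newRegistry()
pair(registry, 1, "red")
pair(registry, 1, "blue") -- re-pair the same id to another string

local writtenId = registry.idOf["red"]       -- "red" is still serialized as id 1
local readBack = registry.valueOf[writtenId] -- but id 1 now deserializes as "blue"
```

No error is raised anywhere, yet a value written as "red" comes back as "blue".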

2 replies

nwinn-student Feb 1, 2026
Maintainer Author

Preferably we error in pairs or remove it, but it has a perf concern. We either need to make it slow to pair or wasteful memory-wise.

There are cases where pairs like that would be important, but I think using up all 1k+ is unlikely.

nwinn-student Feb 1, 2026
Maintainer Author

Preferably we error in pairs or remove it, but it has a perf concern. We either need to make it slow to pair or wasteful memory-wise.

Waste memory. The cost isn't that much (just double the memory currently used for storing pairs), and storing pairs is pretty cheap. We then need to consider whether to error or overwrite. I say error: if you say 1+1 is 11 and later say 1+1 is 2, you are wrong. Both are true, but the contexts are different; one uses base 1 and the other base 3+. You would need to be able to specify the context surrounding the pair for it to be interpreted correctly, which means 1 more byte for each use of the pair, which makes it not worthwhile.
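Erroring on a conflicting re-pair could look like the sketch below. The names are hypothetical, and it assumes the check is backed by an id -> value table kept alongside the value -> id table, which is where the extra memory would go.

```lua
-- Hypothetical registry that refuses to re-pair an id to a different value.
local function newRegistry()
    return { idOf = {}, valueOf = {} }
end

local function pair(registry, id, value)
    local existingValue = registry.valueOf[id]
    if existingValue ~= nil and existingValue ~= value then
        error("id " .. tostring(id) .. " is already paired with another value", 2)
    end
    registry.idOf[value] = id
    registry.valueOf[id] = value
end

local registry = newRegistry()
pair(registry, 1, "red")
pair(registry, 1, "red") -- repeating the same pair is harmless
local ok, message = pcall(pair, registry, 1, "blue") -- conflicting re-pair
```

The lookup is a single table index per call to `pair`, so pairing stays fast and the cost is paid in memory instead.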

nwinn-student · 2026-02-12T07:09:39Z

nwinn-student
Feb 12, 2026
Maintainer Author

Remove pairs
Complete rework of the underlying binary format since 50%+ of the values are set aside for pairs

Ha.. I have been thinking about this. So what exactly would it entail and what benefits would be seen?

From a code standpoint, it will be much more readable and maintainable without pairs. From a user standpoint, it removes an odd quirk that is seldom used and hinders performance, and it reduces what needs to be considered. From a performance perspective, it allows us to easily remove parts that we knew were less performant for the general case, like the constants table.

The redesign could pull from reformat stringpack and extend it to use all of the string pair bytes. I don't personally like that approach, and even redesigning could be the wrong approach. We could just take away those bytes and leave large gaps to keep compatibility. I think we can even keep deserialize the same but remove pairs from serialize. It won't harm most users and is a safer way to migrate.

Why am I rethinking this?

Since existing already takes the brunt of reducing output size, shaving a few more bytes off is not worth it. In benchmarks, the difference between pairs and pairless was incredibly underwhelming.
