Table Micro-optimizations #59
Replies: 6 comments 2 replies
-
This optimization sounds too risky. Branches are expensive, but even if the branch predictor misses every time, the new approach may still take more cycles: it reads/writes upvalues 2-3 times more per iteration. Luckily it is an increment/decrement by 1, so it may be treated as a pure write rather than a read. Still, the benefit to dicts or tables may not be enough; it merely shifts work from those onto arrays.
-
"type(element)" is still needed at the top, since we need to skip the entire iteration when the element is a thread/function. Otherwise we would need to be able to roll back the changes by shifting pos back. The cost isn't that high, though, so maybe we do it: just add a tempPos and, in the branch, set pos to it.
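A minimal sketch of the tempPos idea, assuming hypothetical names (`serializeValue`, `buf`, `pos`) that are not necessarily the library's real identifiers:

```luau
for _, element in array do
	local elementType = type(element)
	if elementType == "thread" or elementType == "function" then
		continue -- skip the entire iteration; nothing was written, so no rollback needed
	end
	-- Speculatively advance into tempPos; commit by setting pos to it in the branch.
	local tempPos = serializeValue(buf, pos, element, elementType)
	pos = tempPos
end
```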
-
For both serialization and deserialization, localizing the position value could improve performance. For deserialization we could localize outside the loop, but we would need to update deserialCache before and after; for serialization we could do the same. We read/write pos a lot, so it could sit in the L1 cache the entire time, in which case localizing wouldn't help. However, Luau is an interpreted language and may not have this sort of optimization for upvalues yet, so while the improvement under codegen may be minimal (possibly even a regression), the improvement without codegen could be larger.
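The localize-then-write-back pattern might look like this sketch; `readValue`, `count`, and the shape of `deserialCache` are assumptions, only the pattern itself is the point:

```luau
-- Pull the hot position value out of the cache table into a local,
-- so the loop body touches a register/stack slot instead of a table field.
local localPos = deserialCache.pos
local result = {}
for i = 1, count do
	result[i], localPos = readValue(buf, localPos)
end
-- Single write-back after the loop keeps deserialCache consistent.
deserialCache.pos = localPos
```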
-
We could pass the id we obtained from table to the respective value functions, which cuts one fastcall per value. It does break compatibility if a user relies on the current signatures, which they would for userdata, so we would need to make it an optional parameter. Boolean deserialization would get much faster (~15%); the rest would likely benefit by ~2-5%.
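As a hedged sketch of the optional-id parameter, with an illustrative wire format (the value packed into the type byte) that may not match the library's real encoding:

```luau
-- Hypothetical per-value function; only the optional `id` parameter is the proposal.
local function deserializeBoolean(buf: buffer, pos: number, id: number?): (boolean, number)
	-- Reuse the type byte the table loop already read; fall back to a second
	-- buffer.readu8 fastcall only when called directly without an id.
	local typeByte = id or buffer.readu8(buf, pos)
	-- Assumption: the boolean value is encoded in the low bit of the type byte,
	-- which is why booleans would benefit the most from skipping the re-read.
	return typeByte % 2 == 1, pos + 1
end
```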
-
This is not strictly about table, but tables are impacted the most by this suggestion performance-wise, since inflate is triggered far more often for them. What if we start out with a 2 MB buffer? We can store said buffer in init.luau and use it internally for serialization instead of an empty one. We do assume that running in parallel is impossible, but if it turns out to be possible we can fall back. This improvement still allows us to support the user passing their own buffer/pos. I expect a minor performance improvement for larger tables (2-5%) and a larger improvement for smaller tables (5-10%). This idea was inspired by another serialization library (Sera) I looked into a few months ago, and of course by the JEP draft from issue 8329758 (Faster Startup and Warmup with ZGC), since it reminded me of Sera's approach.
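A sketch of the preallocated-buffer idea; `SCRATCH_SIZE`, the in-use flag, and the function names are assumptions, not existing library code:

```luau
-- Module-level scratch buffer, created once (e.g. in init.luau).
local SCRATCH_SIZE = 2 * 1024 * 1024 -- the 2 MB suggested above
local scratch = buffer.create(SCRATCH_SIZE)
local scratchInUse = false

local function acquireBuffer(userBuf: buffer?): buffer
	if userBuf then
		return userBuf -- still support a caller-provided buffer/pos
	end
	if scratchInUse then
		-- Fallback for unexpected reentrant/parallel use: a fresh buffer
		-- that grows via the existing inflate path.
		return buffer.create(1024)
	end
	scratchInUse = true
	return scratch
end
```

On success the serializer would copy the used prefix out of the scratch buffer and clear `scratchInUse`, so the big allocation is paid once instead of on every inflate.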
-
When looking through table.luau I realized that some serialization aspects could be adjusted to possibly squeeze out more performance.
The isArray branch in the loop could be removed and replaced with a tempPos and updates to it. We would need to know when to skip a pos, though. Likely to cause regressions for arrays.
In the dict section, we could move "type(key)" as late as possible and stop adding functions/threads to the constant table. Likely to cause a noticeable improvement for datasets that use the existing table for keys.
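The dict-section change might look like the following sketch, where `constantTable`, `addConstant`, and `serializePair` are hypothetical names; the point is that keys already in the constant table never pay for the type call:

```luau
for key, value in dict do
	local keyId = constantTable[key]
	if keyId == nil then
		-- type(key) moved as late as possible: only on a constant-table miss.
		local keyType = type(key)
		if keyType == "function" or keyType == "thread" then
			continue -- never add these to the constant table
		end
		keyId = addConstant(constantTable, key, keyType)
	end
	serializePair(buf, keyId, value)
end
```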