Conversation

Contributor

@diegommm diegommm commented Sep 13, 2025

Improve memory handling of function arguments in vm.VM by preallocating a single slice to hold the arguments of all function calls, based on an estimate obtained by inspecting the program's bytecode.

There are, of course, several points to discuss here:

  1. A program can return early and thus not consume all of the preallocated space. Answer: a single, slightly oversized allocation is still worth it, since the excess is only a few elements, and many small allocations are more expensive in terms of GC pressure and runtime memory management. I have also set a safety limit on the preallocation size, just in case.
  2. We could use program.Arguments to get the exact number of arguments being passed. Answer: while this is true, it adds a little more computation, and an estimate works fairly well for most cases. We gain ~5% speed by estimating, and the estimate is likely to be good enough in many situations.
  3. Programs with function calls in a predicate will probably not have enough preallocated space. Answer: again, this optimization targets simple, straightforward programs and covers many of the most common situations. Other programs should see no decrease in performance, because in that case we still allocate as we did before. In fact, programs with a predicate will still see a gain: we allocate a bit less until the buffer is drained, and only then fall back to allocating for each call.

In general, this optimization works well for many simple and common use cases and doesn't affect other cases.

Benchmark results:

goos: linux
goarch: amd64
pkg: github.com/expr-lang/expr/vm
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                          │ bench-results-old.txt │        bench-results-new.txt        │
                          │        sec/op         │   sec/op     vs base                │
VM/name=function_calls-20             1.495µ ± 0%   1.277µ ± 1%  -14.58% (p=0.000 n=20)

                          │ bench-results-old.txt │        bench-results-new.txt         │
                          │         B/op          │     B/op      vs base                │
VM/name=function_calls-20            2.297Ki ± 0%   2.625Ki ± 0%  +14.29% (p=0.000 n=20)

                          │ bench-results-old.txt │       bench-results-new.txt        │
                          │       allocs/op       │ allocs/op   vs base                │
VM/name=function_calls-20             40.000 ± 0%   1.000 ± 0%  -97.50% (p=0.000 n=20)

@diegommm diegommm force-pushed the vm-and-runtime-improvements branch from 93986dc to d030d1e on September 13, 2025 07:34
@diegommm diegommm marked this pull request as ready for review September 13, 2025 07:43
Comment on lines +670 to +693
func estimateFnArgsCount(program *Program) int {
	// Implementation note: a program will not necessarily go through all
	// operations, but this is just an estimation
	var count int
	for _, op := range program.Bytecode {
		if int(op) < len(opArgLenEstimation) {
			count += opArgLenEstimation[op]
		}
	}
	return count
}

var opArgLenEstimation = [...]int{
	OpCall1: 1,
	OpCall2: 2,
	OpCall3: 3,
	// we don't know exactly, but we know at least 4, so be conservative as
	// this is only an optimization and we also want to avoid excessive
	// preallocation
	OpCallN: 4,
	// here we don't know either, but we can guess it could be common to
	// receive up to 3 arguments in a function
	OpCallFast: 3,
	OpCallSafe: 3,
}
Contributor Author

@diegommm diegommm Sep 13, 2025

I initially used a switch in estimateFnArgsCount, but then tried this table and got a 4% speed improvement.
However, you can see that the array has 56 elements while only 6 are used. I preferred it this way because I think the code reads more clearly. But if you would rather have the table hold exactly the number of items it needs, we only need the following change (you can apply this suggestion as is if you want; I tested this exact code and it is also correctly formatted with spaces):

Suggested change

Current:

	func estimateFnArgsCount(program *Program) int {
		// Implementation note: a program will not necessarily go through all
		// operations, but this is just an estimation
		var count int
		for _, op := range program.Bytecode {
			if int(op) < len(opArgLenEstimation) {
				count += opArgLenEstimation[op]
			}
		}
		return count
	}

	var opArgLenEstimation = [...]int{
		OpCall1: 1,
		OpCall2: 2,
		OpCall3: 3,
		// we don't know exactly, but we know at least 4, so be conservative as
		// this is only an optimization and we also want to avoid excessive
		// preallocation
		OpCallN: 4,
		// here we don't know either, but we can guess it could be common to
		// receive up to 3 arguments in a function
		OpCallFast: 3,
		OpCallSafe: 3,
	}

Suggested:

	func estimateFnArgsCount(program *Program) int {
		// Implementation note: a program will not necessarily go through all
		// operations, but this is just an estimation
		var count int
		for _, op := range program.Bytecode {
			op -= OpCall1 // if it underflows it only becomes bigger, so it's ok
			if int(op) < len(opArgLenEstimation) {
				count += opArgLenEstimation[op]
			}
		}
		return count
	}

	var opArgLenEstimation = [...]int{
		OpCall1 - OpCall1: 1,
		OpCall2 - OpCall1: 2,
		OpCall3 - OpCall1: 3,
		// we don't know exactly, but we know at least 4, so be conservative as
		// this is only an optimization and we also want to avoid excessive
		// preallocation
		OpCallN - OpCall1: 4,
		// here we don't know either, but we can guess it could be common to
		// receive up to 3 arguments in a function
		OpCallFast - OpCall1: 3,
		OpCallSafe - OpCall1: 3,
	}

@diegommm diegommm force-pushed the vm-and-runtime-improvements branch from 1e5e3d8 to 2fb1f53 on September 13, 2025 21:53
@diegommm diegommm changed the title from "VM performance improvements" to "VM performance improvements in function calls" on Sep 13, 2025