Conversation

Contributor

@diegommm diegommm commented Sep 13, 2025

Improve memory handling of function arguments in vm.VM by preallocating a single slice to hold the arguments of all function calls, based on an estimate obtained by inspecting the program's bytecode.

There are, of course, several points to discuss here:

  1. A program can return early and thus not consume all of the preallocated space. Answer: a single, slightly oversized allocation is still worth it, since the excess is only a few elements, and many small allocations are more expensive in terms of GC pressure and runtime memory management. I have also set a safety limit on the preallocation size, just in case.
  2. We could use program.Arguments to get the exact number of arguments being passed. Answer: while this is true, it adds a little more computation, and an estimate works fairly well for most cases. We gain ~5% speed by estimating, and the estimate is likely to be good enough in many situations.
  3. Programs with function calls in a predicate will probably not have enough preallocated space. Answer: again, this optimization targets simple, straightforward programs and covers many of the most common situations. Other programs should see no decrease in performance, because in that case we still allocate as we did before. In fact, programs with a predicate will still see a gain: we allocate a bit less until the buffer is drained, and only then fall back to allocating for each call.

In general, this optimization works well for many simple and common use cases and doesn't affect other cases.

Benchmark results:

goos: linux
goarch: amd64
pkg: github.com/expr-lang/expr/vm
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                          │ bench-results-old.txt │        bench-results-new.txt        │
                          │        sec/op         │   sec/op     vs base                │
VM/name=function_calls-20             1.495µ ± 0%   1.277µ ± 1%  -14.58% (p=0.000 n=20)

                          │ bench-results-old.txt │        bench-results-new.txt         │
                          │         B/op          │     B/op      vs base                │
VM/name=function_calls-20            2.297Ki ± 0%   2.625Ki ± 0%  +14.29% (p=0.000 n=20)

                          │ bench-results-old.txt │       bench-results-new.txt        │
                          │       allocs/op       │ allocs/op   vs base                │
VM/name=function_calls-20             40.000 ± 0%   1.000 ± 0%  -97.50% (p=0.000 n=20)

@diegommm diegommm force-pushed the vm-and-runtime-improvements branch from 93986dc to d030d1e on September 13, 2025 07:34
@diegommm diegommm marked this pull request as ready for review September 13, 2025 07:43
Comment on lines +670 to +693
func estimateFnArgsCount(program *Program) int {
	// Implementation note: a program will not necessarily go through all
	// operations, but this is just an estimation
	var count int
	for _, op := range program.Bytecode {
		if int(op) < len(opArgLenEstimation) {
			count += opArgLenEstimation[op]
		}
	}
	return count
}

var opArgLenEstimation = [...]int{
	OpCall1: 1,
	OpCall2: 2,
	OpCall3: 3,
	// we don't know exactly, but we know at least 4, so be conservative as
	// this is only an optimization and we also want to avoid excessive
	// preallocation
	OpCallN: 4,
	// here we don't know either, but we can guess it could be common to
	// receive up to 3 arguments in a function
	OpCallFast: 3,
	OpCallSafe: 3,
}
Contributor Author

@diegommm diegommm Sep 13, 2025

I initially used a switch in estimateFnArgsCount, but then tried this table and got a 4% speed improvement.
However, you can see that the array has 56 elements while only 6 are used. I preferred it this way because I think the code reads more clearly. But if you would rather have the table hold exactly the number of items it needs, we only need the following change (you can apply this suggestion as is if you want; I tested this exact code and it is also correctly formatted with spaces):

Suggested change

Current:

	func estimateFnArgsCount(program *Program) int {
		// Implementation note: a program will not necessarily go through all
		// operations, but this is just an estimation
		var count int
		for _, op := range program.Bytecode {
			if int(op) < len(opArgLenEstimation) {
				count += opArgLenEstimation[op]
			}
		}
		return count
	}

	var opArgLenEstimation = [...]int{
		OpCall1: 1,
		OpCall2: 2,
		OpCall3: 3,
		// we don't know exactly, but we know at least 4, so be conservative as
		// this is only an optimization and we also want to avoid excessive
		// preallocation
		OpCallN: 4,
		// here we don't know either, but we can guess it could be common to
		// receive up to 3 arguments in a function
		OpCallFast: 3,
		OpCallSafe: 3,
	}

Suggested:

	func estimateFnArgsCount(program *Program) int {
		// Implementation note: a program will not necessarily go through all
		// operations, but this is just an estimation
		var count int
		for _, op := range program.Bytecode {
			op -= OpCall1 // if it underflows it only becomes bigger, so it's ok
			if int(op) < len(opArgLenEstimation) {
				count += opArgLenEstimation[op]
			}
		}
		return count
	}

	var opArgLenEstimation = [...]int{
		OpCall1 - OpCall1: 1,
		OpCall2 - OpCall1: 2,
		OpCall3 - OpCall1: 3,
		// we don't know exactly, but we know at least 4, so be conservative as
		// this is only an optimization and we also want to avoid excessive
		// preallocation
		OpCallN - OpCall1: 4,
		// here we don't know either, but we can guess it could be common to
		// receive up to 3 arguments in a function
		OpCallFast - OpCall1: 3,
		OpCallSafe - OpCall1: 3,
	}

@diegommm diegommm force-pushed the vm-and-runtime-improvements branch from 1e5e3d8 to 2fb1f53 on September 13, 2025 21:53
@diegommm diegommm changed the title from "VM performance improvements" to "VM performance improvements in function calls" on Sep 13, 2025