btf: read all line info records at once in parseLineInfoRecords #1657

Open

paulcacheux wants to merge 2 commits into main from paulcacheux/parse-line-info-records-slice
Conversation

paulcacheux
Contributor

paulcacheux commented Jan 24, 2025

When parsing line info records, we currently iterate and call binary.Read once per record. This is quite inefficient: binary.Read already supports, out of the box, being called directly on a slice and reading multiple structures at once. That is exactly what this PR does.
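
Roughly, the change boils down to the pattern below. This is a minimal sketch using an illustrative record struct and function name, not the actual parseLineInfoRecords signature or types from the btf package:

```go
package btfsketch

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// lineInfoRecord stands in for the fixed-size wire record; the field names
// here are illustrative, not the btf package's real definitions.
type lineInfoRecord struct {
	InsnOff     uint32
	FileNameOff uint32
	LineOff     uint32
	LineCol     uint32
}

func readRecords(r *bytes.Reader, bo binary.ByteOrder, count uint32) ([]lineInfoRecord, error) {
	records := make([]lineInfoRecord, count)

	// Before: one binary.Read call (and its reflection overhead) per record:
	//   for i := range records {
	//       if err := binary.Read(r, bo, &records[i]); err != nil { ... }
	//   }

	// After: a single binary.Read over the whole slice.
	if err := binary.Read(r, bo, records); err != nil {
		return nil, fmt.Errorf("reading line info records: %w", err)
	}
	return records, nil
}
```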

Benchmark:

                        │   old.txt   │               new.txt               │
                        │   sec/op    │   sec/op     vs base                │
ParseLineInfoRecords-10   288.6µ ± 3%   141.3µ ± 4%  -51.02% (p=0.000 n=10)

                        │   old.txt    │            new.txt             │
                        │     B/op     │     B/op      vs base          │
ParseLineInfoRecords-10   128.0Ki ± 0%   128.0Ki ± 0%  ~ (p=1.000 n=10)

                        │    old.txt    │              new.txt               │
                        │   allocs/op   │ allocs/op   vs base                │
ParseLineInfoRecords-10   4098.000 ± 0%   3.000 ± 0%  -99.93% (p=0.000 n=10)

paulcacheux force-pushed the paulcacheux/parse-line-info-records-slice branch from 9f82efc to fb809f7 on January 24, 2025 16:45
paulcacheux changed the title from "read all line info records at once in parseLineInfoRecords" to "btf: read all line info records at once in parseLineInfoRecords" on Jan 24, 2025
@ti-mo
Collaborator

ti-mo commented Jan 27, 2025

Cool! Drive-by: whenever you optimize old code that doesn't change frequently, make sure to bolt it down with testing.AllocsPerRun. 😉
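
For reference, such a guard could look roughly like the sketch below. It builds on the illustrative readRecords sketch above; raw, numRecords and maxAllocs are hypothetical fixtures and thresholds, not identifiers from this repository:

```go
func TestLineInfoRecordAllocations(t *testing.T) {
	// maxAllocs is whatever upper bound the project settles on; raw and
	// numRecords are hypothetical fixtures (an encoded record buffer and its
	// record count).
	const maxAllocs = 7

	// testing.AllocsPerRun reports the average number of heap allocations
	// per invocation of the callback.
	allocs := testing.AllocsPerRun(10, func() {
		if _, err := readRecords(bytes.NewReader(raw), binary.LittleEndian, numRecords); err != nil {
			t.Fatal(err)
		}
	})
	if allocs > maxAllocs {
		t.Errorf("expected at most %d allocations per run, got %.0f", maxAllocs, allocs)
	}
}
```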

paulcacheux force-pushed the paulcacheux/parse-line-info-records-slice branch from fb809f7 to d5f37dd on January 27, 2025 14:14
paulcacheux marked this pull request as ready for review on January 27, 2025 14:14
@paulcacheux
Contributor Author

I added a small test for the allocations. I'm a bit sad about it because it checks against 3:

  • as opposed to all other allocation tests in the repo, which check against 0
  • it feels a bit fragile, since it's susceptible to changes in the stdlib that could improve or degrade the number of allocations done by binary.Read

Otherwise I could check against something like < 10 instead, WDYT?

paulcacheux force-pushed the paulcacheux/parse-line-info-records-slice branch from d5f37dd to 4b76be2 on January 27, 2025 14:44
@dylandreimerink
Member

dylandreimerink commented Jan 27, 2025

it feels a bit fragile, since it's susceptible to changes in the stdlib that would improve or degrade the amount of allocations of binary.Read

Yea, I would have to agree with that, it seems fragile. I imagine it would be frustrating if we have to start updating magic values in our tests when we bump Go / stdlib versions. I also don't know if a < 10 check hits the mark: if we went to 11, would that be cause for us to reconsider something?

On the other hand, I understand the desire to curb allocations and memory use caused by the lib. I'm not sure this is the best way to do so, however.

Case in point, the failing test: on the latest Go version it was 7 allocs: https://github.com/cilium/ebpf/actions/runs/12991527462/job/36229383952?pr=1657

@ti-mo
Collaborator

ti-mo commented Jan 29, 2025

Discussed this offline with Dylan. I agree on all arguments raised, but:

  • we have a few of these tests in the code base, and so far they've been quiet
  • let's assume that the number of allocs in stdlib functions shrinks over time rather than growing, given the amount of scrutiny on them
  • these alloc measurements are enforced over small blocks of deterministic code without side effects
  • we can always remove them if they cause too much noise (that would be on me 🙂)

I think pinning it down to 7 allocs is the best we can do in the absence of a continuous benchmarking setup or a solution to track allocs of certain spans of code over time. Those alternatives are (much) more expensive to run and not maintenance-free by any means.

Also, since this is a wire format defined in a (sort of) spec that doesn't change on a whim, my tendency would be to write a custom marshaler for these things, but I've had trouble convincing Lorenz of this in the past. If we really need this to not allocate, it's still the way to go IMO.
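
For illustration, such a custom decoder could look something like the sketch below: it slices the raw buffer and decodes each field through binary.ByteOrder directly, bypassing encoding/binary's reflection path. It reuses the illustrative lineInfoRecord struct from the earlier sketch, not the btf package's real types:

```go
func unmarshalLineInfoRecords(buf []byte, bo binary.ByteOrder) ([]lineInfoRecord, error) {
	const recordSize = 16 // four uint32 fields in the illustrative struct
	if len(buf)%recordSize != 0 {
		return nil, fmt.Errorf("buffer length %d is not a multiple of record size %d", len(buf), recordSize)
	}

	records := make([]lineInfoRecord, len(buf)/recordSize)
	for i := range records {
		rec := buf[i*recordSize:]
		records[i] = lineInfoRecord{
			InsnOff:     bo.Uint32(rec[0:4]),
			FileNameOff: bo.Uint32(rec[4:8]),
			LineOff:     bo.Uint32(rec[8:12]),
			LineCol:     bo.Uint32(rec[12:16]),
		}
	}
	return records, nil
}
```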

@paulcacheux
Contributor Author

paulcacheux commented Jan 29, 2025

Ok, I can change the test to <= 7 for this PR, and work on a follow-up PR with a real unmarshaller.

`binary.Read` is able to read its result directly into a slice, and this
is far more efficient from a CPU and memory allocation standpoint

Signed-off-by: Paul Cacheux <[email protected]>
paulcacheux force-pushed the paulcacheux/parse-line-info-records-slice branch from 4b76be2 to d01fe7c on January 29, 2025 15:35