[WIP] scx_rusty: add AMD IBS performance monitoring #1724

Draft: wants to merge 2 commits into base: main

etsal
Contributor

@etsal etsal commented Apr 22, 2025

scx_rusty currently uses execution time as the main metric by which it load balances tasks between domains. However, there are other forms of resource contention that we can avoid through load balancing, e.g., L3 footprint and memory bandwidth. We can track these metrics accurately and per process using hardware monitoring extensions like AMD IBS. Add initial support for reading data from these extensions into the scheduler.

STATUS:

  • The number of L1 hits is suspiciously high and the number of DRAM hits is suspiciously low.
  • There is currently a large number of valid samples that hold seemingly invalid physical and virtual addresses. These are exclusively L1 hits, but it remains to be seen whether they are actually valid or somehow erroneous and should be filtered out.

@etsal etsal requested a review from htejun April 22, 2025 19:48
Contributor Author

@etsal etsal left a comment

Example trace:

          <idle>-0       [003] dnZ3. 10777.212041: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [016] d.Z2. 10777.212044: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [024] d.Z3. 10777.212047: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [017] d.Z2. 10777.212048: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [027] d.Z2. 10777.212051: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [020] d.Z3. 10777.212053: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [003] dnZ2. 10777.212053: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [020] d.Z2. 10777.212059: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [104, 0]
          <idle>-0       [016] d.Z3. 10777.212061: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [017] d.Z2. 10777.212064: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [003] dnZ2. 10777.212066: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [006] d.Z2. 10777.212069: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [110, 0]
          <idle>-0       [002] d.Z3. 10777.212070: bpf_trace_printk: STORE      (0x1,0x1,0x0) [348, 0]
          <idle>-0       [020] d.Z2. 10777.212075: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [024] d.Z3. 10777.213024: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [390, 0]
          <idle>-0       [006] d.Z3. 10777.213026: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [fddb05e68, ffffadac0044ce68]
          <idle>-0       [017] d.Z3. 10777.213028: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [fde085e60, ffffadac00690e60]
          <idle>-0       [018] d.Z3. 10777.213030: bpf_trace_printk: STORE      (0x1,0x1,0x0) [874a3c710, ffffffffa6a3c710]
          <idle>-0       [027] d.Z2. 10777.213032: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [003] dNZ2. 10777.213035: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [017] d.Z2. 10777.213038: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [016] d.Z2. 10777.213044: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [027] d.Z2. 10777.213044: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [008] d.Z3. 10777.213045: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [020] d.Z3. 10777.213046: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [024] dnZ2. 10777.213047: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [017] d.Z2. 10777.213051: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [003] dNZ3. 10777.213052: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [027] d.Z2. 10777.213058: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [020] d.Z3. 10777.213061: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [016] d.Z2. 10777.213061: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [018] d.Z2. 10777.213061: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [104, 0]
          <idle>-0       [022] d.Z3. 10777.213062: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [020] d.Z2. 10777.213066: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [2, 0]
          <idle>-0       [017] d.Z2. 10777.213068: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [003] dNZ2. 10777.213068: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [016] d.Z3. 10777.214025: bpf_trace_printk: STORE      (0x2,0x1,0x0) [101f5dfec, ffff8a88c1f5dfec]
          <idle>-0       [020] d.Z3. 10777.214027: bpf_trace_printk: STORE      (0x1,0x1,0x0) [874a419e0, ffffffffa6a419e0]
          <idle>-0       [017] d.Z3. 10777.214030: bpf_trace_printk: STORE      (0x1,0x1,0x0) [fde08ffb8, fffffe24ebccefb8]
          <idle>-0       [008] d.Z2. 10777.214032: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [020] d.Z3. 10777.214033: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [2, 0]
          <idle>-0       [027] d.Z3. 10777.214038: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [016] d.Z3. 10777.214039: bpf_trace_printk: STORE      (0x1,0x1,0x0) [20c, 0]
          <idle>-0       [002] d.Z3. 10777.214041: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [f2, 0]
          <idle>-0       [008] d.Z3. 10777.214041: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [003] dNZ2. 10777.214042: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [020] d.Z2. 10777.214044: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [017] d.Z2. 10777.214045: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [027] d.Z2. 10777.214048: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [016] d.Z3. 10777.214050: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [022] d.Z2. 10777.214051: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [008] d.Z2. 10777.214053: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [003] dNZ2. 10777.214059: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [016] d.Z2. 10777.214059: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [008] d.Z2. 10777.214059: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [fddc0fec8, fffffe4a3f98bec8]
          <idle>-0       [027] d.Z2. 10777.214060: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [020] d.Z2. 10777.214060: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [022] d.Z2. 10777.214061: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [017] d.Z2. 10777.214061: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [016] d.Z2. 10777.214062: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [110, 0]
          <idle>-0       [008] d.Z3. 10777.215024: bpf_trace_printk: STORE      (0x1,0x1,0x0) [fddc05e68, ffffadac004b4e68]
          <idle>-0       [002] d.Z3. 10777.215027: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [fdd905e30, ffffadac0037ce30]
          <idle>-0       [018] d.Z3. 10777.215030: bpf_trace_printk: STORE      (0x1,0x1,0x0) [fde10ffc0, fffffe47649b7fc0]
          <idle>-0       [027] d.Z3. 10777.215032: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [024] d.Z3. 10777.215033: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [003] dnZ3. 10777.215034: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [017] d.Z2. 10777.215036: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [027] d.Z3. 10777.215039: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [020] d.Z2. 10777.215043: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [003] dnZ2. 10777.215047: bpf_trace_printk: LOAD       (0x1,0x1,0x0) [11a, 0]
          <idle>-0       [017] d.Z3. 10777.215052: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [020] d.Z2. 10777.215055: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [003] dnZ3. 10777.215057: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
          <idle>-0       [017] d.Z2. 10777.215060: bpf_trace_printk: STORE      (0x1,0x1,0x0) [c1, 0]
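
To get a rough count of the suspicious samples in a trace like the one above, something along these lines could be used offline. This is only a sketch: it assumes the bracketed pair is `[physical, virtual]` in hex, which the trace itself does not state, and `looks_suspicious` is a hypothetical heuristic (zero virtual address plus a physical address below one 4 KiB page), not a rule from the patch. The names `Sample`, `parse_line`, and `looks_suspicious` are made up for illustration.

```rust
/// One sample as printed in the trace: `[first, second]`, both hex.
/// Assumption: `first` is the physical address, `second` the virtual one.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Sample {
    pub is_store: bool,
    pub paddr: u64,
    pub vaddr: u64,
}

/// Parse one bpf_trace_printk line; returns None if the shape is unexpected.
pub fn parse_line(line: &str) -> Option<Sample> {
    // Everything after the "bpf_trace_printk:" tag is the payload.
    let payload = line.split("bpf_trace_printk:").nth(1)?.trim();
    let is_store = if payload.starts_with("STORE") {
        true
    } else if payload.starts_with("LOAD") {
        false
    } else {
        return None;
    };
    // The address pair is the last bracketed group in the payload.
    let open = payload.rfind('[')?;
    let close = payload.rfind(']')?;
    if close <= open {
        return None;
    }
    let mut parts = payload[open + 1..close].split(',');
    let paddr = u64::from_str_radix(parts.next()?.trim(), 16).ok()?;
    let vaddr = u64::from_str_radix(parts.next()?.trim(), 16).ok()?;
    Some(Sample { is_store, paddr, vaddr })
}

/// Hypothetical heuristic: zero virtual address and a physical address
/// that falls inside the first 4 KiB page.
pub fn looks_suspicious(s: &Sample) -> bool {
    const PAGE_SIZE: u64 = 4096;
    s.vaddr == 0 && s.paddr < PAGE_SIZE
}
```

Running every line of a captured trace through `parse_line` and counting how many results satisfy `looks_suspicious` would give a first estimate of how large the questionable population is before deciding whether to filter it in the scheduler.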

@etsal etsal requested a review from kkdwivedi April 22, 2025 19:49
@multics69
Contributor

@etsal -- The idea of leveraging the perf counter in making scheduling decisions is super cool! I wonder how much overhead it imposes.

@etsal
Contributor Author

etsal commented Apr 23, 2025

One possible issue wrt overhead is that right now we trigger on every single perf event, which is not really necessary: we need fine granularity for samples but do not really care about receiving them immediately. If that turns out to be a problem, we should either find a way to batch the delivery of these events or write one :)

@multics69
Contributor

I think you can change the sampling rate of AMD IBS (as with Intel PEBS). If so, you can add logic to autotune the sampling rate and keep the overhead under control (e.g., 1000 samples / second).
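
As a rough illustration of that autotuning idea, the sketch below rescales a sampling period so the observed rate moves toward a target. It assumes the sampler is driven by a period (events between samples), so a longer period yields fewer samples; `PeriodTuner`, its fields, and the clamp bounds are hypothetical names for this example and not part of the patch or of any perf API.

```rust
/// Hypothetical controller that rescales a sampling period to approach a
/// target sample rate. Assumes `min_period <= max_period`.
pub struct PeriodTuner {
    pub target_per_sec: u64,
    pub min_period: u64,
    pub max_period: u64,
    pub period: u64,
}

impl PeriodTuner {
    /// Feed in the number of samples seen over `elapsed_ms` milliseconds.
    /// Returns the period to use for the next interval.
    pub fn update(&mut self, samples: u64, elapsed_ms: u64) -> u64 {
        if elapsed_ms == 0 {
            return self.period;
        }
        let observed_per_sec = samples.saturating_mul(1000) / elapsed_ms;
        let target = self.target_per_sec.max(1);
        let next = if observed_per_sec == 0 {
            // No samples at all: halve the period to sample more often.
            self.period / 2
        } else {
            // Period scales with observed/target; u128 avoids overflow.
            let scaled = self.period as u128 * observed_per_sec as u128 / target as u128;
            u64::try_from(scaled).unwrap_or(u64::MAX)
        };
        self.period = next.clamp(self.min_period, self.max_period);
        self.period
    }
}
```

With a target of 1000 samples per second and a starting period of 100000, observing 5000 samples in one second would raise the period to 500000. A real controller would probably want smoothing so the period does not oscillate between intervals.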

@etsal
Contributor Author

etsal commented Apr 23, 2025

You can, but ideally we would still get a large number of samples delivered in batches instead of one at a time, so that the BPF callback runs less often. I'm not sure that is currently possible.

@multics69
Contributor

Sure, batching will be essential to keep IBS sample processing off the critical path.
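
The batching idea can be sketched in plain Rust as a buffer with a cheap per-sample path and an occasional bulk hand-off. This is only a userspace illustration of the pattern; it does not show how batched delivery of IBS samples to a BPF program would work, which the thread leaves open. `SampleBatcher` and its methods are hypothetical names.

```rust
/// Hypothetical accumulator: samples are stored cheaply one at a time and
/// handed to the consumer only as full batches.
pub struct SampleBatcher<T> {
    buf: Vec<T>,
    capacity: usize,
}

impl<T> SampleBatcher<T> {
    /// `capacity` is the batch size; values below 1 are raised to 1.
    pub fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1);
        Self { buf: Vec::with_capacity(capacity), capacity }
    }

    /// Per-sample path: store the sample and return a full batch only
    /// once `capacity` samples have accumulated.
    pub fn push(&mut self, sample: T) -> Option<Vec<T>> {
        self.buf.push(sample);
        if self.buf.len() >= self.capacity {
            let fresh = Vec::with_capacity(self.capacity);
            Some(std::mem::replace(&mut self.buf, fresh))
        } else {
            None
        }
    }

    /// Drain whatever is pending, e.g. on a periodic tick, so a partial
    /// batch does not wait indefinitely.
    pub fn flush(&mut self) -> Vec<T> {
        std::mem::take(&mut self.buf)
    }
}
```

The expensive processing then runs once per batch rather than once per sample, and `flush` bounds how stale a partial batch can get.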
