Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hardirqs: Optimize output information and increase CPU display #5228

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kknjh
Copy link
Contributor

@kknjh kknjh commented Mar 10, 2025

Combine the previous count and time into one,
and increase CPU independent statistics and maximum interrupt time

Before:
    ./hardirqs
    Tracing hard irq event time... Hit Ctrl-C to end.
    ^C
    HARDIRQ                    TOTAL_usecs
    enp0s3                             254
    snd_intel8x0                       368
    
    ./hardirqs -C
    Tracing hard irq events... Hit Ctrl-C to end.
    ^C
    HARDIRQ                    TOTAL_count
    enp0s3                              14
    ata_piix                             2
    
After:
    ./hardirqs
    Tracing hard irq event time... Hit Ctrl-C to end.
    ^C
    HARDIRQ                    TOTAL_count TOTAL_usecs MAX_usecs
    ata_piix                             2          32        21
    enp0s3                              19         318        42
    
    ./hardirqs -C
    Tracing hard irq event time... Hit Ctrl-C to end.
    ^C
    HARDIRQ                    TOTAL_count TOTAL_usecs MAX_usecs CPU
    enp0s3                          111375     1420732       303   2
    enp0s3                            1546       21826        39   9
    ata_piix                             8         162        33   8

@@ -42,7 +42,7 @@ const char *argp_program_bug_address =
const char argp_program_doc[] =
"Summarize hard irq event time as histograms.\n"
"\n"
"USAGE: hardirqs [--help] [-T] [-N] [-d] [interval] [count] [-c CG]\n"
"USAGE: hardirqs [--help] [-T] [-N] [-d] [interval] [-C] [-c CG]\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not replace '[count]' with '[-C]'.
I know you want to add a CPU column for the output. What is your use case? How does this help you for production investigation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't replace count with CPU, I just changed count to default display. Count, total time, and max time will all be displayed. Increasing CPU can observe whether the interrupt distribution is balanced

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kknjh

This count is not the same as that count. It functions like the 'outputs' positional argument in the file below. :)

bcc/tools/hardirqs_example.txt

652 # ./hardirqs -h
653 usage: hardirqs [-h] [-T] [-N] [-C] [-d] [interval] [outputs]
654 
655 Summarize hard irq event time as histograms
656 
657 positional arguments:
658   interval           output interval, in seconds
659   outputs            number of outputs   // <-------

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I misunderstood. I have already made the modifications.
Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kknjh

What is your use case? How does this help you for production investigation?

need your answer for this question, I agree with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, whether the CPU handling network card interrupts is on the same NUMA node to avoid accessing memory across NUMA nodes. Or observe whether the interruption distribution is balanced

Combine the previous count and time into one,
and increase CPU independent statistics and maximum interrupt time

Before:
./hardirqs
Tracing hard irq event time... Hit Ctrl-C to end.
^C
HARDIRQ                    TOTAL_usecs
enp0s3                             254
snd_intel8x0                       368

./hardirqs -C
Tracing hard irq events... Hit Ctrl-C to end.
^C
HARDIRQ                    TOTAL_count
enp0s3                              14
ata_piix                             2

After:
./hardirqs
Tracing hard irq event time... Hit Ctrl-C to end.
^C
HARDIRQ                   TOTAL_count TOTAL_usecs MAX_usecs
ata_piix                            2          32        21
enp0s3                             19         318        42

./hardirqs -C
Tracing hard irq event time... Hit Ctrl-C to end.
^C
HARDIRQ                   TOTAL_count TOTAL_usecs MAX_usecs CPU
enp0s3                         111375     1420732       303   2
enp0s3                           1546       21826        39   9
ata_piix                            8         162        33   8

Signed-off-by: Feng Yang <[email protected]>
Copy link
Collaborator

@yonghong-song yonghong-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried your patch. The following is what I see:

$ ./hardirqs -C
Tracing hard irq event time... Hit Ctrl-C to end.
^C
HARDIRQ                    TOTAL_count TOTAL_usecs MAX_usecs CPU
mlx5_comp9@pci:0000:5e:00.0           4           3         1   7
mlx5_comp17@pci:0000:5e:00.0           3           7         3  14
mlx5_comp12@pci:0000:5e:00.0          13          17         4  24
mlx5_comp15@pci:0000:5e:00.0          17          19         3   5
mlx5_comp19@pci:0000:5e:00.0           4           3         2   4
mlx5_comp3@pci:0000:5e:00.0          16          11         2   9
mlx5_comp7@pci:0000:5e:00.0           6          10         3   8
ahci[0000:00:11.5]                  21         111         8  14
mlx5_comp8@pci:0000:5e:00.0          21          20         2  26
mlx5_comp0@pci:0000:5e:00.0          18          19         3  21
mlx5_comp18@pci:0000:5e:00.0          18          24         2  30
mlx5_comp1@pci:0000:5e:00.0           9          11         3  11
mlx5_async0@pci:0000:5e:00.0          18          65         7  17
mlx5_comp10@pci:0000:5e:00.0          19          20         3  34
mlx5_comp2@pci:0000:5e:00.0           8          11         3  38
mlx5_comp11@pci:0000:5e:00.0           5           9         3  16
mlx5_comp14@pci:0000:5e:00.0          14          18         3  32
mlx5_comp5@pci:0000:5e:00.0           4           4         2  18

In the above, "mlx5_comp*" represents different irq's on different cpus. In your case, probably, only device name appears. So if this is the case, your change does make sense.

Also, look at the above, please make sure proper alignment for the output.

Also, in the commit message, please clearly describe the API (command line) option change.

@@ -48,31 +48,17 @@ static __always_inline bool is_target_cpu() {

static int handle_entry(int irq, struct irqaction *action)
{
u64 ts = bpf_ktime_get_ns();
u32 key = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the above two lines of codes after below 'is_target_cpu()'?

@@ -42,7 +42,7 @@ const char *argp_program_bug_address =
const char argp_program_doc[] =
"Summarize hard irq event time as histograms.\n"
"\n"
"USAGE: hardirqs [--help] [-T] [-N] [-d] [interval] [count] [-c CG]\n"
"USAGE: hardirqs [--help] [-T] [-N] [-d] [-C] [interval] [count] [-c CG]\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the 'count' is always on, then let us remove '[count]'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants