Skip to content

[LTS 9.4] dmaengine: idxd: Fix possible Use-After-Free in irq_process_work_list #422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 28, 2025

Conversation

jainanmol84
Copy link

  • Commit Message Requirements
  • Built against Vault/LTS Environment
  • kABI Check Passed, where Valid (Pre 9.4 RT does not have kABI stability)
  • Boot Test
  • Kernel SelfTest results
  • Additional Tests as determined relevant

Commit message

jira VULN-8255
cve CVE-2024-40956
commit-author Li RongQing <[email protected]>
commit e3215deca4520773cd2b155bed164c12365149a7

Use list_for_each_entry_safe() to allow iterating through the list and deleting the entry in the iteration process. The descriptor is freed via idxd_desc_complete() and there's a slight chance may cause issue for the list iterator when the descriptor is reused by another thread without it being deleted from the list.

Fixes: 16e19e11228b ("dmaengine: idxd: Fix list corruption in description completion")
	Signed-off-by: Li RongQing <[email protected]>
	Reviewed-by: Dave Jiang <[email protected]>
	Reviewed-by: Fenghua Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Vinod Koul <[email protected]>
(cherry picked from commit e3215deca4520773cd2b155bed164c12365149a7)
	Signed-off-by: Anmol Jain <[email protected]>

Kernel build logs

/home/anmol/kernel-src-tree
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-_ajain__ciqlts9_4-b8631ac27"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/errno.h
  WRAP    arch/x86/include/generated/uapi/asm/fcntl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctls.h
[--snip--]
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/virtio/virtio_snd.ko
  INSTALL /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/virtio/virtio_snd.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/xen/snd_xen_front.ko
  INSTALL /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/virt/lib/irqbypass.ko
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/xen/snd_xen_front.ko
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/virt/lib/irqbypass.ko
  DEPMOD  /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+
[TIMER]{MODULES}: 17s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-_ajain__ciqlts9_4-b8631ac27+ \
	arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 38s
Checking kABI
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-_ajain__ciqlts9_4-b8631ac27+ and Index to 5
The default is /boot/loader/entries/a2348f88b66c403aaebf889752103fac-5.14.0-_ajain__ciqlts9_4-b8631ac27+.conf with index 5 and kernel /boot/vmlinuz-5.14.0-_ajain__ciqlts9_4-b8631ac27+
The default is /boot/loader/entries/a2348f88b66c403aaebf889752103fac-5.14.0-_ajain__ciqlts9_4-b8631ac27+.conf with index 5 and kernel /boot/vmlinuz-5.14.0-_ajain__ciqlts9_4-b8631ac27+
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 3575s
[TIMER]{MODULES}: 17s
[TIMER]{INSTALL}: 38s
[TIMER]{TOTAL} 3634s
Rebooting in 10 seconds

kernel-build.log

Kselftests

$ grep '^ok ' kselftest-before.log | wc -l && grep '^ok ' kselftest-after.log | wc -l
364
363
$  grep '^not ok ' kselftest-before.log | wc -l && grep '^not ok ' kselftest-after.log | wc -l
74
75

kselftest-after.log
kselftest-before.log

jira VULN-8255
cve CVE-2024-40956
commit-author Li RongQing <[email protected]>
commit e3215de

Use list_for_each_entry_safe() to allow iterating through the list and
deleting the entry in the iteration process. The descriptor is freed via
idxd_desc_complete() and there's a slight chance may cause issue for
the list iterator when the descriptor is reused by another thread
without it being deleted from the list.

Fixes: 16e19e1 ("dmaengine: idxd: Fix list corruption in description completion")
	Signed-off-by: Li RongQing <[email protected]>
	Reviewed-by: Dave Jiang <[email protected]>
	Reviewed-by: Fenghua Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Vinod Koul <[email protected]>
(cherry picked from commit e3215de)
	Signed-off-by: Anmol Jain <[email protected]>
@thefossguy-ciq
Copy link

This PR is causing the tls kselftest under the net subsystem to fail.

$ git diff <(grep '^ok ' kselftest-before.log) <(grep '^ok ' kselftest-after.log) | grep '^-ok'
-ok 6 selftests: net: tls

@PlaidCat
Copy link
Collaborator

This PR is causing the tls kselftest under the net subsystem to fail.

$ git diff <(grep '^ok ' kselftest-before.log) <(grep '^ok ' kselftest-after.log) | grep '^-ok'
-ok 6 selftests: net: tls

If you diff the net: tls part the majority of SKIPS and fail come from

Failure setting TCP_ULP, testing without tls
Screenshot 2025-07-17 at 1 16 19 PM Screenshot 2025-07-17 at 1 17 33 PM

Were these on a different VM? Maybe just the right TLS stuff isn't installed, I can't see why this change would impact net: tls like this

@shreeya-patel98
Copy link

shreeya-patel98 commented Jul 21, 2025

@PlaidCat @thefossguy-ciq @jainanmol84 I think we need a script which installs all the necessary packages needed for kselftest. Many kselftests were failing for me initially because of not having some packages needed for that test. I then tried to install atleast the basic ones. But we all will have different results with different packages. So I will try to work on something where we have a standard installation script which everyone can use after setting up their VM.

@jainanmol84
Copy link
Author

jainanmol84 commented Jul 21, 2025

Were these on a different VM? Maybe just the right TLS stuff isn't installed, I can't see why this change would impact net: tls like this

This is the same VM—nothing changed in the environment
Screenshot 2025-07-21 at 6 36 15 PM

@thefossguy-ciq
Copy link

Does running the kselftests again help?

@PlaidCat
Copy link
Collaborator

I think this is one of those fun issues of kselftests being fickle

@jainanmol84
Copy link
Author

jainanmol84 commented Jul 21, 2025

Does running the kselftests again help?

Does running the kselftests again help?

I re-ran the tests and got 2 additional failures.

[anmol@test-vim-enhanced-nomodule-lts94-testing ~]$ grep '^ok ' kselftest-before.log | wc -l && grep '^ok ' kselftest-after.log | wc -l
364
361
[anmol@test-vim-enhanced-nomodule-lts94-testing ~]$ grep '^not ok ' kselftest-before.log | wc -l && grep '^not ok ' kselftest-after.log | wc -l
74
75
[anmol@test-vim-enhanced-nomodule-lts94-testing ~]$ git diff <(grep '^ok ' kselftest-before.log) <(grep '^ok ' kselftest-after.log) | grep '^-ok'
-ok 1 selftests: mqueue: mq_open_tests # SKIP
-ok 2 selftests: mqueue: mq_perf_tests # SKIP
-ok 6 selftests: net: tls

@thefossguy-ciq
Copy link

This makes no sense, especially numbers wise. 3 less tests are passing but only 1 more test is failing. Math ain't mathing.

@thefossguy-ciq
Copy link

@jainanmol84 can you run kselftest on the "old" (on the kernel without your patch) twice and see if that has any variance?

@PlaidCat
Copy link
Collaborator

If there are no obvious failures in the dmesg I'm willing to chalk this up to kselftest issues.

@thefossguy-ciq
Copy link

I tried this:

  1. Compile the kernel at commit 28a0306e2178.
  2. Boot into the kernel and run kselftests one after another.
  3. Reboot and then repeat step 2 once more.

That way, I got a total of 4 log files for a separate kselftest execution run.
kselftest-before-28a0306e2178-001.log: first run, first boot
kselftest-before-28a0306e2178-002.log: second run, first boot
kselftest-before-28a0306e2178-003.log: third run, second boot
kselftest-before-28a0306e2178-004.log: fourth run, second boot

And diffing them, I see a different kselftest failing, on the same kernel without rebooting.

# these are from the same boot, shouldn't differ, yet they do
$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-002.log | sort | uniq) | grep '^-ok'
-ok 11 selftests: proc: proc-uptime-001
# same as before, 003 and 004 are on the same boot, shouldn't differ, yet they do
$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-003.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-004.log | sort | uniq) | grep '^-ok'
-ok 29 selftests: kvm: vmx_preemption_timer_test
# actually, all 4 differ :)
$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-002.log | sort | uniq) | grep '^-ok'
-ok 11 selftests: proc: proc-uptime-001

$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-003.log | sort | uniq) | grep '^-ok'
-ok 6 selftests: net: tls

$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-004.log | sort | uniq) | grep '^-ok'
-ok 29 selftests: kvm: vmx_preemption_timer_test

Copy link

@thefossguy-ciq thefossguy-ciq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dragging the kselftest discussion. It was found that it is irrelevant to this particular PR. Hence, approving. 🚤

@thefossguy-ciq
Copy link

Forgot to specify this earlier: The tests from different (and unrelated) subsystems are failing. Indicating a deeper issue, which is irrelevant to this PR.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@PlaidCat
Copy link
Collaborator

I've maybe not re-enforced this enough.
Not all the kselftests at the state they're in always behave the same and an issue in one unmaintained or poorly maintained kselftest from Upstream / RHEL can cause non-derministic problems later on in the run.
So i expect some fuzz in the kernel selftests as they stand.

Ideally I'd like a fresh vm to run each ksleftest individually on PR and we just run the kselftest for the subsystem we're operating in.

basically

  1. Kernel build
  2. kernel installed
  3. VM shutdown
  4. qcow2 copied to new cqow2 for each kselftest
  5. for each kselftest subsystem start a new fresh vm with corresponding qcow2 and run just that kselftest subsystem
    5a. repeast multiple times with new qcow2s from above
  6. agrigate results

Right now that is a little beyond our local systems

@pvts-mat
Copy link
Contributor

pvts-mat commented Jul 24, 2025

Right now that is a little beyond our local systems

@PlaidCat Actually this is right within reach of rocky-patching. It allows for running arbitrary selftests subset, with automatic control of VMs (starting / stopping / etc.). No qcows cloning for this, but can be easily added - it's implemented already to create test machines from the build machines.

But let me tell you right away - doing this for each test will blow out the testing time from around 2-3 hours to around 2-3 days.

@pvts-mat
Copy link
Contributor

No qcows cloning for this, but can be easily added - it's implemented already to create test machines from the build machines.

Well, scratch that. Nothing needs to be implemented. Here's the script which does this kind of testing for some 3 random tests drivers/dma-buf:udmabuf, net:icmp_redirect.sh, proc:setns-sysvipc:

CVE=CVE-2024-40956 ./ninja.sh -d explain _prepared__x86_64--build--ciqlts9_4-CVE-2024-40956
for t in drivers/dma-buf:udmabuf net:icmp_redirect.sh proc:setns-sysvipc; do
    TESTSUITE=".name | test(\"${t}\")" TESTS_PWRCYCLES_COUNT=1 CVE=CVE-2024-40956 ./ninja.sh _kselftests_run__x86_64--test--ciqlts9_4-CVE-2024-40956
    CVE=CVE-2024-40956 ./ninja.sh _invalidate__x86_64--test--ciqlts9_4-CVE-2024-40956
done

The results are in separate files tagged with timestamp, like tests/ciqlts9_4/x86_64/d2b099a5db0f7e045647aa46769f30c0fd02a549/session--2025-07-23--03-45-24.log, they can be cat'ed and fed to ktests.xsh diff for aggregation

@PlaidCat
Copy link
Collaborator

But let me tell you right away - doing this for each test will blow out the testing time from around 2-3 hours to around 2-3 days.

Yeah this is why I want multiple machines and its "Grand Design" (or my cruddy bullet points) knows that its more than just on the hosst

@pvts-mat
Copy link
Contributor

Sorry for coming to the party uninvited, but it just breaks my heart to see other people fighting the unnecessary fights. We could save so much energy not re-inventing the wheel and channel it to actually push the cart. @shreeya-patel98 shares this sentiment too, I see

@PlaidCat @thefossguy-ciq @jainanmol84 I think we need a script which installs all the necessary packages needed for kselftest. Many kselftests were failing for me initially because of not having some packages needed for that test. I then tried to install atleast the basic ones. But we all will have different results with different packages. So I will try to work on something where we have a standard installation script which everyone can use after setting up their VM.

This is exactly what was driving the rocky-patching project at https://gitlab.conclusive.pl/devices/rocky-patching. It's kinda big and odd, with some tools most would find exotic so it's not for everyone, but you can look @shreeya-patel98 at yaml files at https://gitlab.conclusive.pl/devices/rocky-patching/-/tree/master/jinja/cloud-init?ref_type=heads - they serve as basically the very scripts you want to write (launched automatically at machine's spin-up), so you can have some starting point. I would love to see what you ended up adding to them, as I still have some test suites not compiling properly.

Here's a file which categorizes the selftests according to how they behave (flappy / stable / broken / etc.) so that the problematic ones can be omitted and the tests can be ran smoothly: https://gitlab.conclusive.pl/devices/rocky-patching/-/blob/master/rocky.yml?ref_type=heads. Again, it's part of the framework, but may as well be used "manually" for whatever one needs. For example, I skip all the tests marked as "bad", because running them is meaningless and they often screw up the machine in some way.

@jainanmol84 jainanmol84 merged commit a5b8c8f into ciqlts9_4 Jul 28, 2025
4 checks passed
@jainanmol84 jainanmol84 deleted the {ajain}_ciqlts9_4 branch July 28, 2025 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants