[LTS 9.4] dmaengine: idxd: Fix possible Use-After-Free in irq_process_work_list #422

jainanmol84 · 2025-07-17T08:18:19Z

Commit Message Requirements
Built against Vault/LTS Environment
kABI Check Passed, where Valid (Pre 9.4 RT does not have kABI stability)
Boot Test
Kernel SelfTest results
Additional Tests as determined relevant

Commit message

jira VULN-8255
cve CVE-2024-40956
commit-author Li RongQing <[email protected]>
commit e3215deca4520773cd2b155bed164c12365149a7

Use list_for_each_entry_safe() to allow iterating through the list and deleting the entry in the iteration process. The descriptor is freed via idxd_desc_complete() and there's a slight chance may cause issue for the list iterator when the descriptor is reused by another thread without it being deleted from the list.

Fixes: 16e19e11228b ("dmaengine: idxd: Fix list corruption in description completion")
	Signed-off-by: Li RongQing <[email protected]>
	Reviewed-by: Dave Jiang <[email protected]>
	Reviewed-by: Fenghua Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Vinod Koul <[email protected]>
(cherry picked from commit e3215deca4520773cd2b155bed164c12365149a7)
	Signed-off-by: Anmol Jain <[email protected]>

Kernel build logs

/home/anmol/kernel-src-tree
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-_ajain__ciqlts9_4-b8631ac27"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/errno.h
  WRAP    arch/x86/include/generated/uapi/asm/fcntl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctls.h
[--snip--]
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/virtio/virtio_snd.ko
  INSTALL /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/virtio/virtio_snd.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/xen/snd_xen_front.ko
  INSTALL /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/virt/lib/irqbypass.ko
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/xen/snd_xen_front.ko
  STRIP   /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+/kernel/virt/lib/irqbypass.ko
  DEPMOD  /lib/modules/5.14.0-_ajain__ciqlts9_4-b8631ac27+
[TIMER]{MODULES}: 17s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-_ajain__ciqlts9_4-b8631ac27+ \
	arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 38s
Checking kABI
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-_ajain__ciqlts9_4-b8631ac27+ and Index to 5
The default is /boot/loader/entries/a2348f88b66c403aaebf889752103fac-5.14.0-_ajain__ciqlts9_4-b8631ac27+.conf with index 5 and kernel /boot/vmlinuz-5.14.0-_ajain__ciqlts9_4-b8631ac27+
The default is /boot/loader/entries/a2348f88b66c403aaebf889752103fac-5.14.0-_ajain__ciqlts9_4-b8631ac27+.conf with index 5 and kernel /boot/vmlinuz-5.14.0-_ajain__ciqlts9_4-b8631ac27+
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 3575s
[TIMER]{MODULES}: 17s
[TIMER]{INSTALL}: 38s
[TIMER]{TOTAL} 3634s
Rebooting in 10 seconds

kernel-build.log

Kselftests

$ grep '^ok ' kselftest-before.log | wc -l && grep '^ok ' kselftest-after.log | wc -l
364
363
$  grep '^not ok ' kselftest-before.log | wc -l && grep '^not ok ' kselftest-after.log | wc -l
74
75

kselftest-after.log
kselftest-before.log

jira VULN-8255 cve CVE-2024-40956 commit-author Li RongQing <[email protected]> commit e3215de Use list_for_each_entry_safe() to allow iterating through the list and deleting the entry in the iteration process. The descriptor is freed via idxd_desc_complete() and there's a slight chance may cause issue for the list iterator when the descriptor is reused by another thread without it being deleted from the list. Fixes: 16e19e1 ("dmaengine: idxd: Fix list corruption in description completion") Signed-off-by: Li RongQing <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]> (cherry picked from commit e3215de) Signed-off-by: Anmol Jain <[email protected]>

thefossguy-ciq · 2025-07-17T08:28:18Z

This PR is causing the tls kselftest under the net subsystem to fail.

$ git diff <(grep '^ok ' kselftest-before.log) <(grep '^ok ' kselftest-after.log) | grep '^-ok'
-ok 6 selftests: net: tls

PlaidCat · 2025-07-17T17:18:25Z

This PR is causing the tls kselftest under the net subsystem to fail.
$ git diff <(grep '^ok ' kselftest-before.log) <(grep '^ok ' kselftest-after.log) | grep '^-ok'
-ok 6 selftests: net: tls

If you diff the net: tls part the majority of SKIPS and fail come from

Failure setting TCP_ULP, testing without tls

Were these on a different VM? Maybe just the right TLS stuff isn't installed, I can't see why this change would impact net: tls like this

shreeya-patel98 · 2025-07-21T09:57:52Z

@PlaidCat @thefossguy-ciq @jainanmol84 I think we need a script which installs all the necessary packages needed for kselftest. Many kselftests were failing for me initially because of not having some packages needed for that test. I then tried to install atleast the basic ones. But we all will have different results with different packages. So I will try to work on something where we have a standard installation script which everyone can use after setting up their VM.

jainanmol84 · 2025-07-21T13:05:24Z

Were these on a different VM? Maybe just the right TLS stuff isn't installed, I can't see why this change would impact net: tls like this

This is the same VM—nothing changed in the environment

thefossguy-ciq · 2025-07-21T13:09:22Z

Does running the kselftests again help?

PlaidCat · 2025-07-21T13:11:20Z

I think this is one of those fun issues of kselftests being fickle

jainanmol84 · 2025-07-21T17:26:30Z

Does running the kselftests again help?

I re-ran the tests and got 2 additional failures.

[anmol@test-vim-enhanced-nomodule-lts94-testing ~]$ grep '^ok ' kselftest-before.log | wc -l && grep '^ok ' kselftest-after.log | wc -l
364
361
[anmol@test-vim-enhanced-nomodule-lts94-testing ~]$ grep '^not ok ' kselftest-before.log | wc -l && grep '^not ok ' kselftest-after.log | wc -l
74
75
[anmol@test-vim-enhanced-nomodule-lts94-testing ~]$ git diff <(grep '^ok ' kselftest-before.log) <(grep '^ok ' kselftest-after.log) | grep '^-ok'
-ok 1 selftests: mqueue: mq_open_tests # SKIP
-ok 2 selftests: mqueue: mq_perf_tests # SKIP
-ok 6 selftests: net: tls

thefossguy-ciq · 2025-07-22T05:19:17Z

This makes no sense, especially numbers wise. 3 less tests are passing but only 1 more test is failing. Math ain't mathing.

thefossguy-ciq · 2025-07-22T05:19:58Z

@jainanmol84 can you run kselftest on the "old" (on the kernel without your patch) twice and see if that has any variance?

PlaidCat · 2025-07-23T16:47:57Z

If there are no obvious failures in the dmesg I'm willing to chalk this up to kselftest issues.

thefossguy-ciq · 2025-07-24T17:27:19Z

I tried this:

Compile the kernel at commit 28a0306e2178.
Boot into the kernel and run kselftests one after another.
Reboot and then repeat step 2 once more.

That way, I got a total of 4 log files for a separate kselftest execution run.
kselftest-before-28a0306e2178-001.log: first run, first boot
kselftest-before-28a0306e2178-002.log: second run, first boot
kselftest-before-28a0306e2178-003.log: third run, second boot
kselftest-before-28a0306e2178-004.log: fourth run, second boot

And diffing them, I see a different kselftest failing, on the same kernel without rebooting.

# these are from the same boot, shouldn't differ, yet they do
$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-002.log | sort | uniq) | grep '^-ok'
-ok 11 selftests: proc: proc-uptime-001

# same as before, 003 and 004 are on the same boot, shouldn't differ, yet they do
$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-003.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-004.log | sort | uniq) | grep '^-ok'
-ok 29 selftests: kvm: vmx_preemption_timer_test

# actually, all 4 differ :)
$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-002.log | sort | uniq) | grep '^-ok'
-ok 11 selftests: proc: proc-uptime-001

$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-003.log | sort | uniq) | grep '^-ok'
-ok 6 selftests: net: tls

$ git diff <(grep '^ok ' kselftest-before-28a0306e2178-001.log | sort | uniq) <(grep '^ok ' kselftest-before-28a0306e2178-004.log | sort | uniq) | grep '^-ok'
-ok 29 selftests: kvm: vmx_preemption_timer_test

thefossguy-ciq

Sorry for dragging the kselftest discussion. It was found that it is irrelevant to this particular PR. Hence, approving. 🚤

thefossguy-ciq · 2025-07-24T17:29:51Z

Forgot to specify this earlier: The tests from different (and unrelated) subsystems are failing. Indicating a deeper issue, which is irrelevant to this PR.

PlaidCat

PlaidCat · 2025-07-24T17:58:43Z

I've maybe not re-enforced this enough.
Not all the kselftests at the state they're in always behave the same and an issue in one unmaintained or poorly maintained kselftest from Upstream / RHEL can cause non-derministic problems later on in the run.
So i expect some fuzz in the kernel selftests as they stand.

Ideally I'd like a fresh vm to run each ksleftest individually on PR and we just run the kselftest for the subsystem we're operating in.

basically

Kernel build
kernel installed
VM shutdown
qcow2 copied to new cqow2 for each kselftest
for each kselftest subsystem start a new fresh vm with corresponding qcow2 and run just that kselftest subsystem
5a. repeast multiple times with new qcow2s from above
agrigate results

Right now that is a little beyond our local systems

pvts-mat · 2025-07-24T18:17:05Z

Right now that is a little beyond our local systems

@PlaidCat Actually this is right within reach of rocky-patching. It allows for running arbitrary selftests subset, with automatic control of VMs (starting / stopping / etc.). No qcows cloning for this, but can be easily added - it's implemented already to create test machines from the build machines.

But let me tell you right away - doing this for each test will blow out the testing time from around 2-3 hours to around 2-3 days.

pvts-mat · 2025-07-24T18:45:35Z

No qcows cloning for this, but can be easily added - it's implemented already to create test machines from the build machines.

Well, scratch that. Nothing needs to be implemented. Here's the script which does this kind of testing for some 3 random tests drivers/dma-buf:udmabuf, net:icmp_redirect.sh, proc:setns-sysvipc:

CVE=CVE-2024-40956 ./ninja.sh -d explain _prepared__x86_64--build--ciqlts9_4-CVE-2024-40956
for t in drivers/dma-buf:udmabuf net:icmp_redirect.sh proc:setns-sysvipc; do
    TESTSUITE=".name | test(\"${t}\")" TESTS_PWRCYCLES_COUNT=1 CVE=CVE-2024-40956 ./ninja.sh _kselftests_run__x86_64--test--ciqlts9_4-CVE-2024-40956
    CVE=CVE-2024-40956 ./ninja.sh _invalidate__x86_64--test--ciqlts9_4-CVE-2024-40956
done

The results are in separate files tagged with timestamp, like tests/ciqlts9_4/x86_64/d2b099a5db0f7e045647aa46769f30c0fd02a549/session--2025-07-23--03-45-24.log, they can be cat'ed and fed to ktests.xsh diff for aggregation

PlaidCat · 2025-07-24T18:51:51Z

But let me tell you right away - doing this for each test will blow out the testing time from around 2-3 hours to around 2-3 days.

Yeah this is why I want multiple machines and its "Grand Design" (or my cruddy bullet points) knows that its more than just on the hosst

pvts-mat · 2025-07-24T19:17:49Z

Sorry for coming to the party uninvited, but it just breaks my heart to see other people fighting the unnecessary fights. We could save so much energy not re-inventing the wheel and channel it to actually push the cart. @shreeya-patel98 shares this sentiment too, I see

@PlaidCat @thefossguy-ciq @jainanmol84 I think we need a script which installs all the necessary packages needed for kselftest. Many kselftests were failing for me initially because of not having some packages needed for that test. I then tried to install atleast the basic ones. But we all will have different results with different packages. So I will try to work on something where we have a standard installation script which everyone can use after setting up their VM.

This is exactly what was driving the rocky-patching project at https://gitlab.conclusive.pl/devices/rocky-patching. It's kinda big and odd, with some tools most would find exotic so it's not for everyone, but you can look @shreeya-patel98 at yaml files at https://gitlab.conclusive.pl/devices/rocky-patching/-/tree/master/jinja/cloud-init?ref_type=heads - they serve as basically the very scripts you want to write (launched automatically at machine's spin-up), so you can have some starting point. I would love to see what you ended up adding to them, as I still have some test suites not compiling properly.

Here's a file which categorizes the selftests according to how they behave (flappy / stable / broken / etc.) so that the problematic ones can be omitted and the tests can be ran smoothly: https://gitlab.conclusive.pl/devices/rocky-patching/-/blob/master/rocky.yml?ref_type=heads. Again, it's part of the framework, but may as well be used "manually" for whatever one needs. For example, I skip all the tests marked as "bad", because running them is meaningless and they often screw up the machine in some way.

thefossguy-ciq requested review from kerneltoast, PlaidCat, thefossguy-ciq and shreeya-patel98 July 17, 2025 08:23

thefossguy-ciq approved these changes Jul 24, 2025

View reviewed changes

PlaidCat approved these changes Jul 24, 2025

View reviewed changes

jainanmol84 merged commit a5b8c8f into ciqlts9_4 Jul 28, 2025
4 checks passed

jainanmol84 deleted the {ajain}_ciqlts9_4 branch July 28, 2025 09:12

thefossguy-ciq mentioned this pull request Aug 1, 2025

[lts92] net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done #457

Merged

[LTS 9.4] dmaengine: idxd: Fix possible Use-After-Free in irq_process_work_list #422

[LTS 9.4] dmaengine: idxd: Fix possible Use-After-Free in irq_process_work_list #422

Uh oh!

Conversation

jainanmol84 commented Jul 17, 2025

Commit message

Kernel build logs

Kselftests

Uh oh!

thefossguy-ciq commented Jul 17, 2025

Uh oh!

PlaidCat commented Jul 17, 2025

Uh oh!

shreeya-patel98 commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jainanmol84 commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thefossguy-ciq commented Jul 21, 2025

Uh oh!

PlaidCat commented Jul 21, 2025

Uh oh!

jainanmol84 commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thefossguy-ciq commented Jul 22, 2025

Uh oh!

thefossguy-ciq commented Jul 22, 2025

Uh oh!

PlaidCat commented Jul 23, 2025

Uh oh!

thefossguy-ciq commented Jul 24, 2025

Uh oh!

thefossguy-ciq left a comment

Choose a reason for hiding this comment

Uh oh!

thefossguy-ciq commented Jul 24, 2025

Uh oh!

PlaidCat left a comment

Choose a reason for hiding this comment

Uh oh!

PlaidCat commented Jul 24, 2025

Uh oh!

pvts-mat commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pvts-mat commented Jul 24, 2025

Uh oh!

PlaidCat commented Jul 24, 2025

Uh oh!

pvts-mat commented Jul 24, 2025

Uh oh!

Uh oh!

Uh oh!

shreeya-patel98 commented Jul 21, 2025 •

edited

Loading

jainanmol84 commented Jul 21, 2025 •

edited

Loading

jainanmol84 commented Jul 21, 2025 •

edited

Loading

pvts-mat commented Jul 24, 2025 •

edited

Loading