FIPS-9 Add missing code that is included in the RPM but not represented in the kernel src trees #66

PlaidCat · 2025-01-15T18:21:30Z

This exists in the current delivery spec but is not in the kernel source tree. The work itself was done long ago. I do not see a public version of the dist-git, so that will be communicated internally.

This is just re-adding it to the certified branch; I will rebase fips-9-compliant off this commit.

net-memcg: avoid stalls when under memory pressure

jira LE-1603
bugfix: cgroupv2 cannot download file larger than RAM limit
commit-author Jakub Kicinski <[email protected]>
commit 720ca52bcef225b967a339e0fffb6d0c7e962240

As Shakeel explains, the commit under Fixes had the unintended side-effect of no longer pre-loading the cached memory allowance. Even though we previously dropped the first packet received when over the memory limit, the consecutive ones would get through by using the cache. The charging was happening in batches of 128kB, so we'd let in 128kB (truesize) worth of packets per one drop.

After the change we no longer force charge, so there are no cache-filling side effects. This causes significant drops and connection stalls for workloads which use a lot of page cache, since we can't reclaim page cache under GFP_NOWAIT.

Some of the latency can be recovered by improving SACK reneging handling, but nowhere near enough to get back to the pre-5.15 performance (the application I'm experimenting with still sees 5-10x worse latency).

Apply the suggested workaround of using GFP_ATOMIC. We will now be more permissive than previously as we'll drop _no_ packets in softirq when under pressure. But I can't think of any good and simple way to address that within networking.
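The drop behaviour described above can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not the memcg code: it assumes every charge attempt happens while the cgroup is already over its limit, and that a dropped packet under the pre-5.15 force-charge path leaves a full 128kB allowance cached, as the commit message describes. The names `count_drops` and `enum charge_policy` are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* Toy model only: names and policies are hypothetical, not kernel APIs. */
#define CHARGE_BATCH_BYTES (128u * 1024u) /* batch size quoted in the commit message */

enum charge_policy {
	POLICY_FORCE_CHARGE,	/* before 4b1327be9fe5: drop, but cache a batch */
	POLICY_NOWAIT,		/* after 4b1327be9fe5: GFP_NOWAIT, no force charge */
	POLICY_ATOMIC,		/* this fix: GFP_ATOMIC in softirq */
};

/*
 * Count dropped packets when every charge attempt happens while the cgroup
 * is already over its memory limit. Every packet has the same truesize.
 */
static unsigned int count_drops(enum charge_policy policy,
				unsigned int npackets, unsigned int truesize)
{
	unsigned int cached = 0;	/* pre-charged allowance left from a batch */
	unsigned int drops = 0;

	assert(truesize > 0 && truesize <= CHARGE_BATCH_BYTES);

	for (unsigned int i = 0; i < npackets; i++) {
		if (cached >= truesize) {
			/* Served from the cached allowance: no new charge needed. */
			cached -= truesize;
			continue;
		}

		switch (policy) {
		case POLICY_FORCE_CHARGE:
			/* This packet is dropped, but a whole batch ends up cached. */
			drops++;
			cached = CHARGE_BATCH_BYTES;
			break;
		case POLICY_NOWAIT:
			/* Page cache cannot be reclaimed without blocking: drop. */
			drops++;
			break;
		case POLICY_ATOMIC:
			/* Charge may exceed the limit: nothing is dropped. */
			break;
		}
	}
	return drops;
}
```

With 4kB packets, `count_drops(POLICY_FORCE_CHARGE, 1000, 4096)` returns 31 (one drop, then 32 packets served from each cached batch), `POLICY_NOWAIT` drops all 1000, and `POLICY_ATOMIC` drops none. That last case is the "more permissive than previously" trade-off the commit message accepts.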

Link: https://lore.kernel.org/all/[email protected]/
	Suggested-by: Shakeel Butt <[email protected]>
Fixes: 4b1327be9fe5 ("net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()")
	Acked-by: Shakeel Butt <[email protected]>
	Acked-by: Roman Gushchin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 720ca52bcef225b967a339e0fffb6d0c7e962240)
	Signed-off-by: Jonathan Maple <[email protected]>

gvrose8192

Definitely need this one. Thanks!

PlaidCat · 2025-01-15T18:40:07Z

> Definitely need this one. Thanks!

Yeah, it's in the RPM but not in our branch... I don't want to lose this, and it's probably worth bringing to 9.2 in general.

…b folio

JIRA: https://issues.redhat.com/browse/RHEL-84184

NOTE: this patch can also be found (by kerneloscope) on a dead parent branch in upstream mainline tree as: commit 1390a33

Conflicts:
* include/linux/page-flags.h: RHEL-only hunk is required here to avoid breaking the build for kernel variants that disable CONFIG_TRANSPARENT_HUGEPAGE but keep CONFIG_HUGETLBFS enabled (-rt). This is because RHEL-9 misses upstream v6.10 commit 85edc15 ("mm: remove folio_prep_large_rmappable()") along with its accompanying series which are irrelevant for this backport work;

This patch is a backport of the following upstream commit:
commit f708f69
Author: Miaohe Lin <[email protected]>
Date: Tue Jul 9 20:04:33 2024 +0800

mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio

A kernel crash was observed when migrating hugetlb folio:

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3435 Comm: bash Not tainted 6.10.0-rc6-00450-g8578ca01f21f #66
RIP: 0010:__folio_undo_large_rmappable+0x70/0xb0
RSP: 0018:ffffb165c98a7b38 EFLAGS: 00000097
RAX: fffffbbc44528090 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffa30e000a2800 RSI: 0000000000000246 RDI: ffffa3153ffffcc0
RBP: fffffbbc44528000 R08: 0000000000002371 R09: ffffffffbe4e5868
R10: 0000000000000001 R11: 0000000000000001 R12: ffffa3153ffffcc0
R13: fffffbbc44468000 R14: 0000000000000001 R15: 0000000000000001
FS: 00007f5b3a716740(0000) GS:ffffa3151fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010959a000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __folio_migrate_mapping+0x59e/0x950
 __migrate_folio.constprop.0+0x5f/0x120
 move_to_new_folio+0xfd/0x250
 migrate_pages+0x383/0xd70
 soft_offline_page+0x2ab/0x7f0
 soft_offline_page_store+0x52/0x90
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5b3a514887
RSP: 002b:00007ffe138fce68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f5b3a514887
RDX: 000000000000000c RSI: 0000556ab809ee10 RDI: 0000000000000001
RBP: 0000556ab809ee10 R08: 00007f5b3a5d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007f5b3a61b780 R14: 00007f5b3a617600 R15: 00007f5b3a616a00

It's because hugetlb folio is passed to __folio_undo_large_rmappable() unexpectedly. large_rmappable flag is imperceptibly set to hugetlb folio since commit f6a8dd9 ("hugetlb: convert alloc_buddy_hugetlb_folio to use a folio"). Then commit be9581e ("mm: fix crashes from deferred split racing folio migration") makes folio_migrate_mapping() call folio_undo_large_rmappable() triggering the bug. Fix this issue by clearing large_rmappable flag for hugetlb folios. They don't need that flag set anyway.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f6a8dd9 ("hugetlb: convert alloc_buddy_hugetlb_folio to use a folio")
Fixes: be9581e ("mm: fix crashes from deferred split racing folio migration")
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Rafael Aquini <[email protected]>
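The root cause above, a flag meant for one kind of folio ending up set on a hugetlb folio so that migration takes the wrong path, can be reduced to a toy user-space model. Everything in it is hypothetical (`struct toy_folio`, the `TOY_*` flag bits, `toy_alloc_hugetlb()`, `toy_undo_large_rmappable()`). It does not reflect the kernel's page-flag layout or the actual patch, and it returns false where the real kernel dereferenced a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not the kernel's page flags. */
#define TOY_LARGE_RMAPPABLE (1u << 0)
#define TOY_HUGETLB         (1u << 1)

struct toy_folio {
	unsigned int flags;
	void *split_state;	/* toy: only meaningful for large_rmappable folios */
};

/*
 * Stand-in for the migration-time undo step: it trusts the flag.
 * Returns false where the real code would have dereferenced NULL.
 */
static bool toy_undo_large_rmappable(const struct toy_folio *folio)
{
	if (!(folio->flags & TOY_LARGE_RMAPPABLE))
		return true;	/* nothing to undo */
	return folio->split_state != NULL;
}

/* Stand-in for hugetlb allocation; apply_fix models clearing the flag. */
static void toy_alloc_hugetlb(struct toy_folio *folio, bool apply_fix)
{
	/* The flag ends up set unnoticed, as the commit message describes. */
	folio->flags = TOY_HUGETLB | TOY_LARGE_RMAPPABLE;
	folio->split_state = NULL;	/* toy: never set up for hugetlb */
	if (apply_fix)
		folio->flags &= ~TOY_LARGE_RMAPPABLE;
}
```

Calling `toy_alloc_hugetlb(&f, false)` and then `toy_undo_large_rmappable(&f)` returns false (the model's stand-in for the oops); with `apply_fix` set to true it returns true, because the undo step is skipped once the flag is cleared.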

PlaidCat self-assigned this Jan 15, 2025

PlaidCat requested review from gvrose8192, bmastbergen and jason-rodri January 15, 2025 18:21

gvrose8192 approved these changes Jan 15, 2025

bmastbergen approved these changes Jan 15, 2025

PlaidCat merged commit 4a5975c into fips-9-certified/5.14.0-284.30.1 Jan 22, 2025
4 checks passed

PlaidCat deleted the {jmaple}_sync_code_with_release_fips-9-certified branch January 22, 2025 22:33
