FIPS-9 Add missing code that is included in the RPM but not represented in the kernel src trees #66

PlaidCat
Collaborator

This exists in the current delivery spec but is not in the kernel source tree, and this work has long since been done. I do not see a public version of the dist-git, so that will be communicated internally.

This just re-adds it to the certified branch; fips-9-compliant will be rebased off this commit.

net-memcg: avoid stalls when under memory pressure

jira LE-1603
bugfix: cgroupv2 cannot download file larger than RAM limit
commit-author Jakub Kicinski <[email protected]>
commit 720ca52bcef225b967a339e0fffb6d0c7e962240

As Shakeel explains the commit under Fixes had the unintended side-effect of no longer pre-loading the cached memory allowance. Even tho we previously dropped the first packet received when over memory limit - the consecutive ones would get thru by using the cache. The charging was happening in batches of 128kB, so we'd let in 128kB (truesize) worth of packets per one drop.

After the change we no longer force charge, there will be no cache filling side effects. This causes significant drops and connection stalls for workloads which use a lot of page cache, since we can't reclaim page cache under GFP_NOWAIT.

Some of the latency can be recovered by improving SACK reneg handling but nowhere near enough to get back to the pre-5.15 performance (the application I'm experimenting with still sees 5-10x worst latency).

Apply the suggested workaround of using GFP_ATOMIC. We will now be more permissive than previously as we'll drop _no_ packets in softirq when under pressure. But I can't think of any good and simple way to address that within networking.

Link: https://lore.kernel.org/all/[email protected]/
	Suggested-by: Shakeel Butt <[email protected]>
Fixes: 4b1327be9fe5 ("net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()")
	Acked-by: Shakeel Butt <[email protected]>
	Acked-by: Roman Gushchin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 720ca52bcef225b967a339e0fffb6d0c7e962240)
	Signed-off-by: Jonathan Maple <[email protected]>
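
The drop pattern described in the commit message can be illustrated with a toy userspace model. This is only a sketch and not kernel code: the function name `count_drops`, the policy labels, the 2048-byte truesize, and the assumption that the cgroup stays pinned at its limit with nothing reclaimable are all hypothetical choices made for illustration. Only the 128kB batch size and the three behaviours (forced charge that refills the cache, GFP_NOWAIT failing with no refill, GFP_ATOMIC succeeding) come from the text above.

```python
BATCH = 128 * 1024  # bytes charged per refill (the 128kB batch from the commit message)


def count_drops(packets, truesize, policy):
    """Count dropped packets while a cgroup sits at its memory limit.

    Toy model, assumptions only:
      "force_charge" - behaviour before the commit under Fixes: the packet that
                       finds the cache empty is dropped, but the forced charge
                       leaves a full batch of allowance behind.
      "nowait"       - behaviour after the commit under Fixes: the charge fails
                       and nothing is cached, so every packet is dropped.
      "atomic"       - the workaround: the charge succeeds in softirq.
    """
    if not 0 < truesize <= BATCH:
        raise ValueError("truesize must be between 1 and BATCH bytes")
    cache = 0
    drops = 0
    for _ in range(packets):
        if cache >= truesize:
            cache -= truesize  # served from the cached allowance
            continue
        if policy == "force_charge":
            drops += 1
            cache = BATCH  # side effect: cache is refilled despite the drop
        elif policy == "nowait":
            drops += 1  # no refill, so the next packet fails the same way
        elif policy == "atomic":
            cache += BATCH - truesize  # charge succeeds, packet is accepted
        else:
            raise ValueError(f"unknown policy: {policy}")
    return drops


if __name__ == "__main__":
    for policy in ("force_charge", "nowait", "atomic"):
        print(policy, count_drops(650, 2048, policy))
```

With 650 packets of 2048 bytes each, the model reports 10 drops for `force_charge` (one drop per 64 accepted packets), 650 for `nowait`, and 0 for `atomic`, which matches the qualitative ordering the commit message describes.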


@gvrose8192 gvrose8192 left a comment

Definitely need this one. Thanks!

@PlaidCat
Collaborator Author

> Definitely need this one. Thanks!

Yeah, it's in the RPM but not in our branch ... don't want to lose this, and it's probably worth bringing to 9.2 in general.

@PlaidCat PlaidCat merged commit 4a5975c into fips-9-certified/5.14.0-284.30.1 Jan 22, 2025
4 checks passed
@PlaidCat PlaidCat deleted the {jmaple}_sync_code_with_release_fips-9-certified branch January 22, 2025 22:33
github-actions bot pushed a commit that referenced this pull request Apr 26, 2025
…b folio

JIRA: https://issues.redhat.com/browse/RHEL-84184
NOTE: this patch can also be found (by kerneloscope) on a dead parent branch
      in upstream mainline tree as:
      commit 1390a33
Conflicts:
  * include/linux/page-flags.h: RHEL-only hunk is required here to avoid breaking
    the build for kernel variants that disable CONFIG_TRANSPARENT_HUGEPAGE but
    keep CONFIG_HUGETLBFS enabled (-rt). This is because RHEL-9 misses upstream
    v6.10 commit 85edc15 ("mm: remove folio_prep_large_rmappable()") along
    with its accompanying series which are irrelevant for this backport work;

This patch is a backport of the following upstream commit:
commit f708f69
Author: Miaohe Lin <[email protected]>
Date:   Tue Jul 9 20:04:33 2024 +0800

    mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio

    A kernel crash was observed when migrating hugetlb folio:

    BUG: kernel NULL pointer dereference, address: 0000000000000008
    PGD 0 P4D 0
    Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 3435 Comm: bash Not tainted 6.10.0-rc6-00450-g8578ca01f21f #66
    RIP: 0010:__folio_undo_large_rmappable+0x70/0xb0
    RSP: 0018:ffffb165c98a7b38 EFLAGS: 00000097
    RAX: fffffbbc44528090 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: ffffa30e000a2800 RSI: 0000000000000246 RDI: ffffa3153ffffcc0
    RBP: fffffbbc44528000 R08: 0000000000002371 R09: ffffffffbe4e5868
    R10: 0000000000000001 R11: 0000000000000001 R12: ffffa3153ffffcc0
    R13: fffffbbc44468000 R14: 0000000000000001 R15: 0000000000000001
    FS:  00007f5b3a716740(0000) GS:ffffa3151fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000010959a000 CR4: 00000000000006f0
    Call Trace:
     <TASK>
     __folio_migrate_mapping+0x59e/0x950
     __migrate_folio.constprop.0+0x5f/0x120
     move_to_new_folio+0xfd/0x250
     migrate_pages+0x383/0xd70
     soft_offline_page+0x2ab/0x7f0
     soft_offline_page_store+0x52/0x90
     kernfs_fop_write_iter+0x12c/0x1d0
     vfs_write+0x380/0x540
     ksys_write+0x64/0xe0
     do_syscall_64+0xb9/0x1d0
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f5b3a514887
    RSP: 002b:00007ffe138fce68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f5b3a514887
    RDX: 000000000000000c RSI: 0000556ab809ee10 RDI: 0000000000000001
    RBP: 0000556ab809ee10 R08: 00007f5b3a5d1460 R09: 000000007fffffff
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
    R13: 00007f5b3a61b780 R14: 00007f5b3a617600 R15: 00007f5b3a616a00

    It's because hugetlb folio is passed to __folio_undo_large_rmappable()
    unexpectedly.  large_rmappable flag is imperceptibly set to hugetlb folio
    since commit f6a8dd9 ("hugetlb: convert alloc_buddy_hugetlb_folio to
    use a folio").  Then commit be9581e ("mm: fix crashes from deferred
    split racing folio migration") makes folio_migrate_mapping() call
    folio_undo_large_rmappable() triggering the bug.  Fix this issue by
    clearing large_rmappable flag for hugetlb folios.  They don't need that
    flag set anyway.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: f6a8dd9 ("hugetlb: convert alloc_buddy_hugetlb_folio to use a folio")
    Fixes: be9581e ("mm: fix crashes from deferred split racing folio migration")
    Signed-off-by: Miaohe Lin <[email protected]>
    Cc: Hugh Dickins <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: Muchun Song <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Rafael Aquini <[email protected]>