[rocky8_10] History rebuild to kernel-4.18.0-553.50.1.el8_10 #227

Merged
merged 32 commits into from
Apr 23, 2025

Conversation

PlaidCat
Collaborator

General Process:

Checking Rebuild Commits for potentially missing commits:

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 538898
Number of commits in rpm: 41
Number of commits matched with upstream: 31 (75.61%)
Number of commits in upstream but not in rpm: 538867
Number of commits NOT found in upstream: 10 (24.39%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.50.1.el8_10 for kernel-4.18.0-553.50.1.el8_10
Clean Cherry Picks: 13 (41.94%)
Empty Cherry Picks: 18 (58.06%)
_______________________________

__EMPTY COMMITS__________________________
ce895cf15ab60b93464ebbb515f2fc9e7a8cef9a gfs2: Remove misleading comments in gfs2_evict_inode
03ff3781bf6c149554d88e7b702a3abd5e400dc0 gfs2: gfs2_evict_inode clarification
86934198eefa10a71f35162b06c44c36d85b98ba gfs2: Clear flags when withdraw prevents xmote
9947a06d29c0a30da88cdc6376ca5fd87083e130 gfs2: do_xmote fixes
1e86044402c45b70a9b31beeaefb5cc732a7470c gfs2: Remove and replace gfs2_glock_queue_work
8bbfde0875590b71f012bd8b0c9cb988c9a873b9 gfs2: Add GLF_PENDING_REPLY flag
3774f53d7f0b30a996eab4a1264611489b48f14c gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETE
0b93bac2271e11beb980fca037a34a9819c7dc37 gfs2: Remove LM_FLAG_PRIORITY flag
bb25b97562e52b2b5808b348db32568b1f5394b5 gfs2: remove dead code in add_to_queue
0360faca5d4dfc18d06644c7661cea1dc2b44dcf gfs2: Remove more dead code in add_to_queue
a431d49243a012738f132054b2303e0815663aac gfs2: Fix request cancelation bug
6cb3b1c2df87a8048ee1d54ec16d2e757af86c7f gfs2: Fix additional unlikely request cancelation race
9136cad723ec3e5ab5ca85a839f151abf1c9a106 gfs2: Prevent inode creation race (2)
b1c2cb86f4a7861480ad54bb9a58df3cbebf8e92 x86/xen: use new hypercall functions instead of hypercall page
7fa0da5373685e7ed249af3fa317ab1e1ba8b0a6 x86/xen: remove hypercall page
0e2bddf9e5f926ce32ed635012d0f8a0b54075d5 ice: add ice_adapter for shared data across PFs on the same NIC
d29a8134c78232213fb88f20d7ae865ec364e367 ice: avoid the PTP hardware semaphore in gettimex64 path
22118810fc7cc98f3afb38919348060ab67ddc5b ice: fold ice_ptp_read_time into ice_ptp_gettimex64

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
redhat: drop Y issues from changelog
md/md-bitmap: fix writing non bitmap changes local to RHEL
raid1: update discard granularity when adding new disk
rhel-8.10: gate kernel on kernel-qe tests results not cki ones
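
The percentages in the rebuild summary above can be cross-checked from the raw counts. This is a minimal sketch, not part of the rebuild tooling; the variable names are made up for illustration, and the counts are copied from rebuild.details.txt.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the rebuild.details.txt output above.
upstream_range = 538898
commits_in_rpm = 41
matched_upstream = 31
not_in_upstream = 10
clean_picks = 13
empty_picks = 18

summary = {
    "matched": pct(matched_upstream, commits_in_rpm),
    "not_upstream": pct(not_in_upstream, commits_in_rpm),
    "clean": pct(clean_picks, matched_upstream),
    "empty": pct(empty_picks, matched_upstream),
    "upstream_only": upstream_range - matched_upstream,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The two cherry-pick percentages are taken against the 31 matched commits, not the 41 commits in the rpm changelog, which is why they sum to 100% on their own.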

BUILD

/mnt/code/kernel-src-tree-build
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-32fa0f457b22"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h

...skipping...
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1916s
Making Modules
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/blowfish-x86_64.ko

  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-32fa0f457b22+
[TIMER]{MODULES}: 21s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-32fa0f457b22+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 24s
Checking kABI
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-32fa0f457b22+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 1916s
[TIMER]{MODULES}: 21s
[TIMER]{INSTALL}: 24s
[TIMER]{TOTAL} 1967s
Rebooting in 10 seconds

Boot

Linux r8-sigcloud-builder 4.18.0-rocky8_10_rebuild-32fa0f457b22+ #1 SMP Tue Apr 22 22:08:56 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Kselftest

$ grep '^ok ' 4.18.0-jmaple_sig-cloud-8_4.18.0-553.47.1.el8_10-226faf214012+.keselftest.log | wc -l
206

$ grep '^ok ' kselftest.resf_kernel-4.18.0-553.50.1.el8_10.4.18.0-rocky8_10_rebuild-32fa0f457b22+.log | wc -l
206
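
Equal `ok` counts do not by themselves show that the same tests passed in both runs. A rough sketch of a stricter comparison follows; the sample log text is invented for illustration, and real kselftest lines may need the test number stripped before comparing.

```python
def ok_lines(log_text):
    """Return the set of lines starting with 'ok ', like grep '^ok '."""
    return {line for line in log_text.splitlines() if line.startswith("ok ")}


def compare_runs(baseline_text, candidate_text):
    """Return (baseline count, candidate count, lost passes, new passes)."""
    base = ok_lines(baseline_text)
    cand = ok_lines(candidate_text)
    return len(base), len(cand), sorted(base - cand), sorted(cand - base)


# Invented sample data: same number of passes, but different tests.
baseline = "ok 1 selftests: a\nnot ok 2 selftests: b\nok 3 selftests: c\n"
candidate = "ok 1 selftests: a\nok 2 selftests: b\nnot ok 3 selftests: c\n"

n_base, n_cand, lost, gained = compare_runs(baseline, candidate)
print(n_base, n_cand, lost, gained)
```

In the invented sample both runs report two passes, yet one test regressed and another started passing, which a plain count would hide.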

PlaidCat added 30 commits April 22, 2025 17:49
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit ce895cf
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/ce895cf1.failed

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit ce895cf)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/super.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 03ff378
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/03ff3781.failed

When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is
never initialized, but that isn't obvious; if it did initialize gh and
then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to
release it.  To clarify the code, change gfs2_evict_inode() to always
check if gh needs to be released, no matter what evict_should_delete()
returns.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 03ff378)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/super.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Bob Peterson <[email protected]>
commit 865cc3e

Before this patch, gfs2 would deadlock because of the following
sequence during mount:

mount
   gfs2_fill_super
      gfs2_make_fs_rw <--- Detects IO error with glock
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

logd
   Detects IO error and the need to withdraw
   calls gfs2_withdraw
      gfs2_make_fs_ro
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

gfs2_quotad
   gfs2_statfs_sync
      gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted

glock_work_func
   do_xmote <---Detects IO error, can't release glock: blocked on withdraw
      glops->go_inval
      glock_blocked_by_withdraw
         requeue glock work & exit <--- work requeued, blocked by withdraw

This patch makes a special exception for the statfs system inode glock,
which allows the statfs glock UNLOCK to proceed normally. That allows the
quotad daemon to exit during the withdraw, which allows the logd daemon
to exit during the withdraw, which allows the mount to exit.

	Signed-off-by: Bob Peterson <[email protected]>
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 865cc3e)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Bob Peterson <[email protected]>
commit 8693419
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/86934198.failed

There are a couple places in function do_xmote where normal processing
is circumvented due to withdraws in progress. However, since we bypass
most of do_xmote() we bypass telling dlm to lock the dlm lock, which
means dlm will never respond with a completion callback. Since the
completion callback ordinarily clears GLF_LOCK, this patch changes
function do_xmote to handle those situations more gracefully so the
file system may be unmounted after withdraw.

A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
which is cleared by function finish_xmote(). Since the withdraw causes
us to skip the majority of do_xmote, it therefore also skips the call
to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
manually.

	Signed-off-by: Bob Peterson <[email protected]>
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 8693419)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 9947a06
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/9947a06d.failed

Function do_xmote() is called with the glock spinlock held.  Commit
8693419 added a 'goto skip_inval' statement at the beginning of the
function to further below where the glock spinlock is expected not to be
held anymore.  Then it added code there that requires the glock spinlock
to be held.  This doesn't make sense; fix this up by dropping and
retaking the spinlock where needed.

In addition, when ->lm_lock() returned an error, do_xmote() didn't fail
the locking operation, and simply left the glock hanging; fix that as
well.  (This is a much older error.)

Fixes: 8693419 ("gfs2: Clear flags when withdraw prevents xmote")
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 9947a06)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 1e86044
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/1e860444.failed

There are no more callers of gfs2_glock_queue_work() left, so remove
that helper.  With that, we can now rename __gfs2_glock_queue_work()
back to gfs2_glock_queue_work() to get rid of some unnecessary clutter.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 1e86044)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 8bbfde0
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/8bbfde08.failed

Introduce a new GLF_PENDING_REPLY flag to indicate that a reply from DLM
is expected.  Include that flag in glock dumps to show more clearly
what's going on.  (When the GLF_PENDING_REPLY flag is set, the GLF_LOCK
flag will also be set but the GLF_LOCK flag alone isn't sufficient to
tell that we are waiting for a DLM reply.)

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 8bbfde0)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 3774f53
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/3774f53d.failed

Having this flag attached to the iopen glock instead of the inode is
much simpler; it eliminates a potential weird race in gfs2_try_evict().

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 3774f53)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/incore.h
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 0b93bac
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/0b93bac2.failed

The last user of this flag was removed in commit b77b4a4 ("gfs2:
Rework freeze / thaw logic").

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 0b93bac)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Su Hui <[email protected]>
commit bb25b97
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/bb25b975.failed

clang static analyzer complains that value stored to 'gh' is never read.
The code of this line is useless after commit 0b93bac
("gfs2: Remove LM_FLAG_PRIORITY flag"). Remove this code to save space.

	Signed-off-by: Su Hui <[email protected]>
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit bb25b97)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 0360fac
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/0360faca.failed

Remove some more dead code in add_to_queue() that commit 0b93bac
("gfs2: Remove LM_FLAG_PRIORITY flag") has rendered obsolete.  This is a
continuation of commit 3302764610057 ("gfs2: remove dead code in
add_to_queue"); no functional change.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 0360fac)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit d838605

In run_queue(), check if the queue of pending requests is empty instead
of blindly assuming that it won't be.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit d838605)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit a431d49
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/a431d492.failed

In finish_xmote(), when a locking request is canceled, the corresponding
holder is moved to the tail of the holders list instead of being
dequeued immediately.  When there is only a single holder, the canceled
locking request is then immediately repeated.  This makes no sense; it
looks like another remnant of LM_FLAG_PRIORITY support.

Instead, dequeue canceled holders and proceed with the next holder in
finish_xmote().  We can then easily detect in gfs2_glock_dq() when a
holder has been canceled.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit a431d49)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 6cb3b1c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/6cb3b1c2.failed

In gfs2_glock_dq(), we must drop the glock spin lock before calling
->lm_cancel, but this means that in the meantime, the operation we are
trying to cancel could complete.  If the operation completes
unsuccessfully, another holder can end up at the head of the queue and
another ->lm_lock operation can get started.  In this case, we would end
up canceling that second operation by accident.

To prevent that, introduce a new GLF_CANCELING flag.  Set that flag in
gfs2_glock_dq() when trying to cancel an operation.  When seeing that
flag, finish_xmote() will then keep the GLF_LOCK flag set to prevent
other glock operations from taking place.  gfs2_glock_dq() then
completes the cancelation attempt by clearing GLF_LOCK and
GLF_CANCELING.

In addition, add a missing GLF_DEMOTE_IN_PROGRESS check in
gfs2_glock_dq() to make sure that we won't accidentally cancel a demote
request.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 6cb3b1c)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
#	fs/gfs2/incore.h
#	fs/gfs2/trace_gfs2.h
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 9136cad
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/9136cad7.failed

In gfs2_try_evict(), we try grabbing the inode to evict, we try to evict
it, and then we try grabbing it again to see if it still exists.  There
is no guarantee that we will end up with the same inode both times; the
inode validity check that commit ffd1cf0 ("gfs2: Prevent inode
creation race") added to the first grab is actually needed both times.

(To avoid code duplication, add a grab_existing_inode() helper.)

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 9136cad)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/glock.c
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit e9e38ed

In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we
detach the iopen glock from the inode without calling
glock_clear_object().  This leads to a warning in glock_set_object()
when the same inode is recreated and the glock is reused.
Fix that by only detaching the iopen glock in gfs2_evict_inode().

In addition, remove the dequeue code from evict_should_delete(); we
already perform a conditional dequeue in gfs2_evict_inode().

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit e9e38ed)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 79fe790

In glock_set_object() and glock_clear_object(), there is no need to
print the glock type and number when we dump the entire glock, anyway.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 79fe790)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 41a8e04

In gfs2_evict_inode(), in the unlikely case that we cannot defer
deleting the inode, it is not safe to fall back to deleting the inode;
the only valid choice we have is to skip the delete.

In addition, in evict_should_delete(), if we cannot lock the inode glock
exclusively, we are in a bad enough state that skipping the delete is
likely a better choice than trying to recover from the failure later.

Fixes: c5b7a24 ("gfs2: Only defer deletes when we have an iopen glock")
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 41a8e04)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
cve CVE-2024-53241
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Juergen Gross <[email protected]>
commit b1c2cb8
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/b1c2cb86.failed

Call the Xen hypervisor via the new xen_hypercall_func static-call
instead of the hypercall page.

This is part of XSA-466 / CVE-2024-53241.

	Reported-by: Andrew Cooper <[email protected]>
	Signed-off-by: Juergen Gross <[email protected]>
Co-developed-by: Peter Zijlstra <[email protected]>
Co-developed-by: Josh Poimboeuf <[email protected]>
(cherry picked from commit b1c2cb8)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/x86/include/asm/xen/hypercall.h
jira LE-2815
cve CVE-2024-53241
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Juergen Gross <[email protected]>
commit 7fa0da5
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/7fa0da53.failed

The hypercall page is no longer needed. It can be removed, as from the
Xen perspective it is optional.

But, from Linux's perspective, it removes naked RET instructions that
escape the speculative protections that Call Depth Tracking and/or
Untrain Ret are trying to achieve.

This is part of XSA-466 / CVE-2024-53241.

	Reported-by: Andrew Cooper <[email protected]>
	Signed-off-by: Juergen Gross <[email protected]>
	Reviewed-by: Andrew Cooper <[email protected]>
	Reviewed-by: Jan Beulich <[email protected]>
(cherry picked from commit 7fa0da5)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/x86/include/asm/xen/hypercall.h
#	arch/x86/kernel/callthunks.c
#	arch/x86/kernel/vmlinux.lds.S
#	arch/x86/xen/enlighten.c
#	arch/x86/xen/enlighten_pvh.c
#	arch/x86/xen/xen-head.S
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Christoph Hellwig <[email protected]>
commit 59cefee

Set BITMAP_WRITE_ERROR directly in write_sb_page instead of propagating
the error to the caller and setting it there.

	Signed-off-by: Christoph Hellwig <[email protected]>
	Reviewed-by: Hannes Reinecke <[email protected]>
	Reviewed-by: Johannes Thumshirn <[email protected]>
	Reviewed-by: Himanshu Madhani <[email protected]>
	Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 59cefee)
	Signed-off-by: Jonathan Maple <[email protected]>
…_unmap

jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Christoph Hellwig <[email protected]>
commit 546ac0b

Just a small tidyup to prepare for bigger changes.

	Signed-off-by: Christoph Hellwig <[email protected]>
	Reviewed-by: Hannes Reinecke <[email protected]>
	Reviewed-by: Johannes Thumshirn <[email protected]>
	Reviewed-by: Himanshu Madhani <[email protected]>
	Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 546ac0b)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Christoph Hellwig <[email protected]>
commit 9234851

Don't bother allocating an extra buffer in the I/O failure handler and
instead use the printk built-in format to print the last 4 path name
components.

	Signed-off-by: Christoph Hellwig <[email protected]>
	Reviewed-by: Hannes Reinecke <[email protected]>
	Reviewed-by: Johannes Thumshirn <[email protected]>
	Reviewed-by: Himanshu Madhani <[email protected]>
	Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 9234851)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Ofir Gal <[email protected]>
commit ab99a87

__write_sb_page() rounds up the io size to the optimal io size if it
doesn't exceed the data offset, but it doesn't check the final size
exceeds the bitmap length.

For example:
page count      - 1
page size       - 4K
data offset     - 1M
optimal io size - 256K

The final io size would be 256K (64 pages), but md_bitmap_storage_alloc()
allocated only 1 page, so the IO would write 1 valid page and 63 pages that
happen to be allocated afterwards. This leaks memory to the raid device
superblock.

This issue caused a data transfer failure in nvme-tcp. The network
driver checks the first page of an IO with sendpage_ok(), which returns
true if the page isn't a slab page and its refcount is >= 1. If the page
is !sendpage_ok(), the network driver disables MSG_SPLICE_PAGES.

As of now the network layer assumes all the pages of the IO are
sendpage_ok() when MSG_SPLICE_PAGES is on.

The bitmap pages aren't slab pages, so the first page of the IO is
sendpage_ok(), but the additional pages that happen to be allocated
after the bitmap pages might be !sendpage_ok(). That causes
skb_splice_from_iter() to stop the data transfer; in the case below it
hangs 'mdadm --create'.

The bug is reproducible, in order to reproduce we need nvme-over-tcp
controllers with optimal IO size bigger than PAGE_SIZE. Creating a raid
with bitmap over those devices reproduces the bug.

In order to simulate large optimal IO size you can use dm-stripe with a
single device.
Script to reproduce the issue on top of brd devices using dm-stripe is
attached below (will be added to blktest).

I have added some logs to test the theory:
...
md: created bitmap (1 pages) for device md127
__write_sb_page before md_super_write offset: 16, size: 262144. pfn: 0x53ee
=== __write_sb_page before md_super_write. logging pages ===
pfn: 0x53ee, slab: 0 <-- the only page that allocated for the bitmap
pfn: 0x53ef, slab: 1
pfn: 0x53f0, slab: 0
pfn: 0x53f1, slab: 0
pfn: 0x53f2, slab: 0
pfn: 0x53f3, slab: 1
...
nvme_tcp: sendpage_ok - pfn: 0x53ee, len: 262144, offset: 0
skbuff: before sendpage_ok() - pfn: 0x53ee
skbuff: before sendpage_ok() - pfn: 0x53ef
WARNING at net/core/skbuff.c:6848 skb_splice_from_iter+0x142/0x450
skbuff: !sendpage_ok - pfn: 0x53ef. is_slab: 1, page_count: 1
...

	Cc: [email protected]
	Reviewed-by: Christoph Hellwig <[email protected]>
	Signed-off-by: Ofir Gal <[email protected]>
	Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit ab99a87)
	Signed-off-by: Jonathan Maple <[email protected]>
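
The size arithmetic from the ab99a87 commit message above can be worked through as a short sketch. This is a simplified model of the behaviour the message describes, not the kernel code: the function name is hypothetical and the superblock offset is ignored.

```python
PAGE_SIZE = 4 * 1024  # 4K pages, as in the commit message example


def roundup(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def unclamped_io_size(bitmap_pages, optimal_io_size, data_offset):
    """Size written when only the data offset bounds the rounded size.

    Hypothetical helper modelling the pre-fix behaviour: the bitmap
    length itself is never consulted.
    """
    size = bitmap_pages * PAGE_SIZE
    rounded = roundup(size, optimal_io_size)
    return rounded if rounded <= data_offset else size


io_size = unclamped_io_size(bitmap_pages=1,
                            optimal_io_size=256 * 1024,
                            data_offset=1024 * 1024)
pages_written = io_size // PAGE_SIZE
pages_past_bitmap = pages_written - 1
print(io_size, pages_written, pages_past_bitmap)  # 262144 64 63
```

With one allocated bitmap page, the rounded write covers 64 pages, so 63 of them lie past the end of the bitmap, matching the numbers in the commit message.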
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Gerd Bayer <[email protected]>
commit 2bcae12

Remove the erroneous unmap in case no DMA mapping was established

The multi-packet WQE transmit code attempts to obtain a DMA mapping for
the skb. This could fail, e.g. under memory pressure, when the IOMMU
driver just can't allocate more memory for page tables. While the code
tries to handle this in the path below the err_unmap label it erroneously
unmaps one entry from the sq's FIFO list of active mappings. Since the
current map attempt failed this unmap is removing some random DMA mapping
that might still be required. If the PCI function now presents that IOVA,
the IOMMU may assume a rogue DMA access and, e.g. on s390, put the PCI
function in error state.

The erroneous behavior was seen in a stress-test environment that created
memory pressure.

Fixes: 5af75c7 ("net/mlx5e: Enhanced TX MPWQE for SKBs")
	Signed-off-by: Gerd Bayer <[email protected]>
	Reviewed-by: Zhu Yanjun <[email protected]>
	Acked-by: Maxim Mikityanskiy <[email protected]>
	Signed-off-by: Saeed Mahameed <[email protected]>
(cherry picked from commit 2bcae12)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Michal Schmidt <[email protected]>
commit 0e2bddf
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/0e2bddf9.failed

There is a need for synchronization between ice PFs on the same physical
adapter.

Add a "struct ice_adapter" for holding data shared between PFs of the
same multifunction PCI device. The struct is refcounted - each ice_pf
holds a reference to it.

Its first use will be for PTP. I expect it will be useful also to
improve the ugliness that is ice_prot_id_tbl.

	Reviewed-by: Przemek Kitszel <[email protected]>
	Signed-off-by: Michal Schmidt <[email protected]>
	Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 0e2bddf)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/intel/ice/Makefile
#	drivers/net/ethernet/intel/ice/ice.h
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Michal Schmidt <[email protected]>
commit d29a813
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/d29a8134.failed

The PTP hardware semaphore (PFTSYN_SEM) is used to synchronize
operations that program the PTP timers. The operations involve issuing
commands to the sideband queue. The E810 does not have a hardware
sideband queue, so the admin queue is used. The admin queue is slow.
I have observed delays in hundreds of milliseconds waiting for
ice_sq_done.

When phc2sys reads the time from the ice PTP clock and PFTSYN_SEM is
held by a task performing one of the slow operations, ice_ptp_lock can
easily time out. phc2sys gets -EBUSY and the kernel prints:
  ice 0000:XX:YY.0: PTP failed to get time
These messages appear once every few seconds, causing log spam.

The E810 datasheet recommends an algorithm for reading the upper 64 bits
of the GLTSYN_TIME register. It matches what's implemented in
ice_ptp_read_src_clk_reg. It is robust against wrap-around, but not
necessarily against the concurrent setting of the register (with
GLTSYN_CMD_{INIT,ADJ}_TIME commands). Perhaps that's why
ice_ptp_gettimex64 also takes PFTSYN_SEM.

The race with time setters can be prevented without relying on the PTP
hardware semaphore. Using the "ice_adapter" from the previous patch,
we can have a common spinlock for the PFs that share the clock hardware.
It will protect the reading and writing to the GLTSYN_TIME register.
The writing is performed indirectly, by the hardware, as a result of
the driver writing GLTSYN_CMD_SYNC in ice_ptp_exec_tmr_cmd. I wasn't
sure if the ice_flush there is enough to make sure GLTSYN_TIME has been
updated, but it works well in my testing.

My test code can be seen here:
https://gitlab.com/mschmidt2/linux/-/commits/ice-ptp-host-side-lock-10
It consists of:
 - kernel threads reading the time in a busy loop and looking at the
   deltas between consecutive values, reporting new maxima.
 - a shell script that sets the time repeatedly;
 - a bpftrace probe to produce a histogram of the measured deltas.
Without the spinlock ptp_gltsyn_time_lock, it is easy to see tearing.
Deltas in the [2G, 4G) range appear in the histograms.
With the spinlock added, there is no tearing and the biggest delta I saw
was in the range [1M, 2M), that is under 2 ms.

	Reviewed-by: Jacob Keller <[email protected]>
	Reviewed-by: Przemek Kitszel <[email protected]>
	Signed-off-by: Michal Schmidt <[email protected]>
	Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit d29a813)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/intel/ice/ice_adapter.c
#	drivers/net/ethernet/intel/ice/ice_adapter.h
jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Michal Schmidt <[email protected]>
commit 2211881
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/22118810.failed

This is a cleanup. It is unnecessary to have this function just to call
another function.

	Reviewed-by: Przemek Kitszel <[email protected]>
	Signed-off-by: Michal Schmidt <[email protected]>
	Reviewed-by: Sai Krishna <[email protected]>
	Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
	Reviewed-by: Kalesh AP <[email protected]>
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 2211881)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/intel/ice/ice_ptp.c
…ut payload

jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Long Li <[email protected]>
commit 87c4b5e

In StorVSC, payload->range.len is used to indicate if this SCSI command
carries payload. This data is allocated as part of the private driver data
by the upper layer and may get passed to lower driver uninitialized.

For example, the SCSI error handling mid layer may send TEST_UNIT_READY or
REQUEST_SENSE while reusing the buffer from a failed command. The private
data section may have stale data from the previous command.

If the SCSI command doesn't carry payload, the driver may use this value as
is for communicating with host, resulting in possible corruption.

Fix this by always initializing this value.
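
The failure mode can be illustrated with a small Python model. It is a sketch only: `CmdPrivate`, `queue_command_buggy`, and `queue_command_fixed` are hypothetical names standing in for the reused private data and the driver's queueing path, and none of this is the storvsc code itself.

```python
from dataclasses import dataclass


@dataclass
class CmdPrivate:
    """Hypothetical per-command private data that the upper layer reuses."""
    range_len: int = 0


def queue_command_buggy(priv, data_len):
    """Only sets the length when there is a payload; otherwise it is stale."""
    if data_len:
        priv.range_len = data_len
    return priv.range_len  # value that would be communicated to the host


def queue_command_fixed(priv, data_len):
    """Always initializes the length before deciding whether to set it."""
    priv.range_len = 0
    if data_len:
        priv.range_len = data_len
    return priv.range_len


priv = CmdPrivate()
first = queue_command_buggy(priv, 4096)   # command with a payload
stale = queue_command_buggy(priv, 0)      # payload-less command, reused buffer
clean = queue_command_fixed(priv, 0)      # same situation with the fix
```

Here `stale` still carries the 4096 from the earlier command, while `clean` is 0.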

Fixes: be0cf6c ("scsi: storvsc: Set the tablesize based on the information given by the host")
	Cc: [email protected]
	Tested-by: Roman Kisel <[email protected]>
	Reviewed-by: Roman Kisel <[email protected]>
	Reviewed-by: Michael Kelley <[email protected]>
	Signed-off-by: Long Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Martin K. Petersen <[email protected]>
(cherry picked from commit 87c4b5e)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2815
cve CVE-2024-53150
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Takashi Iwai <[email protected]>
commit a3dd4d6

The current USB-audio driver code doesn't check the bLength of each
descriptor when traversing the clock descriptors.  That is, when a
device provides a bogus descriptor with a shorter bLength, the driver
might hit out-of-bounds reads.

To address this, this patch adds sanity checks to the validator
functions for the clock descriptor traversal.  When the descriptor
length is shorter than expected, it's skipped in the loop.

For the clock source and clock multiplier descriptors, we can just
check bLength against the sizeof() of each descriptor type.
OTOH, the clock selector descriptor of UAC2 and UAC3 has an array
of bNrInPins elements and two more fields at its tail, hence those
have to be checked in addition to the sizeof() check.
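
The extra check for the clock selector descriptor can be sketched as follows. This is an illustration in Python, not the patch: `clock_selector_is_valid` and `CS_HEADER_LEN` are hypothetical names, the 5-byte fixed header is an assumption based on the UAC2 layout (bLength, bDescriptorType, bDescriptorSubtype, bClockID, bNrInPins), and the size of the two tail fields is passed in because it depends on the protocol version.

```python
CS_HEADER_LEN = 5  # assumed fixed part: five one-byte fields ending in bNrInPins


def clock_selector_is_valid(desc: bytes, tail_len: int) -> bool:
    """Check that bLength covers the header, the pin array, and the tail.

    `tail_len` is the combined size of the two fields that follow the
    bNrInPins-element array (assumed to be 2 bytes for UAC2).
    Descriptor type and subtype bytes are not checked here.
    """
    if len(desc) < CS_HEADER_LEN:
        return False  # not even the fixed header is present
    b_length = desc[0]
    nr_in_pins = desc[4]
    if b_length < CS_HEADER_LEN:
        return False  # the sizeof()-style check
    # Additional check: array of bNrInPins entries plus the tail fields.
    return b_length >= CS_HEADER_LEN + nr_in_pins + tail_len


# Two input pins: 5 header bytes + 2 array bytes + 2 tail bytes = 9.
good = bytes([9, 0x24, 0x0B, 1, 2, 10, 11, 0x03, 0])
# Claims two pins but bLength stops right after the first array entry.
bogus = bytes([6, 0x24, 0x0B, 1, 2, 10])
```

With `tail_len=2`, `good` passes and `bogus` is rejected, so a traversal loop would skip it instead of reading past its end.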

	Reported-by: Benoît Sevens <[email protected]>
	Cc: <[email protected]>
Link: https://lore.kernel.org/[email protected]
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Takashi Iwai <[email protected]>
(cherry picked from commit a3dd4d6)
	Signed-off-by: Jonathan Maple <[email protected]>
…rect values in perf_quiet_option()

jira LE-2815
Rebuild_History Non-Buildable kernel-4.18.0-553.50.1.el8_10
commit-author Yang Jihong <[email protected]>
commit 188ac72

When perf uses quiet mode, perf_quiet_option() sets the 'debug_peo_args'
variable to -1, and display_attr() incorrectly determines the value of
'debug_peo_args'.  As a result, unexpected information is displayed.

Before:

  # perf record --quiet -- ls > /dev/null
  ------------------------------------------------------------
  perf_event_attr:
    size                             128
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  ...

After:
  # perf record --quiet -- ls > /dev/null
  #

redirect_to_stderr has a similar problem.
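
Why -1 misbehaves can be shown with a short sketch. It assumes the flag is tested for being non-zero, which the commit message does not spell out; `should_display_attr` is a hypothetical stand-in and is not perf's display_attr().

```python
def should_display_attr(verbose, debug_peo_args):
    """Assumed check: print when verbose is high or the flag is non-zero."""
    return verbose >= 2 or debug_peo_args != 0


# Quiet mode with the flag set to -1: a non-zero value still enables output.
before = should_display_attr(verbose=-1, debug_peo_args=-1)

# Quiet mode with the flag cleared to 0: nothing is printed.
after = should_display_attr(verbose=-1, debug_peo_args=0)
```

Under that assumption `before` is true and `after` is false, which matches the Before and After output shown above.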

Fixes: f78eaef ("perf tools: Allow to force redirect pr_debug to stderr.")
Fixes: ccd2674 ("perf tool: Provide an option to print perf_event_open args and return value")
	Suggested-by: Adrian Hunter <[email protected]>
	Reviewed-by: Adrian Hunter <[email protected]>
	Signed-off-by: Yang Jihong <[email protected]>
	Cc: Alexander Shishkin <[email protected]>
	Cc: Andi Kleen <[email protected]>
	Cc: Carsten Haitzler <[email protected]>
	Cc: Ian Rogers <[email protected]>
	Cc: Ingo Molnar <[email protected]>
	Cc: Jiri Olsa <[email protected]>
	Cc: Leo Yan <[email protected]>
	Cc: Mark Rutland <[email protected]>
	Cc: [email protected]
	Cc: Masami Hiramatsu <[email protected]>
	Cc: Namhyung Kim <[email protected]>
	Cc: Peter Zijlstra <[email protected]>
	Cc: Ravi Bangoria <[email protected]>
	Cc: Ravi Bangoria <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
(cherry picked from commit 188ac72)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 538898
Number of commits in rpm: 41
Number of commits matched with upstream: 31 (75.61%)
Number of commits in upstream but not in rpm: 538867
Number of commits NOT found in upstream: 10 (24.39%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.50.1.el8_10 for kernel-4.18.0-553.50.1.el8_10
Clean Cherry Picks: 13 (41.94%)
Empty Cherry Picks: 18 (58.06%)
_______________________________
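
The percentages in this summary can be checked with a few lines of Python. `pct` is a hypothetical helper, and the choice of denominators (41 rpm commits for the match rates, 31 matched commits for the cherry-pick rates) is inferred from the numbers rather than stated by the tool.

```python
def pct(part, whole):
    """Format part/whole as a percentage with two decimals."""
    return f"{100 * part / whole:.2f}%"


rpm_commits = 41
matched = 31
not_found = rpm_commits - matched      # 10
clean_picks = 13
empty_picks = matched - clean_picks    # 18
upstream_range = 538898

summary = {
    "matched": pct(matched, rpm_commits),
    "not_found": pct(not_found, rpm_commits),
    "clean": pct(clean_picks, matched),
    "empty": pct(empty_picks, matched),
    "upstream_not_in_rpm": upstream_range - matched,
}
```

The cherry-pick percentages only add up when taken against the 31 matched commits, so the 10 unmatched changelog entries are not counted in either cherry-pick line.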

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.50.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual empty-commit failures are stored in the same directory.
The git message for each empty commit has the path to its failed cherry-pick.
File names are the first 8 characters of the upstream SHA.

@thefossguy-ciq thefossguy-ciq left a comment

🚤

@bmastbergen bmastbergen left a comment

🥌

@PlaidCat PlaidCat merged commit 32fa0f4 into rocky8_10 Apr 23, 2025
2 checks passed
@PlaidCat PlaidCat deleted the rocky8_10_rebuild branch April 23, 2025 14:09