vz on Intel: kernel 6.12 doesn't boot (when CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is set) #3334

Open
AkihiroSuda opened this issue Mar 11, 2025 · 5 comments

AkihiroSuda commented Mar 11, 2025

Linux kernel v6.12 does not boot with vz on Intel.

No error message is printed on either the video console or the serial console.

✅ Boots

❌ Doesn't boot (still boots with QEMU)

The issue might be a regression in kernel 6.12, although Alpine still boots with kernel 6.12.

Note

Update: This turned out to be a regression introduced by torvalds/linux@70044df250d0 ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE").

Alpine is not affected because Alpine does not use CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS.

Test environment:

  • Lima v1.0.6
  • macOS 13.5.1
  • MacBook Pro 13-inch, 2020, Four Thunderbolt 3 ports (Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz)

Thanks to @trodemaster for originally reporting this issue.
https://cloud-native.slack.com/archives/C043N6ZFV9S/p1741464872994929


Workaround 1: use QEMU

limactl create --vm-type=qemu ...

Workaround 2: append nopku to the kernel cmdline

images:
- location: "https://cloud-images.ubuntu.com/plucky/20250310/plucky-server-cloudimg-amd64.img"
  arch: "x86_64"
  kernel:
    location: https://cloud-images.ubuntu.com/plucky/20250310/unpacked/plucky-server-cloudimg-amd64-vmlinuz-generic
    digest: sha256:547c4316eadc8e46b043b5658fd4c08d62b3522c07a7eb94692e1b7d8827bf52
    cmdline: root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyAMA0 nopku
  initrd:
    location: https://cloud-images.ubuntu.com/plucky/20250310/unpacked/plucky-server-cloudimg-amd64-initrd-generic
    digest: sha256:2842bb61052f77c1f1301394c5db215e3d89696b122accd42b7a7b30ae0c64d4
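
When the template is generated by a script rather than edited by hand, the cmdline change above can be sketched as a small helper. This is only an illustration: `append_kernel_option` is a hypothetical name, not part of Lima, and it assumes the cmdline contains no quoted arguments with embedded spaces.

```python
def append_kernel_option(cmdline: str, option: str = "nopku") -> str:
    """Return cmdline with option appended, unless it is already present.

    Hypothetical helper for illustration. Whitespace runs are collapsed
    to single spaces because the string is split and re-joined.
    """
    tokens = cmdline.split()
    if option not in tokens:
        tokens.append(option)
    return " ".join(tokens)


if __name__ == "__main__":
    # The base cmdline is the one from the workaround above, minus nopku.
    base = "root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyAMA0"
    print(append_kernel_option(base))
```

Calling the helper twice leaves the cmdline unchanged, so it is safe to apply to a template that already carries the workaround.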

@jandubois

Maybe related: containers/podman#25121

AkihiroSuda commented Mar 11, 2025

Got a UD2 (0F 0B) exception by directly specifying vmlinuz and initrd in Apple's LinuxVirtualMachine example:

WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/xstate.c:1009 get_xsave_addr_user+0x4d/0x80
[...]
? exc_invalid_op+0x18/0x80
[...]

$ LinuxVirtualMachine plucky-server-cloudimg-amd64-vmlinuz-generic plucky-server-cloudimg-amd64-initrd-generic 
[    0.593036] Linux agpgart interface v0.103
[    0.596341] loop: module loaded
[    0.596739] ACPI: bus type drm_connector registered
[    0.597164] tun: Universal TUN/TAP device driver, 1.6
[    0.597354] PPP generic driver version 2.4.2
[    0.597758] i8042: PNP: No PS/2 controller found.
[    0.597969] mousedev: PS/2 mouse device common for all mice
[    0.598153] rtc_cmos 00:01: RTC can wake from S4
[    0.599346] rtc_cmos 00:01: registered as rtc0
[    0.599652] rtc_cmos 00:01: setting system clock to 2025-03-11T07:28:46 UTC (1741678126)
[    0.599832] rtc_cmos 00:01: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[    0.599982] i2c_dev: i2c /dev entries driver
[    0.600079] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    0.600199] device-mapper: uevent: version 1.0.3
[    0.600312] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: [email protected]
[    0.600418] platform eisa.0: Probing EISA bus 0
[    0.600599] platform eisa.0: EISA: Detected 0 cards
[    0.600681] intel_pstate: CPU model not supported
[    0.600799] drop_monitor: Initializing network drop monitor service
[    0.601029] NET: Registered PF_INET6 protocol family
[    0.793259] Freeing initrd memory: 33876K
[    0.804204] Segment Routing with IPv6
[    0.804448] In-situ OAM (IOAM) with IPv6
[    0.804714] NET: Registered PF_PACKET protocol family
[    0.805259] Key type dns_resolver registered
[    0.806629] IPI shorthand broadcast: enabled
[    0.808020] sched_clock: Marking stable (813718500, -5989395)->(934071025, -126341920)
[    0.808645] registered taskstats version 1
[    0.809224] Loading compiled-in X.509 certificates
[    0.810408] Loaded X.509 cert 'Build time autogenerated kernel key: 9d1a7c7ef258b10187d72d1f2b0105f3bd19cdfa'
[    0.811090] Loaded X.509 cert 'Canonical Ltd. Live Patch Signing: 14df34d1a87cf37625abec039ef2bf521249b969'
[    0.811722] Loaded X.509 cert 'Canonical Ltd. Kernel Module Signing: 88f752e560a1e0737e31163a466ad7b70a850c19'
[    0.811954] blacklist: Loading compiled-in revocation X.509 certificates
[    0.812063] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing: 61482aa2830d0ab2ad5af10b7250da9033ddcef0'
[    0.812266] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2017): 242ade75ac4a15e50d50c84b0d45ff3eae707a03'
[    0.812592] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (ESM 2018): 365188c1d374d6b07c3c8f240f8ef722433d6a8b'
[    0.812886] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2019): c0746fd6c5da3ae827864651ad66ae47fe24b3e8'
[    0.813196] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2021 v1): a8d54bbb3825cfb94fa13c9f8a594a195c107b8d'
[    0.813297] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2021 v2): 4cf046892d6fd3c9a5b03f98d845f90851dc6a8c'
[    0.813485] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (2021 v3): 100437bb6de6e469b581e61cd66bce3ef4ed53af'
[    0.813772] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing (Ubuntu Core 2019): c1d57b8f6b743f23ee41f4f7ee292f06eecadfb9'
[    0.819505] Demotion targets for Node 0: null
[    0.820018] Key type .fscrypt registered
[    0.820114] Key type fscrypt-provisioning registered
[    0.829247] cryptd: max_cpu_qlen set to 1000
[    0.835219] AES CTR mode by8 optimization enabled
[    0.855439] Key type encrypted registered
[    0.855774] AppArmor: AppArmor sha256 policy hashing enabled
[    0.855902] ima: No TPM chip found, activating TPM-bypass!
[    0.855956] Loading compiled-in module X.509 certificates
[    0.856505] Loaded X.509 cert 'Build time autogenerated kernel key: 9d1a7c7ef258b10187d72d1f2b0105f3bd19cdfa'
[    0.856637] ima: Allocated hash algorithm: sha256
[    0.856824] ima: No architecture policies found
[    0.856883] evm: Initialising EVM extended attributes:
[    0.856977] evm: security.selinux
[    0.857047] evm: security.SMACK64
[    0.857090] evm: security.SMACK64EXEC
[    0.857161] evm: security.SMACK64TRANSMUTE
[    0.857205] evm: security.SMACK64MMAP
[    0.857291] evm: security.apparmor
[    0.857344] evm: security.ima
[    0.857379] evm: security.capability
[    0.857514] evm: HMAC attrs: 0x1
[    0.858038] PM:   Magic number: 1:474:470
[    0.858387] BIOS EDD facility v0.16 2004-Jun-25, 16 devices found
[    0.861648] RAS: Correctable Errors collector initialized.
[    0.876182] clk: Disabling unused clocks
[    0.876378] PM: genpd: Disabling unused power domains
[    0.883290] Freeing unused decrypted memory: 2028K
[    0.884321] Freeing unused kernel image (initmem) memory: 5044K
[    0.884663] Write protecting the kernel read-only data: 38912k
[    0.885596] Freeing unused kernel image (rodata/data gap) memory: 1024K
[    0.930404] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    0.930592] x86/mm: Checking user space page tables
[    0.973558] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    0.973802] Run /init as init process
[    0.975586] ------------[ cut here ]------------
[    0.975797] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/xstate.c:1009 get_xsave_addr_user+0x4d/0x80
[    0.976024] Modules linked in: aesni_intel crypto_simd cryptd
[    0.976232] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.12.0-16-generic #16-Ubuntu
[    0.976471] RIP: 0010:get_xsave_addr_user+0x4d/0x80
[    0.976652] Code: 48 23 15 5e e9 d6 01 74 21 48 63 c9 48 83 f9 13 73 2a 8b 14 8d c0 09 be b1 c9 48 01 d0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b c9 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 48 89 ce 48
[    0.976863] RSP: 0018:ffffbeb58000f968 EFLAGS: 00010246
[    0.977037] RAX: 00007ffe13b5a700 RBX: 0000000000000000 RCX: 0000000000000009
[    0.977203] RDX: 0000000000000000 RSI: 0000000000000009 RDI: 00007ffe13b5a700
[    0.977444] RBP: ffffbeb58000f978 R08: 0000000000000000 R09: 0000000000000000
[    0.977574] R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffe13b5a700
[    0.977715] R13: ffff9b12813a5640 R14: ffff9b12813a3000 R15: 00000000000000e7
[    0.977809] FS:  00007f8943fb9740(0000) GS:ffff9b12fdc00000(0000) knlGS:0000000000000000
[    0.977906] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.978052] CR2: 00007ffe13b5a900 CR3: 0000000001180002 CR4: 0000000000770ef0
[    0.978172] PKRU: 55555554
[    0.978220] Call Trace:
[    0.978264]  <TASK>
[    0.978357]  ? show_trace_log_lvl+0x1be/0x310
[    0.978427]  ? show_trace_log_lvl+0x1be/0x310
[    0.978494]  ? copy_fpstate_to_sigframe+0x209/0x2d0
[    0.978569]  ? show_regs.part.0+0x22/0x30
[    0.978632]  ? show_regs.cold+0x8/0x10
[    0.978690]  ? get_xsave_addr_user+0x4d/0x80
[    0.978753]  ? __warn.cold+0xac/0x10c
[    0.978816]  ? get_xsave_addr_user+0x4d/0x80
[    0.978888]  ? report_bug+0x114/0x160
[    0.978959]  ? handle_bug+0x6e/0xb0
[    0.979036]  ? exc_invalid_op+0x18/0x80
[    0.979107]  ? asm_exc_invalid_op+0x1b/0x20
[    0.979179]  ? get_xsave_addr_user+0x4d/0x80
[    0.979254]  copy_fpstate_to_sigframe+0x209/0x2d0
[    0.979341]  get_sigframe+0xf8/0x2a0
[    0.979406]  x64_setup_rt_frame+0x6b/0x330
[    0.979464]  handle_signal+0x10c/0x170
[    0.979555]  arch_do_signal_or_restart+0xb6/0x110
[    0.979616]  syscall_exit_to_user_mode+0x146/0x1d0
[    0.979668]  do_syscall_64+0x8a/0x170
[    0.979729]  ? __alloc_pages_noprof+0x163/0x340
[    0.979810]  ? native_flush_tlb_one_user+0x8f/0xa0
[    0.979885]  ? __mod_memcg_lruvec_state+0xec/0x210
[    0.979964]  ? __lruvec_stat_mod_folio+0x8b/0xf0
[    0.980026]  ? flush_tlb_mm_range+0x16b/0x1d0
[    0.980088]  ? set_ptes.isra.0+0x48/0xb0
[    0.980150]  ? wp_page_copy+0x435/0x670
[    0.980219]  ? do_wp_page+0xd1/0x580
[    0.980280]  ? handle_pte_fault+0x1ca/0x1d0
[    0.980340]  ? __handle_mm_fault+0x3d2/0x7a0
[    0.980409]  ? __count_memcg_events+0x85/0x160
[    0.980471]  ? count_memcg_events.constprop.0+0x2a/0x50
[    0.980531]  ? handle_mm_fault+0x1bb/0x2d0
[    0.980588]  ? do_user_addr_fault+0x5e9/0x7e0
[    0.980665]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[    0.980758]  ? irqentry_exit_to_user_mode+0x2d/0x1d0
[    0.980845]  ? irqentry_exit+0x43/0x50
[    0.980913]  ? exc_page_fault+0x96/0x1c0
[    0.980971]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    0.981071] RIP: 0033:0x7f894405b046
[    0.981146] Code: 00 00 48 8b 15 b3 1d 17 00 64 89 02 48 c7 c2 ff ff ff ff 48 8b 5d f8 c9 48 89 d0 c3 0f 1f 84 00 00 00 00 00 48 8b 45 10 0f 05 <48> 63 d0 3d 00 f0 ff ff 77 10 48 8b 5d f8 48 89 d0 c9 c3 0f 1f 80
[    0.981251] RSP: 002b:00007ffe13b5b240 EFLAGS: 00000202 ORIG_RAX: 000000000000003d
[    0.981333] RAX: 0000000000000054 RBX: 00007f8943fb9740 RCX: 00007f894405b046
[    0.981411] RDX: 0000000000000000 RSI: 00007ffe13b5b29c RDI: ffffffffffffffff
[    0.981491] RBP: 00007ffe13b5b250 R08: 0000000000000000 R09: 0000000000000000
[    0.981581] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe13b5b2a0
[    0.981655] R13: 000061fbbeded2a0 R14: 000061fbbedee600 R15: 0000000000000001
[    0.981719]  </TASK>
[    0.981771] ---[ end trace 0000000000000000 ]---
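
For readers who want to interpret the register dump, the PKRU and CR4 values in the trace above can be decoded with a short sketch. It assumes the architectural layout from the Intel SDM (two bits per protection key in PKRU: access-disable at the even bit and write-disable at the odd bit; CR4.PKE at bit 22). The function names are hypothetical, and the decoding by itself does not explain the warning.

```python
CR4_PKE = 1 << 22  # CR4 bit 22: protection keys enabled by the OS


def decode_pkru(pkru: int) -> list[tuple[bool, bool]]:
    """Return (access_disabled, write_disabled) for protection keys 0..15."""
    rights = []
    for key in range(16):
        access_disabled = bool((pkru >> (2 * key)) & 1)
        write_disabled = bool((pkru >> (2 * key + 1)) & 1)
        rights.append((access_disabled, write_disabled))
    return rights


def cr4_pke_set(cr4: int) -> bool:
    """Return True if the PKE bit is set in a CR4 value."""
    return bool(cr4 & CR4_PKE)


if __name__ == "__main__":
    # Values copied from the trace above: PKRU: 55555554, CR4: 0000000000770ef0
    for key, (ad, wd) in enumerate(decode_pkru(0x55555554)):
        print(f"pkey {key:2d}: access_disabled={ad} write_disabled={wd}")
    print("CR4.PKE set:", cr4_pke_set(0x770EF0))
```

Under these assumptions, the dumped PKRU leaves key 0 unrestricted and marks keys 1 to 15 access-disabled, and CR4 has PKE set, which shows that the guest kernel had enabled protection keys when the warning fired.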

With Ubuntu 24.10's pair of vmlinuz and initramfs, the VM still reaches a busybox shell.

Among the kernel builds at https://kernel.ubuntu.com/mainline/ , v6.12-rc1 appears to be the first build with this regression; v6.11.11 boots.

Still not sure why Alpine's kernel v6.12 works. They don't seem to have a relevant patch in their tree. Probably their kernel config avoids the relevant code path.

AkihiroSuda commented Mar 11, 2025

This is a regression introduced by torvalds/linux@70044df250d0 ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE").

Not sure whether this commit should be reverted or the problem should be fixed on Apple's side.

Alpine is unaffected because Alpine does not set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y.
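
To tell whether a given guest kernel falls into the affected group, its build config can be inspected. The sketch below is a minimal illustration: `kernel_option_enabled` is a hypothetical helper, and the `/boot/config-<release>` path is an assumption that holds on Ubuntu-style images but not on every distribution.

```python
import platform
import re

PKEYS_OPTION = "CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS"


def kernel_option_enabled(config_text: str, option: str = PKEYS_OPTION) -> bool:
    """Return True if option is set to 'y' in the given kernel config text."""
    pattern = re.compile(rf"^{re.escape(option)}=y\s*$", re.MULTILINE)
    return pattern.search(config_text) is not None


if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as f:
            print(PKEYS_OPTION, "enabled:", kernel_option_enabled(f.read()))
    except OSError as err:
        print(f"could not read {path}: {err}")
```

A line such as `# CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is not set` is treated as disabled, which is the Alpine case described above.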


Reported to Apple and to the Linux regressions list.

AkihiroSuda changed the title from "vz on Intel: kernel 6.12 doesn't boot (except Alpine?)" to "vz on Intel: kernel 6.12 doesn't boot (when CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is set)" on Mar 11, 2025
@AkihiroSuda

Workaround: append nopku to the kernel cmdline

images:
- location: "https://cloud-images.ubuntu.com/plucky/20250310/plucky-server-cloudimg-amd64.img"
  arch: "x86_64"
  kernel:
    location: https://cloud-images.ubuntu.com/plucky/20250310/unpacked/plucky-server-cloudimg-amd64-vmlinuz-generic
    digest: sha256:547c4316eadc8e46b043b5658fd4c08d62b3522c07a7eb94692e1b7d8827bf52
    cmdline: root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyAMA0 nopku
  initrd:
    location: https://cloud-images.ubuntu.com/plucky/20250310/unpacked/plucky-server-cloudimg-amd64-initrd-generic
    digest: sha256:2842bb61052f77c1f1301394c5db215e3d89696b122accd42b7a7b30ae0c64d4

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Mar 12, 2025
OSPKE seems broken on Apple Virtualization.

  WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/xstate.c:1003 get_xsave_addr_user+0x28/0x40
  (...)
  Call Trace:
   <TASK>
   ? get_xsave_addr_user+0x28/0x40
   ? __warn.cold+0x8e/0xea
   ? get_xsave_addr_user+0x28/0x40
   ? report_bug+0xff/0x140
   ? handle_bug+0x3b/0x70
   ? exc_invalid_op+0x17/0x70
   ? asm_exc_invalid_op+0x1a/0x20
   ? get_xsave_addr_user+0x28/0x40
   copy_fpstate_to_sigframe+0x1be/0x380
   ? __put_user_8+0x11/0x20
   get_sigframe+0xf1/0x280
   x64_setup_rt_frame+0x67/0x2c0
   arch_do_signal_or_restart+0x1b3/0x240
   syscall_exit_to_user_mode+0xb0/0x130
   do_syscall_64+0xab/0x1a0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

Tested on macOS 13.5.1 running on MacBook Pro 2020 with
Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz.

Fixes: 70044df ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE")
Link: https://lore.kernel.org/regressions/CAG8fp8QvH71Wi_y7b7tgFp7knK38rfrF7rRHh-gFKqeS0gxY6Q@mail.gmail.com/T/#u
Link: lima-vm/lima#3334
Signed-off-by: Akihiro Suda <[email protected]>

@trodemaster

I have confirmed that the workaround is functional! TIL that you can specify the kernel command line this way.
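
One way to double-check from inside the guest that the workaround took effect is to look for `nopku` in `/proc/cmdline` and for the `ospke` flag in `/proc/cpuinfo`. This is a sketch with hypothetical helper names. The expectation that `ospke` is absent once `nopku` is active is an assumption about kernel behavior and was not verified in this thread.

```python
def cmdline_has_option(cmdline: str, option: str = "nopku") -> bool:
    """Return True if option appears as a whole token in a kernel cmdline."""
    return option in cmdline.split()


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("nopku on cmdline:", cmdline_has_option(f.read()))
        with open("/proc/cpuinfo") as f:
            print("ospke flag present:", "ospke" in cpu_flags(f.read()))
    except OSError as err:
        print(f"could not read procfs: {err}")
```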
