path: root/arch/x86
Age  Commit message  Author
2024-04-10  x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word  (Sean Christopherson)

commit 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc upstream.

Add CPUID_LNX_5 to track cpufeatures' word 21, and add the appropriate compile-time assert in KVM to prevent direct lookups on the features in CPUID_LNX_5. KVM uses X86_FEATURE_* flags to manage guest CPUID, and so must translate features that are scattered by Linux from the Linux-defined bit to the hardware-defined bit, i.e. should never try to directly access scattered features in guest CPUID.

Opportunistically add NR_CPUID_WORDS to enum cpuid_leafs, along with a compile-time assert in KVM's CPUID infrastructure to ensure that future additions update cpuid_leafs along with NCAPINTS.

No functional change intended.

Fixes: 7f274e609f3d ("x86/cpufeatures: Add new word for scattered features")
Cc: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
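The compile-time guard this entry describes can be sketched in plain C11; the enum contents and word count below are illustrative stand-ins, not the kernel's actual cpufeatures word list:

    /* Illustrative only: a tiny "cpuid_leafs" standing in for the real enum. */
    enum cpuid_leafs_sketch {
        CPUID_1_EDX,
        CPUID_LNX_5,
        NR_CPUID_WORDS,    /* must stay in sync with NCAPINTS */
    };

    #define NCAPINTS 2     /* stand-in for the cpufeatures word count */

    /* Fails to compile as soon as one side is updated without the other. */
    _Static_assert(NR_CPUID_WORDS == NCAPINTS,
                   "cpuid_leafs must be updated along with NCAPINTS");

    int main(void) { return 0; }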
2024-04-10  x86/cpufeatures: Add new word for scattered features  (Sandipan Das)

commit 7f274e609f3d5f45c22b1dd59053f6764458b492 upstream.

Add a new word for scattered features because all free bits among the existing Linux-defined auxiliary flags have been exhausted.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/8380d2a0da469a1f0ad75b8954a79fb689599ff6.1711091584.git.sandipan.das@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled  (Kim Phillips)

commit fd470a8beed88440b160d690344fbae05a0b9b1b upstream.

Unlike Intel's Enhanced IBRS feature, AMD's Automatic IBRS does not provide protection to processes running at CPL3/user mode, see section "Extended Feature Enable Register (EFER)" in the APM v2 at https://bugzilla.kernel.org/attachment.cgi?id=304652

Explicitly enable STIBP to protect against cross-thread CPL3 branch target injections on systems with Automatic IBRS enabled.

Also update the relevant documentation.

Fixes: e7862eda309e ("x86/cpu: Support AMD Automatic IBRS")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720194727.67022-1-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/static_call: Add support for Jcc tail-calls  (Peter Zijlstra)

commit 923510c88d2b7d947c4217835fd9ca6bd65cc56c upstream.

Clang likes to create conditional tail calls like:

  0000000000000350 <amd_pmu_add_event>:
  350:  0f 1f 44 00 00            nopl 0x0(%rax,%rax,1)
                  351: R_X86_64_NONE      __fentry__-0x4
  355:  48 83 bf 20 01 00 00 00   cmpq $0x0,0x120(%rdi)
  35d:  0f 85 00 00 00 00         jne 363 <amd_pmu_add_event+0x13>
                  35f: R_X86_64_PLT32     __SCT__amd_pmu_branch_add-0x4
  363:  e9 00 00 00 00            jmp 368 <amd_pmu_add_event+0x18>
                  364: R_X86_64_PLT32     __x86_return_thunk-0x4

Where 0x35d is a static call site that's turned into a conditional tail-call using the Jcc class of instructions.

Teach the in-line static call text patching about this.

Notably, since there is no conditional-ret, in that case patch the Jcc to point at an empty stub function that does the ret -- or the return thunk when needed.

Reported-by: "Erhard F." <erhard_f@mailbox.org>
Change-Id: I99c8fc3f721e5d1c74f06710b38d4bac5230303a
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/Y9Kdg9QjHkr9G5b5@hirez.programming.kicks-ass.net
[cascardo: __static_call_validate didn't have the bool tramp argument]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions  (Peter Zijlstra)

commit ac0ee0a9560c97fa5fe1409e450c2425d4ebd17a upstream.

In order to re-write Jcc.d32 instructions text_poke_bp() needs to be taught about them.

The biggest hurdle is that the whole machinery is currently made for 5 byte instructions and extending this would grow struct text_poke_loc which is currently a nice 16 bytes and used in an array.

However, since text_poke_loc contains a full copy of the (s32) displacement, it is possible to map the Jcc.d32 2 byte opcodes to Jcc.d8 1 byte opcode for the int3 emulation.

This then leaves the replacement bytes; fudge that by only storing the last 5 bytes and adding the rule that 'length == 6' instruction will be prefixed with a 0x0f byte.

Change-Id: Ie3f72c6b92f865d287c8940e5a87e59d41cfaa27
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org
[cascardo: there is no emit_call_track_retpoline]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
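The opcode mapping the changelog describes follows directly from the x86 encoding (Jcc.d32 is 0x0f 0x80+cc, Jcc.d8 is 0x70+cc); a stand-alone sketch of that relationship, not the kernel's actual patching code:

    #include <assert.h>
    #include <stdint.h>

    /* Map the 2-byte Jcc.d32 opcode pair (0x0f, 0x80+cc) to the equivalent
     * 1-byte Jcc.d8 opcode (0x70+cc); the s32 displacement is kept as-is. */
    static uint8_t jcc_d32_to_d8(const uint8_t insn[2])
    {
        assert(insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80);
        return 0x70 | (insn[1] & 0x0f);
    }

    int main(void)
    {
        const uint8_t jne_d32[2] = { 0x0f, 0x85 };      /* jne rel32 */
        return jcc_d32_to_d8(jne_d32) == 0x75 ? 0 : 1;  /* expect jne rel8 */
    }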
2024-04-10  x86/alternatives: Introduce int3_emulate_jcc()  (Peter Zijlstra)

commit db7adcfd1cec4e95155e37bc066fddab302c6340 upstream.

Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be used by more code -- specifically static_call() will need this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.057678245@infradead.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
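How such an emulation decides whether the branch is taken follows from the architectural Jcc condition codes; a user-space illustration of that decision, not the kernel function itself:

    #include <stdbool.h>
    #include <stdint.h>

    #define CF (1u << 0)
    #define PF (1u << 2)
    #define ZF (1u << 6)
    #define SF (1u << 7)
    #define OF (1u << 11)

    /* Evaluate condition code 'cc' (the low nibble of a Jcc opcode) against
     * a saved EFLAGS value; odd codes are the inverted forms. */
    static bool jcc_taken(uint32_t flags, uint8_t cc)
    {
        bool match;

        switch (cc >> 1) {
        case 0: match = flags & OF; break;              /* jo  / jno */
        case 1: match = flags & CF; break;              /* jb  / jnb */
        case 2: match = flags & ZF; break;              /* jz  / jnz */
        case 3: match = flags & (CF | ZF); break;       /* jbe / ja  */
        case 4: match = flags & SF; break;              /* js  / jns */
        case 5: match = flags & PF; break;              /* jp  / jnp */
        case 6: match = !!(flags & SF) != !!(flags & OF); break;  /* jl / jge */
        default: match = (flags & ZF) ||
                         (!!(flags & SF) != !!(flags & OF)); break; /* jle / jg */
        }
        return match ^ (cc & 1);
    }

    int main(void)
    {
        return jcc_taken(ZF, 0x4) ? 0 : 1; /* jz with ZF set: taken */
    }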
2024-04-10  x86/asm: Differentiate between code and function alignment  (Thomas Gleixner)

commit 8eb5d34e77c63fde8af21c691bcf6e3cd87f7829 upstream.

Create SYM_F_ALIGN to differentiate alignment requirements between SYM_CODE and SYM_FUNC.

This distinction is useful later when adding padding in front of functions; IOW this allows following the compiler's patchable-function-entry option.

[peterz: Changelog]

Change-Id: I4f9bc0507e5c3fdb3e0839806989efc305e0a758
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.824822743@infradead.org
[cascardo: adjust for missing commit c4691712b546 ("x86/linkage: Add ENDBR to SYM_FUNC_START*()")]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  arch: Introduce CONFIG_FUNCTION_ALIGNMENT  (Peter Zijlstra)

commit d49a0626216b95cd4bf696f6acf55f39a16ab0bb upstream.

Generic function-alignment infrastructure.

Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the FUNCTION_ALIGNMENT symbol is then set to the largest such selected size, 0 otherwise.

From this the -falign-functions compiler argument and __ALIGN macro are set.

This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future alignment requirements for x86_64 (later in this series) into a single place.

NOTE: also removes the 0x90 filler byte from the generic __ALIGN primitive, that value makes no sense outside of x86.

NOTE: .balign 0 reverts to a no-op.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I053b3c408d56988381feb8c8bdb5e27ea221755f
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.719248727@infradead.org
[cascardo: adjust context at arch/x86/Kconfig]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests  (Pawan Gupta)

commit 2a0180129d726a4b953232175857d442651b55a0 upstream.

Mitigation for RFDS requires RFDS_CLEAR capability which is enumerated by MSR_IA32_ARCH_CAPABILITIES bit 27. If the host has it set, export it to guests so that they can deploy the mitigation.

RFDS_NO indicates that the system is not vulnerable to RFDS, export it to guests so that they don't deploy the mitigation unnecessarily. When the host is not affected by X86_BUG_RFDS, but has RFDS_NO=0, synthesize RFDS_NO to the guest.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/rfds: Mitigate Register File Data Sampling (RFDS)  (Pawan Gupta)

commit 8076fcde016c9c0e0660543e67bff86cb48a7c9c upstream.

RFDS is a CPU vulnerability that may allow userspace to infer kernel stale data previously used in floating point registers, vector registers and integer registers. RFDS only affects certain Intel Atom processors.

Intel released a microcode update that uses the VERW instruction to clear the affected CPU buffers. Unlike MDS, none of the affected cores support SMT.

Add RFDS bug infrastructure and enable the VERW-based mitigation by default, which clears the affected buffers just before exiting to userspace. Also add sysfs reporting and a cmdline parameter "reg_file_data_sampling" to control the mitigation.

For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

[ pawan: - Resolved conflicts in sysfs reporting.
         - s/ATOM_GRACEMONT/ALDERLAKE_N/: ATOM_GRACEMONT is called ALDERLAKE_N in 6.6. ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set  (Pawan Gupta)

commit e95df4ec0c0c9791941f112db699fae794b9862a upstream.

Currently MMIO Stale Data mitigation for CPUs not affected by MDS/TAA is to only deploy VERW at VMentry by enabling mmio_stale_data_clear static branch. No mitigation is needed for kernel->user transitions. If such CPUs are also affected by RFDS, its mitigation may set X86_FEATURE_CLEAR_CPU_BUF to deploy VERW at kernel->user and VMentry. This could result in duplicate VERW at VMentry.

Fix this by disabling mmio_stale_data_clear static branch when X86_FEATURE_CLEAR_CPU_BUF is enabled.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  KVM/VMX: Move VERW closer to VMentry for MDS mitigation  (Pawan Gupta)

commit 43fb862de8f628c5db5e96831c915b9aebf62d33 upstream.

During VMentry VERW is executed to mitigate MDS. After VERW, any memory access like a register push onto the stack may put host data in MDS-affected CPU buffers. A guest can then use MDS to sample host data. Although the likelihood of secrets surviving in registers at the current VERW callsite is low, it can't be ruled out. Harden the MDS mitigation by moving the VERW mitigation late in the VMentry path.

Note that VERW for MMIO Stale Data mitigation is unchanged because of the complexity of per-guest conditional VERW, which is not easy to handle that late in asm with no GPRs available. If the CPU is also affected by MDS, VERW is unconditionally executed late in asm regardless of the guest having MMIO access.

[ pawan: conflict resolved in backport ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH  (Sean Christopherson)

commit 706a189dcf74d3b3f955e9384785e726ed6c7c80 upstream.

Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus VMLAUNCH. Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF, for MDS mitigations as late as possible without needing to duplicate VERW for both paths.

[ pawan: resolved merge conflict in __vmx_vcpu_run in backport. ]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-5-a6216d83edb7%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
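Why CF survives where ZF does not can be demonstrated in user space: BT copies the tested bit into EFLAGS.CF, and VERW architecturally writes only ZF. A minimal stand-alone demo (x86-64, GCC/Clang inline asm; not KVM's actual code):

    #include <stdio.h>

    int main(void)
    {
        unsigned long flags, word = 1UL << 3;

        /* BT copies bit 3 of 'word' into EFLAGS.CF; read EFLAGS back. */
        asm volatile("bt $3, %1\n\t"
                     "pushf\n\t"
                     "pop %0"
                     : "=r"(flags)
                     : "r"(word)
                     : "cc");

        printf("CF=%lu\n", flags & 1); /* expect CF=1 */
        return 0;
    }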
2024-04-10  x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key  (Pawan Gupta)

commit 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 upstream.

The VERW mitigation at exit-to-user is enabled via a static branch mds_user_clear. This static branch is never toggled after boot, and can be safely replaced with an ALTERNATIVE() which is convenient to use in asm.

Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user path. Also remove the now redundant VERW in exc_nmi() and arch_exit_to_user_mode().

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/entry_32: Add VERW just before userspace transition  (Pawan Gupta)

commit a0e2dab44d22b913b4c228c8b52b2a104434b0b3 upstream.

As done for entry_64, add support for executing VERW late in exit to user path for 32-bit mode.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-3-a6216d83edb7%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/entry_64: Add VERW just before userspace transition  (Pawan Gupta)

commit 3c7501722e6b31a6e56edd23cea5e77dbb9ffd1a upstream.

Mitigation for MDS is to use the VERW instruction to clear any secrets in CPU buffers. Any memory accesses after VERW execution can still remain in CPU buffers. It is safer to execute VERW late in the return-to-user path to minimize the window in which kernel data can end up in CPU buffers. There are not many kernel secrets to be had after SWITCH_TO_USER_CR3.

Add support for deploying VERW mitigation after user register state is restored. This helps minimize the chances of kernel data ending up in CPU buffers after executing VERW.

Note that the mitigation at the new location is not yet enabled.

Corner case not handled
=======================
Interrupts returning to kernel don't clear CPU buffers since the exit-to-user path is expected to do that anyways. But there could be a case when an NMI is generated in the kernel after the exit-to-user path has cleared the buffers. This case is not handled, and an NMI returning to the kernel doesn't clear CPU buffers because:

1. It is rare to get an NMI after VERW, but before returning to user.
2. For an unprivileged user, there is no known way to make that NMI less rare or target it.
3. It would take a large number of these precisely-timed NMIs to mount an actual attack. There's presumably not enough bandwidth.
4. The NMI in question occurs after a VERW, i.e. when user state is restored and most interesting data is already scrubbed. What's left is only the data that the NMI touches, and that may or may not be of any interest.

[ pawan: resolved conflict for hunk swapgs_restore_regs_and_return_to_usermode ]

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-2-a6216d83edb7%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/bugs: Add asm helpers for executing VERW  (Pawan Gupta)

commit baf8361e54550a48a7087b603313ad013cc13386 upstream.

MDS mitigation requires clearing the CPU buffers before returning to user. This needs to be done late in the exit-to-user path. The current location of VERW leaves a possibility of kernel data ending up in CPU buffers for memory accesses done after VERW, such as:

1. Kernel data accessed by an NMI between VERW and return-to-user can remain in CPU buffers since NMI returning to kernel does not execute VERW to clear CPU buffers.
2. Alyssa reported that after VERW is executed, CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system call. Memory accesses during stack scrubbing can move kernel stack contents into CPU buffers.
3. When caller-saved registers are restored after a return from a function executing VERW, the kernel stack accesses can remain in CPU buffers (since they occur after VERW).

To fix this, VERW needs to be moved very late in the exit-to-user path.

In preparation for moving VERW to entry/exit asm code, create macros that can be used in asm. Also make VERW patching depend on a new feature flag X86_FEATURE_CLEAR_CPU_BUF.

[pawan: - Runtime patch jmp instead of verw in macro CLEAR_CPU_BUFFERS due to lack of relative addressing support for relocations in kernels < v6.5.
        - Add UNWIND_HINT_EMPTY to avoid warning:
          arch/x86/entry/entry.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction]

Reported-by: Alyssa Milburn <alyssa.milburn@intel.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-1-a6216d83edb7%40linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
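VERW itself is unprivileged, which is what makes a call site this late, right before the return to user, possible at all. A minimal user-space illustration of the instruction (x86-64; on affected, microcode-updated CPUs this operation is what clears the buffers -- here it just demonstrably executes and only writes ZF):

    #include <stdio.h>

    int main(void)
    {
        unsigned short sel;

        asm volatile("mov %%ss, %0" : "=r"(sel));    /* a known-valid selector */
        asm volatile("verw %0" : : "m"(sel) : "cc"); /* clobbers only EFLAGS.ZF */

        puts("verw executed");
        return 0;
    }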
2024-04-10  x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix  (H. Peter Anvin (Intel))

commit f87bc8dc7a7c438c70f97b4e51c76a183313272e upstream.

Add a macro _ASM_RIP() to add a (%rip) suffix on 64 bits only. This is useful for immediate memory references where one doesn't want gcc to possibly use a register indirection as it may in the case of an "m" constraint.

[ pawan: resolved merge conflict for __ASM_REGPFX ]

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lkml.kernel.org/r/20210910195910.2542662-3-hpa@zytor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
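A hedged sketch of the idea -- the macro name and exact expansion below are illustrative, not the kernel header's: on 64-bit, paste a (%rip) suffix onto the symbol so the assembler emits a RIP-relative memory reference rather than letting gcc pick a register-indirect form:

    /* Illustrative macro modeled on the described behavior. */
    #ifdef __x86_64__
    # define MY_ASM_RIP(x)  #x "(%%rip)"
    #else
    # define MY_ASM_RIP(x)  #x
    #endif

    int my_flag = 1;

    int main(void)
    {
        int v;
        /* Forces a RIP-relative load of my_flag on 64-bit. */
        asm("movl " MY_ASM_RIP(my_flag) ", %0" : "=r"(v));
        return v == 1 ? 0 : 1;
    }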
2024-04-10  KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()  (Sean Christopherson)

commit 5ef1d8c1ddbf696e47b226e11888eaf8d9e8e807 upstream.

Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock to fix use-after-free issues where region and/or its array of pages could be freed by a different task, e.g. if userspace has __unregister_enc_region_locked() already queued up for the region.

Note, the "obvious" alternative of using local variables doesn't fully resolve the bug, as region->pages is also dynamically allocated. I.e. the region structure itself would be fine, but region->pages could be freed.

Flushing multiple pages under kvm->lock is unfortunate, but the entire flow is a rare slow path, and the manual flush is only needed on CPUs that lack coherency for encrypted memory.

Fixes: 19a23da53932 ("Fix unsynchronized access to sev members through svm_register_enc_region")
Reported-by: Gabe Kirkpatrick <gkirkpatrick@google.com>
Cc: Josh Eads <josheads@google.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20240217013430.2079561-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/pm: Work around false positive kmemleak report in msr_build_context()  (Anton Altaparmakov)

[ Upstream commit e3f269ed0accbb22aa8f25d2daffa23c3fccd407 ]

Since: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") kmemleak reports this issue:

  unreferenced object 0xf68241e0 (size 32):
    comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s)
    hex dump (first 32 bytes):
      00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00  ....)...........
      00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc  .B..............
    backtrace:
      [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260
      [<ea65e13b>] __kmalloc+0x54/0x160
      [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100
      [<46635aff>] pm_check_save_msr+0x63/0x80
      [<6b6bb938>] do_one_initcall+0x41/0x1f0
      [<3f3add60>] kernel_init_freeable+0x199/0x1e8
      [<3b538fde>] kernel_init+0x1a/0x110
      [<938ae2b2>] ret_from_fork+0x1c/0x28

Which is a false positive.

Reproducer:
- Run rsync of whole kernel tree (multiple times if needed).
- start a kmemleak scan
- Note this is just an example: a lot of our internal tests hit these.

The root cause is similar to the fix in b0b592cf0836 ("x86/pm: Fix false positive kmemleak report in msr_build_context()"), i.e. the alignment within the packed struct saved_context, which has everything unaligned as there is only "u16 gs;" at the start of the struct, where in the past there were four u16 there, thus aligning everything afterwards. The issue is that kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c), so when the struct members are not aligned it doesn't see them.

Testing: We run a lot of tests with our CI, and after applying this fix we do not see any kmemleak issues any more, whilst without it we see hundreds of the above report. From a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this, which is quite a lot. With this fix applied we get zero kmemleak related failures.

Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
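The root cause lends itself to a tiny illustration; the struct below is invented for the example, but shows why a scanner that only visits word-aligned addresses never sees the pointer:

    #include <stddef.h>
    #include <stdio.h>

    /* With the packed attribute, 'msrs' sits at offset 2 -- a location a
     * word-aligned scan (like kmemleak's) never reads as a pointer. */
    struct __attribute__((packed)) ctx_sketch {
        unsigned short gs;
        void *msrs;
    };

    int main(void)
    {
        printf("offsetof(msrs) = %zu\n", offsetof(struct ctx_sketch, msrs));
        return 0;
    }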
2024-04-10  x86/CPU/AMD: Update the Zenbleed microcode revisions  (Borislav Petkov (AMD))

[ Upstream commit 5c84b051bd4e777cf37aaff983277e58c99618d5 ]

Update them to the correct revision numbers.

Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-10  KVM: x86: Use a switch statement and macros in __feature_translate()  (Jim Mattson)

commit 80c883db87d9ffe2d685e91ba07a087b1c246c78 upstream.

Use a switch statement with macro-generated case statements to handle translating feature flags in order to reduce the probability of runtime errors due to copy+paste goofs, to make compile-time errors easier to debug, and to make the code more readable.

E.g. the compiler won't directly generate an error for duplicate if statements

  if (x86_feature == X86_FEATURE_SGX1)
          return KVM_X86_FEATURE_SGX1;
  else if (x86_feature == X86_FEATURE_SGX2)
          return KVM_X86_FEATURE_SGX1;

and so instead reverse_cpuid_check() will fail due to the untranslated entry pointing at a Linux-defined leaf, which provides practically no hint as to what is broken

  arch/x86/kvm/reverse_cpuid.h:108:2: error: call to __compiletime_assert_450 declared with
    'error' attribute: BUILD_BUG_ON failed: x86_leaf == CPUID_LNX_4
          BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
          ^

whereas duplicate case statements very explicitly point at the offending code:

  arch/x86/kvm/reverse_cpuid.h:125:2: error: duplicate case value '361'
          KVM_X86_TRANSLATE_FEATURE(SGX2);
          ^
  arch/x86/kvm/reverse_cpuid.h:124:2: error: duplicate case value '360'
          KVM_X86_TRANSLATE_FEATURE(SGX1);
          ^

And without macros, the opposite type of copy+paste goof doesn't generate any error at compile-time, e.g. this yields no complaints:

  case X86_FEATURE_SGX1:
          return KVM_X86_FEATURE_SGX1;
  case X86_FEATURE_SGX2:
          return KVM_X86_FEATURE_SGX1;

Note, __feature_translate() is forcibly inlined and the feature is known at compile-time, so the code generation between an if-elif sequence and a switch statement should be identical.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20231024001636.890236-2-jmattson@google.com
[sean: use a macro, rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace  (Jim Mattson)

commit eefe5e6682099445f77f2d97d4c525f9ac9d9b07 upstream.

The low five bits {INTEL_PSFD, IPRED_CTRL, RRSBA_CTRL, DDPD_U, BHI_CTRL} advertise the availability of specific bits in IA32_SPEC_CTRL. Since KVM dynamically determines the legal IA32_SPEC_CTRL bits for the underlying hardware, the hard work has already been done. Just let userspace know that a guest can use these IA32_SPEC_CTRL bits.

The sixth bit (MCDT_NO) states that the processor does not exhibit MXCSR Configuration Dependent Timing (MCDT) behavior. This is an inherent property of the physical processor that is inherited by the virtual CPU. Pass that information on to userspace.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20231024001636.890236-1-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs  (Sean Christopherson)

commit 047c7229906152fb85c23dc18fd25a00cd7cb4de upstream.

Rename kvm_cpu_cap_init_scattered() to kvm_cpu_cap_init_kvm_defined() in anticipation of adding KVM-only CPUID leafs that aren't recognized by the kernel and thus not scattered, i.e. for leafs that are 100% KVM-defined.

Adjust/add comments to kvm_only_cpuid_leafs and KVM_X86_FEATURE to document how to create new kvm_only_cpuid_leafs entries for scattered features as well as features that are entirely unknown to the kernel.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221125125845.1182922-3-jiaxi.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/bugs: Use sysfs_emit()  (Borislav Petkov)

commit 1d30800c0c0ae1d086ffad2bdf0ba4403370f132 upstream.

Those mitigations are very talkative; use the printing helper which pays attention to the buffer size.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220809153419.10182-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-10  x86/cpu: Support AMD Automatic IBRS  (Kim Phillips)

commit e7862eda309ecfccc36bb5558d937ed3ace07f3f upstream.

The AMD Zen4 core supports a new feature called Automatic IBRS.

It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS, h/w manages its IBRS mitigation resources automatically across CPL transitions.

The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by setting MSR C000_0080 (EFER) bit 21.

Enable Automatic IBRS by default if the CPU feature is present. It typically provides greater performance over the incumbent generic retpolines mitigation.

Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum. AMD Automatic IBRS and Intel Enhanced IBRS have similar enablement. Add NO_EIBRS_PBRSB to cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS.

The kernel command line option spectre_v2=eibrs is used to select AMD Automatic IBRS, if available.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-28  Merge tag 'v5.15.153' into v5.15/standard/base  (Bruce Ashfield)
Linux 5.15.153
2024-03-26  x86, relocs: Ignore relocations in .notes section  (Kees Cook)

[ Upstream commit aaa8736370db1a78f0e8434344a484f9fd20be3b ]

When building with CONFIG_XEN_PV=y, .text symbols are emitted into the .notes section so that Xen can find the "startup_xen" entry point. This information is used prior to booting the kernel, so relocations are not useful. In fact, performing relocations against the .notes section means that the KASLR base is exposed since /sys/kernel/notes is world-readable.

To avoid leaking the KASLR base without breaking unprivileged tools that are expecting to read /sys/kernel/notes, skip performing relocations in the .notes section. The values readable in .notes are then identical to those found in System.map.

Reported-by: Guixiong Wei <guixiongwei@gmail.com>
Closes: https://lore.kernel.org/all/20240218073501.54555-1-guixiongwei@gmail.com/
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Fixes: da1a679cde9b ("Add /sys/kernel/notes")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26  x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()  (Hou Tao)

[ Upstream commit 32019c659ecfe1d92e3bf9fcdfbb11a7c70acd58 ]

When trying to use copy_from_kernel_nofault() to read the vsyscall page through a bpf program, the following oops was reported:

  BUG: unable to handle page fault for address: ffffffffff600000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
  ......
  Call Trace:
   <TASK>
   ? copy_from_kernel_nofault+0x6f/0x110
   bpf_probe_read_kernel+0x1d/0x50
   bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
   trace_call_bpf+0xc5/0x1c0
   perf_call_bpf_enter.isra.0+0x69/0xb0
   perf_syscall_enter+0x13e/0x200
   syscall_trace_enter+0x188/0x1c0
   do_syscall_64+0xb5/0xe0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   </TASK>
  ......
  ---[ end trace 0000000000000000 ]---

The oops is triggered when:

1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall page and invokes copy_from_kernel_nofault() which in turn calls __get_user_asm().

2) Because the vsyscall page address is not readable from kernel space, a page fault exception is triggered accordingly.

3) handle_page_fault() considers the vsyscall page address as a user space address instead of a kernel space address. This results in the fix-up setup by bpf not being applied and a page_fault_oops() is invoked due to SMAP.

Considering handle_page_fault() has already considered the vsyscall page address as a userspace address, fix the problem by disallowing vsyscall page read for copy_from_kernel_nofault().

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
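The shape of the fix reduces to a simple address-range check; the constants below are the long-standing fixed vsyscall mapping, but treat the helper itself as an illustrative sketch rather than the kernel's exact code:

    #include <stdbool.h>
    #include <stdint.h>

    #define VSYSCALL_ADDR 0xffffffffff600000UL
    #define PAGE_SIZE     4096UL

    /* Reject the vsyscall page before copy_from_kernel_nofault() touches it. */
    static bool nofault_read_allowed(uint64_t vaddr)
    {
        return !(vaddr >= VSYSCALL_ADDR && vaddr < VSYSCALL_ADDR + PAGE_SIZE);
    }

    int main(void)
    {
        return nofault_read_allowed(VSYSCALL_ADDR) ? 1 : 0; /* must be denied */
    }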
2024-03-26  x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h  (Hou Tao)

[ Upstream commit ee0e39a63b78849f8abbef268b13e4838569f646 ]

Move is_vsyscall_vaddr() into asm/vsyscall.h to make it available for copy_from_kernel_nofault_allowed() in arch/x86/mm/maccess.c.

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20240202103935.3154011-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26  x86/xen: Add some null pointer checking to smp.c  (Kunwu Chan)

[ Upstream commit 3693bb4465e6e32a204a5b86d3ec7e6b9f7e67c2 ]

kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Ensure the allocation was successful by checking the pointer validity.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401161119.iof6BQsf-lkp@intel.com/
Suggested-by: Markus Elfring <Markus.Elfring@web.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240119094948.275390-1-chentao@kylinos.cn
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
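A user-space analogue of the added check, using asprintf(), which fails the same way kasprintf() can (the format string here is invented for the example):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *name;

        /* Allocation can fail; bail out instead of dereferencing NULL. */
        if (asprintf(&name, "cpu%u-callfunc", 3) < 0)
            return 1;

        puts(name);
        free(name);
        return 0;
    }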
2024-03-12  Merge tag 'v5.15.151' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.151 stable release
2024-03-06  x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers  (Paolo Bonzini)

commit 6890cb1ace350b4386c8aee1343dc3b3ddd214da upstream.

MKTME repurposes the high bit of physical address to key id for encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask register are based on the reduced number of physical address bits.

detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts it from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it is called in setup_arch(), before MTRRs are setup.

This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup".

Without the patch, the values written to the MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46-bit wide, which matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo.

Reported-by: Zixi Chen <zixchen@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-03  Merge tag 'v5.15.150' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.150 stable release
2024-03-01  x86/alternative: Make custom return thunk unconditional  (Peter Zijlstra)

Upstream commit: 095b8303f3835c68ac4a8b6d754ca1c3b6230711

There is infrastructure to rewrite return thunks to point to any random thunk one desires, unwrap that from CALL_THUNKS, which up to now was the sole user of that.

[ bp: Make the thunks visible on 32-bit and add ifdeffery for the 32-bit builds. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01Revert "x86/alternative: Make custom return thunk unconditional"Borislav Petkov (AMD)
This reverts commit 08f7cfd44f77b2796582bc26164fdef44dd33b6c. Revert the backport of upstream commit: 095b8303f383 ("x86/alternative: Make custom return thunk unconditional") in order to backport the full version now that 770ae1b70952 ("x86/returnthunk: Allow different return thunks") has been backported. Revert it here so that the build breakage is kept at minimum. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01  x86/returnthunk: Allow different return thunks  (Peter Zijlstra)

Upstream commit: 770ae1b709528a6a173b5c7b183818ee9b45e376

In preparation for call depth tracking on Intel SKL CPUs, make it possible to patch in a SKL specific return thunk.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.680469665@infradead.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01  x86/ftrace: Use alternative RET encoding  (Peter Zijlstra)

Upstream commit: 1f001e9da6bbf482311e45e48f53c2bd2179e59c

Use the return thunk in ftrace trampolines, if needed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01  x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()  (Peter Zijlstra)

Upstream commit: ba27d1a80871eb8dbeddf34ec7d396c149cbb8d7

Less duplication is more better.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.697253958@infradead.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01  x86/text-patching: Make text_gen_insn() play nice with ANNOTATE_NOENDBR  (Peter Zijlstra)

Upstream commit: bbf92368b0b1fe472d489e62d3340d7897e9c697

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.638561109@infradead.org
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01Revert "x86/ftrace: Use alternative RET encoding"Borislav Petkov (AMD)
This reverts commit 3eb602ad6a94a76941f93173131a71ad36fa1324. Revert the backport of upstream commit 1f001e9da6bb ("x86/ftrace: Use alternative RET encoding") in favor of a proper backport after backporting the commit which adds __text_gen_insn(). Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01  x86/fpu: Stop relying on userspace for info to fault in xsave buffer  (Andrei Vagin)

commit d877550eaf2dc9090d782864c96939397a3c6835 upstream.

Before this change, the expected size of the user space buffer was taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed from user-space, so it is possible to construct a sigreturn frame where:

* fx_sw->xstate_size is smaller than the size required by valid bits in fx_sw->xfeatures.
* user-space unmaps parts of the sigframe fpu buffer so that not all of the buffer required by xrstor is accessible.

In this case, xrstor tries to restore and accesses the unmapped area, which results in a fault. But fault_in_readable succeeds because buf + fx_sw->xstate_size is within the still mapped area, so it goes back and tries xrstor again. It will spin in this loop forever.

Instead, fault in the maximum size which can be touched by XRSTOR (taken from fpstate->user_size).

[ dhansen: tweak subject / changelog ]

Fixes: fcb3635f5018 ("x86/fpu/signal: Handle #PF in the direct restore path")
Reported-by: Konstantin Bogomolov <bogomolov@google.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-26  Merge tag 'v5.15.149' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.149 stable release
2024-02-23  x86/mm/ident_map: Use gbpages only where full GB page should be mapped.  (Steve Wahl)

commit d794734c9bbfe22f86686dc2909c25f5ffe1a572 upstream.

When ident_pud_init() uses only gbpages to create identity maps, large ranges of addresses not actually requested can be included in the resulting table; a 4K request will map a full GB. On UV systems, this ends up including regions that will cause hardware to halt the system if accessed (these are marked "reserved" by BIOS). Even processor speculation into these regions is enough to trigger the system halt.

Only use gbpages when map creation requests include the full GB page of space. Fall back to using smaller 2M pages when only portions of a GB page are included in the request.

No attempt is made to coalesce mapping requests. If a request requires a map entry at the 2M (pmd) level, subsequent mapping requests within the same 1G region will also be at the pmd level, even if adjacent or overlapping such requests could have been combined to map a full gbpage. Existing usage starts with larger regions and then adds smaller regions, so this should not have any great consequence.

[ dhansen: fix up comment formatting, simplifty changelog ]

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
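A simplified sketch of the described policy (the real code walks page-table levels; this only captures the per-GiB decision, and the helper name is illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    #define PUD_SIZE (1ULL << 30)   /* 1 GiB */

    /* Use a gbpage only when the request covers a whole, aligned GiB;
     * otherwise the caller falls back to 2 MiB (pmd-level) mappings. */
    static bool can_use_gbpage(uint64_t addr, uint64_t end)
    {
        return (addr & (PUD_SIZE - 1)) == 0 && end - addr >= PUD_SIZE;
    }

    int main(void)
    {
        /* A 4 KiB request must not map a full GiB. */
        return can_use_gbpage(0x40000000ULL, 0x40001000ULL) ? 1 : 0;
    }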
2024-02-23  x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6  (Aleksander Mazur)

commit f6a1892585cd19e63c4ef2334e26cd536d5b678d upstream.

The kernel built with MCRUSOE is unbootable on Transmeta Crusoe. It shows the following error message:

  This kernel requires an i686 CPU, but only detected an i586 CPU.
  Unable to boot - please use a kernel appropriate for your CPU.

Remove MCRUSOE from the condition introduced in the commit in Fixes, effectively changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which matches the CPU family given by CPUID.

[ bp: Massage commit message. ]

Fixes: 25d76ac88821 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig")
Signed-off-by: Aleksander Mazur <deweloper@wp.pl>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240123134309.1117782-1-deweloper@wp.pl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23  arch: consolidate arch_irq_work_raise prototypes  (Arnd Bergmann)

[ Upstream commit 64bac5ea17d527872121adddfee869c7a0618f8f ]

The prototype was hidden in an #ifdef on x86, which causes a warning:

  kernel/irq_work.c:72:13: error: no previous prototype for 'arch_irq_work_raise' [-Werror=missing-prototypes]

Some architectures have a working prototype, while others don't. Fix this by providing it in only one place that is always visible.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23  x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel  (Zhiquan Li)

[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]

Memory errors don't happen very often, especially fatal ones. However, in large-scale scenarios such as data centers, that probability increases with the amount of machines present.

When a fatal machine check happens, mce_panic() is called based on the severity grading of that error. The page containing the error is not marked as poison.

However, when kexec is enabled, tools like makedumpfile understand when pages are marked as poison and do not touch them so as not to cause a fatal machine check exception again while dumping the previous kernel's memory.

Therefore, mark the page containing the error as poisoned so that the kexec'ed kernel can avoid accessing the page.

[ bp: Rewrite commit message and comment. ]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23  x86/boot: Ignore NMIs during very early boot  (Jun'ichi Nomura)

[ Upstream commit 78a509fba9c9b1fcb77f95b7c6be30da3d24823a ]

When there are two racing NMIs on x86, the first NMI invokes the NMI handler and the 2nd NMI is latched until IRET is executed.

If panic on NMI and panic kexec are enabled, the first NMI triggers panic and starts booting the next kernel via kexec. Note that the 2nd NMI is still latched. During the early boot of the next kernel, once an IRET is executed as a result of a page fault, the 2nd NMI is unlatched and invokes the NMI handler.

However, the NMI handler is not set up at the early stage of boot, which results in a boot failure.

Avoid such problems by setting up a NOP handler for early NMIs.

[ mingo: Refined the changelog. ]

Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Derek Barbosa <debarbos@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23  x86/entry/ia32: Ensure s32 is sign extended to s64  (Richard Palethorpe)

commit 56062d60f117dccfb5281869e0ab61e090baf864 upstream.

Presently ia32 registers stored in ptregs are unconditionally cast to unsigned int by the ia32 stub. They are then cast to long when passed to __se_sys*, but will not be sign extended.

This change takes the sign of the syscall argument into account in the ia32 stub. It still casts to unsigned int to avoid implementation-specific behavior, but then casts to int or unsigned int as necessary, so that the following cast to long sign extends the value.

This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently the system call io_pgetevents_time64() unexpectedly accepts -1 for the maximum number of events.

It doesn't appear other system calls with signed arguments are affected, because they all have compat variants defined and wired up.

Fixes: ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com
Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
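The bug class is easy to demonstrate in isolation: a register value of all-ones means -1 only if the cast goes through a signed 32-bit type first (shown here on a 64-bit build, where long is wider than int):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t reg = (uint32_t)-1;          /* as stored in pt_regs */

        long wrong = (long)(unsigned int)reg; /* 4294967295: a huge count */
        long right = (long)(int)reg;          /* -1, as the caller meant */

        printf("%ld %ld\n", wrong, right);
        return 0;
    }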
2024-02-06  Merge tag 'v5.15.148' into v5.15/standard/base  (Bruce Ashfield)
This is the 5.15.148 stable release