path: root/arch/x86
Age  Commit message  Author
2022-05-09  Merge tag 'v5.4.192' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.192 stable release # gpg: Signature made Mon 09 May 2022 03:03:50 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-09  Merge tag 'v5.4.191' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.191 stable release # gpg: Signature made Wed 27 Apr 2022 07:50:55 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-09  x86/cpu: Load microcode during restore_processor_state()  Borislav Petkov
commit f9e14dbbd454581061c736bf70bf5cbb15ac927c upstream. When resuming from system sleep state, restore_processor_state() restores the boot CPU MSRs. These MSRs could be emulated by microcode. If microcode is not loaded yet, writing to emulated MSRs leads to unchecked MSR access error: ... PM: Calling lapic_suspend+0x0/0x210 unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr) Call Trace: <TASK> ? restore_processor_state x86_acpi_suspend_lowlevel acpi_suspend_enter suspend_devices_and_enter pm_suspend.cold state_store kobj_attr_store sysfs_kf_write kernfs_fop_write_iter new_sync_write vfs_write ksys_write __x64_sys_write do_syscall_64 entry_SYSCALL_64_after_hwframe RIP: 0033:0x7fda13c260a7 To ensure microcode emulated MSRs are available for restoration, load the microcode on the boot CPU before restoring these MSRs. [ Pawan: write commit message and productize it. ] Fixes: e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume") Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Kyle D. Pelton <kyle.d.pelton@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841 Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
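The ordering the fix establishes, as an editorial sketch with stand-in helper names (not the exact kernel symbols):

```c
/* Sketch of restore_processor_state() after the fix: bring the boot
 * CPU's microcode back before any saved MSRs are written, so that
 * microcode-emulated MSRs exist again by the time they are restored.
 * Both helpers below are illustrative stand-ins. */
static void restore_processor_state_sketch(void)
{
	/* ... control registers, segments, local APIC state ... */
	load_boot_cpu_microcode();   /* first: (re)load microcode */
	restore_saved_msrs();        /* then: write the saved MSRs back */
}
```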
2022-05-09  x86: __memcpy_flushcache: fix wrong alignment if size > 2^32  Mikulas Patocka
[ Upstream commit a6823e4e360fe975bd3da4ab156df7c74c8b07f3 ] The first "if" condition in __memcpy_flushcache is supposed to align the "dest" variable to 8 bytes and copy data up to this alignment. However, this condition may misbehave if "size" is greater than 4GiB. The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both arguments to unsigned int and selects the smaller one. However, the cast truncates high bits in "size" and it results in misbehavior. For example: suppose that size == 0x100000001, dest == 0x200000002; then min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(unsigned, 0x1, 0x6) == 0x1; ... dest += 0x1; so we copy just one byte and "dest" remains unaligned. This patch fixes the bug by replacing unsigned with size_t. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
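To make the truncation concrete, here is a userspace re-creation using the commit's example values; min_t() and ALIGN() are re-implemented to mirror their kernel semantics, so this is an editorial sketch rather than the kernel code:

```c
#include <stdio.h>
#include <stdint.h>

#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))
#define ALIGN(x, a) (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

int main(void)
{
	uint64_t size = 0x100000001ULL; /* > 4 GiB */
	uint64_t dest = 0x200000002ULL; /* 6 bytes short of 8-byte alignment */

	/* Buggy: both arguments are cast to unsigned int, so size becomes 0x1. */
	printf("min_t(unsigned, ...) = %llu\n", (unsigned long long)
	       min_t(unsigned, size, ALIGN(dest, 8) - dest));

	/* Fixed: size_t keeps all 64 bits; the full 6-byte head is copied. */
	printf("min_t(size_t, ...)   = %llu\n", (unsigned long long)
	       min_t(size_t, size, ALIGN(dest, 8) - dest));
	return 0;
}
```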
2022-04-27  stat: fix inconsistency between struct stat and struct compat_stat  Mikulas Patocka
[ Upstream commit 932aba1e169090357a77af18850a10c256b50819 ] struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding field. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changed back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
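A simplified before/after sketch of the two layouts (surrounding members omitted; only the leading fields are shown):

```c
/* arch/x86/include/uapi/asm/stat.h -- what 32-bit userspace expects: */
struct stat_head {
	unsigned int st_dev;      /* 32-bit device number */
	/* ... */
};

/* arch/x86/include/asm/compat.h before the fix: */
struct compat_stat_head_old {
	unsigned short st_dev;    /* 16-bit: cannot encode major 259,  */
	unsigned short __pad1;    /* so stat() returns -EOVERFLOW      */
	/* ... */
};

/* After the fix, matching struct stat: */
struct compat_stat_head_new {
	unsigned int st_dev;
	/* ... */
};
```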
2022-04-20  Merge tag 'v5.4.189' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.189 stable release # gpg: Signature made Fri 15 Apr 2022 08:18:47 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-04-15  x86/speculation: Restore speculation related MSRs during S3 resume  Pawan Gupta
commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 upstream. After resuming from suspend-to-RAM, the MSRs that control the CPU's speculative execution behavior are not being restored on the boot CPU. These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPUs' MSRs are correctly restored at S3 resume by identify_secondary_cpu(). During S3 resume, restore these MSRs for the boot CPU when restoring its processor state. Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by: Neelima Krishnan <neelima.krishnan@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15  x86/pm: Save the MSR validity status at context setup  Pawan Gupta
commit 73924ec4d560257004d5b5116b22a3647661e364 upstream. The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if it's a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs there are, the higher the impact will be. Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend. Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15  xen: delay xen_hvm_init_time_ops() if kdump is booted on vcpu>=32  Dongli Zhang
[ Upstream commit eed05744322da07dd7e419432dcedf3c2e017179 ] The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below: setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time; This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address. However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic. This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32. This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled: "taskset -c 33 echo c > /proc/sysrq-trigger" The bugfix for PVM is not implemented due to the lack of testing environment. [boris: xen_hvm_init_time_ops() returns on errors instead of jumping to end] Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220302164032.14569-3-dongli.zhang@oracle.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
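The shape of the guard, as a sketch of the approach rather than the verbatim patch:

```c
/* Sketch: bail out of the early call while this vcpu's vcpu_info is
 * still unregistered (NULL for vcpu >= MAX_VIRT_CPUS during early
 * boot); the function is then called a second time from
 * xen_hvm_smp_prepare_boot_cpu() once registration has happened. */
static void xen_hvm_init_time_ops_sketch(void)
{
	if (__this_cpu_read(xen_vcpu) == NULL)
		return;  /* vcpu_info not ready yet; retry later */

	/* ... install xen_clocksource_read() and friends ... */
}
```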
2022-04-15  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs  Jim Mattson
[ Upstream commit 9b026073db2f1ad0e4d8b61c83316c8497981037 ] AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared. When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits. This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP. For example, root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 Performance counter stats for 'system wide': 0 r26 1.001070977 seconds time elapsed Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn <lotusf@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Like Xu <likexu@tencent.com> Reviewed-by: David Dunn <daviddunn@google.com> Message-Id: <20220226234131.2167175-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
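Reduced to its core, the behavioral change looks like this (names simplified from KVM's AMD vPMU MSR-write path; a sketch, not the literal diff):

```c
/* Before (sketch): any reserved bit set synthesized a #GP. */
if (data & pmu->reserved_bits)
	return 1;  /* inject #GP into the guest */

/* After (sketch): silently drop undocumented bits and never fault,
 * matching what EPYC hardware does for PerfEvtSeln writes. */
data &= ~pmu->reserved_bits;
reprogram_gp_counter(pmc, data);
```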
2022-04-15  KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated  Vitaly Kuznetsov
commit b1e34d325397a33d97d845e312d7cf2a8b646b44 upstream. Setting non-zero values to SYNIC/STIMER MSRs activates certain features; this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated. Note, it would've been better to forbid writing anything to SYNIC/STIMER MSRs, including zeroes; however, at least QEMU tries clearing HV_X64_MSR_STIMER0_CONFIG without SynIC. The HV_X64_MSR_EOM MSR is somewhat 'special' as writing zero there triggers an action; this also should not happen when SynIC wasn't activated. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-4-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address  Paolo Bonzini
commit 2a8859f373b0a86f0ece8ec8312607eacf12485d upstream. FNAME(cmpxchg_gpte) is an inefficient mess. It is at least decent if it can go through get_user_pages_fast(), but if it cannot then it tries to use memremap(); that is not just terribly slow, it is also wrong because it assumes that the VM_PFNMAP VMA is contiguous. The right way to do it would be to do the same thing as hva_to_pfn_remapped() does since commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05), using follow_pte() and fixup_user_fault() to determine the correct address to use for memremap(). To do this, one could for example extract hva_to_pfn() for use outside virt/kvm/kvm_main.c. But really there is no reason to do that either, because there is already a perfectly valid address to do the cmpxchg() on, only it is a userspace address. That means doing user_access_begin()/user_access_end() and writing the code in assembly to handle exceptions correctly. Worse, the guest PTE can be 8-byte even on i686 so there is the extra complication of using cmpxchg8b to account for. But at least it is an efficient mess. (Thanks to Linus for suggesting improvement on the inline assembly). Reported-by: Qiuhao Li <qiuhao@sysec.org> Reported-by: Gaoning Pan <pgn@zju.edu.cn> Reported-by: Yongkang Jia <kangel@zju.edu.cn> Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15  KVM: x86: fix sending PV IPI  Li RongQing
commit c15e0ae42c8e5a61e9aca8aac920517cf7b3e94e upstream. If apic_id is less than min, and (max - apic_id) is greater than KVM_IPI_CLUSTER_SIZE, then the third check condition is satisfied but the new apic_id does not fit the bitmask. In this case __send_ipi_mask should send the IPI. This is mostly theoretical, but it can happen if the apic_ids on three iterations of the loop are for example 1, KVM_IPI_CLUSTER_SIZE, 0. Fixes: aaffcfd1e82 ("KVM: X86: Implement PV IPIs in linux guest") Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-Id: <1646814944-51801-1-git-send-email-lirongqing@baidu.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-15  xen: fix is_xen_pmu()  Juergen Gross
[ Upstream commit de2ae403b4c0e79a3410e63bc448542fbb9f9bfc ] is_xen_pmu() is taking the cpu number as parameter, but it is not using it. Instead it just tests whether the Xen PMU initialization on the current cpu did succeed. As this test is done by checking a percpu pointer, preemption needs to be disabled in order to avoid switching the cpu while doing the test. While resuming from suspend() this seems not to be the case: [ 88.082751] ACPI: PM: Low-level resume complete [ 88.087933] ACPI: EC: EC started [ 88.091464] ACPI: PM: Restoring platform NVS memory [ 88.097166] xen_acpi_processor: Uploading Xen processor PM info [ 88.103850] Enabling non-boot CPUs ... [ 88.108128] installing Xen timer for CPU 1 [ 88.112763] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/7138 [ 88.122256] caller is is_xen_pmu+0x12/0x30 [ 88.126937] CPU: 0 PID: 7138 Comm: systemd-sleep Tainted: G W 5.16.13-2.fc32.qubes.x86_64 #1 [ 88.137939] Hardware name: Star Labs StarBook/StarBook, BIOS 7.97 03/21/2022 [ 88.145930] Call Trace: [ 88.148757] <TASK> [ 88.151193] dump_stack_lvl+0x48/0x5e [ 88.155381] check_preemption_disabled+0xde/0xe0 [ 88.160641] is_xen_pmu+0x12/0x30 [ 88.164441] xen_smp_intr_init_pv+0x75/0x100 Fix that by replacing is_xen_pmu() by a simple boolean variable which reflects the Xen PMU initialization state on cpu 0. Modify xen_pmu_init() to return early in case it is being called for a cpu other than cpu 0 and the boolean variable not being set. Fixes: bf6dfb154d93 ("xen/PMU: PMU emulation code") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220325142002.31789-1-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
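In sketch form, the fix replaces the per-cpu pointer test with a boot-CPU boolean:

```c
/* Sketch: a plain boolean set once when the Xen PMU comes up on CPU 0
 * replaces the preemption-sensitive per-cpu pointer check that
 * is_xen_pmu(cpu) used to perform. */
static bool is_xen_pmu;

static void xen_pmu_init_sketch(int cpu)
{
	if (cpu != 0 && !is_xen_pmu)
		return;  /* PMU init failed or was skipped on the boot CPU */

	/* ... map per-cpu PMU pages, register with the hypervisor ... */

	if (cpu == 0)
		is_xen_pmu = true;
}
```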
2022-04-15  KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()  Hou Wenlong
[ Upstream commit ca85f002258fdac3762c57d12d5e6e401b6a41af ] Per Intel's SDM on the "Instruction Set Reference", when loading a segment descriptor, the not-present segment check should come after all type and privilege checks. But the emulator checks it first, so #NP is triggered instead of #GP if the privilege check fails and the segment is not present. Put the not-present segment check after the type and privilege checks in __load_segment_descriptor(). Fixes: 38ba30ba51a00 (KVM: x86 emulator: Emulate task switch in emulator.c) Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Message-Id: <52573c01d369f506cadcf7233812427cf7db81a7.1644292363.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15  KVM: x86: Fix emulation in writing cr8  Zhenzhong Duan
[ Upstream commit f66af9f222f08d5b11ea41c1bd6c07a0f12daa07 ] In emulation of writing to cr8, one of the lowest four bits in TPR[3:0] is kept. According to Intel SDM 10.8.6.1(baremetal scenario): "APIC.TPR[bits 7:4] = CR8[bits 3:0], APIC.TPR[bits 3:0] = 0"; and SDM 28.3(use TPR shadow): "MOV to CR8. The instruction stores bits 3:0 of its source operand into bits 7:4 of VTPR; the remainder of VTPR (bits 3:0 and bits 31:8) are cleared."; and AMD's APM 16.6.4: "Task Priority Sub-class (TPS)-Bits 3 : 0. The TPS field indicates the current sub-priority to be used when arbitrating lowest-priority messages. This field is written with zero when TPR is written using the architectural CR8 register."; so in KVM emulated scenario, clear TPR[3:0] to make a consistent behavior as in other scenarios. This doesn't impact evaluation and delivery of pending virtual interrupts because processor does not use the processor-priority sub-class to determine which interrupts to delivery and which to inhibit. Sub-class is used by hardware to arbitrate lowest priority interrupts, but KVM just does a round-robin style delivery. Fixes: b93463aa59d6 ("KVM: Accelerated apic support") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220210094506.20181-1-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
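Reduced to the bit manipulation, the corrected CR8-to-TPR mapping from the SDM/APM text quoted above (an editorial sketch, not KVM's actual helper):

```c
#include <stdio.h>
#include <stdint.h>

/* CR8[3:0] -> TPR[7:4], with TPR[3:0] cleared. */
static uint32_t cr8_to_tpr(uint64_t cr8)
{
	return (uint32_t)(cr8 & 0xf) << 4;
}

int main(void)
{
	/* The old emulation kept a stale low nybble; now TPR[3:0] == 0. */
	printf("CR8=0x5 -> TPR=0x%02x\n", cr8_to_tpr(0x5)); /* prints 0x50 */
	return 0;
}
```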
2022-04-15  perf/x86/intel/pt: Fix address filter config for 32-bit kernel  Adrian Hunter
[ Upstream commit e5524bf1047eb3b3f3f33b5f59897ba67b3ade87 ] Change from shifting 'unsigned long' to 'u64' to prevent the config bits being lost on a 32-bit kernel. Fixes: eadf48cab4b6b0 ("perf/x86/intel/pt: Add support for address range filtering in PT") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220131072453.2839535-5-adrian.hunter@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
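The class of bug, in miniature (RTIT_CTL's ADDR0 filter config field sits at bits 35:32, above what a 32-bit unsigned long can reach; compile with -m32 to see the truncation):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	unsigned int shift = 32;  /* RTIT_CTL ADDR0 config starts at bit 32 */

	/* Buggy on a 32-bit kernel: shifting an unsigned long by >= 32
	 * loses the bit (and is undefined behavior there). */
	uint64_t buggy = (unsigned long)1 << shift;

	/* Fixed: u64 arithmetic keeps bits 32..63. */
	uint64_t fixed = (uint64_t)1 << shift;

	printf("buggy=%#llx fixed=%#llx\n",
	       (unsigned long long)buggy, (unsigned long long)fixed);
	return 0;
}
```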
2022-04-11  Merge tag 'v5.4.188' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.188 stable release # gpg: Signature made Mon 28 Mar 2022 02:46:53 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-03-28  ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board  Mark Cilissen
commit e702196bf85778f2c5527ca47f33ef2e2fca8297 upstream. On this board the ACPI RSDP structure points to both a RSDT and an XSDT, but the XSDT points to a truncated FADT. This causes all sorts of trouble and usually a complete failure to boot after the following error occurs: ACPI Error: Unsupported address space: 0x20 (*/hwregs-*) ACPI Error: AE_SUPPORT, Unable to initialize fixed events (*/evevent-*) ACPI: Unable to start ACPI Interpreter This leaves the ACPI implementation in such a broken state that subsequent kernel subsystem initialisations go wrong, resulting in among others mismapped PCI memory, SATA and USB enumeration failures, and freezes. As this is an older embedded platform that will likely never see any BIOS updates to address this issue and its default shipping OS only complies to ACPI 1.0, work around this by forcing `acpi=rsdt`. This patch, applied on top of Linux 5.10.102, was confirmed on real hardware to fix the issue. Signed-off-by: Mark Cilissen <mark@yotsuba.nl> Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-21  Merge tag 'v5.4.185' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.185 stable release # gpg: Signature made Wed 16 Mar 2022 08:33:40 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-03-21  Merge tag 'v5.4.184' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.184 stable release # gpg: Signature made Fri 11 Mar 2022 05:22:43 AM EST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-03-16  KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains  Krish Sadhukhan
commit e1ebb2b49048c4767cfa0d8466f9c701e549fa5e upstream. In some hardware implementations, coherency between the encrypted and unencrypted mappings of the same physical page in a VM is enforced. In such a system, it is not required for software to flush the VM's page from all CPU caches in the system prior to changing the value of the C-bit for the page. So check that bit before flushing the cache. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200917212038.5090-4-krish.sadhukhan@oracle.com [ The linux-5.4.y stable branch does not have the Linux 5.7 refactoring commit eaf78265a4ab ("KVM: SVM: Move SEV code to separate file") so the change was manually applied to sev_clflush_pages() in arch/x86/kvm/svm.c. ] Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16  x86/mm/pat: Don't flush cache if hardware enforces cache coherency across encryption domains  Krish Sadhukhan
commit 75d1cc0e05af579301ce4e49cf6399be4b4e6e76 upstream. In some hardware implementations, coherency between the encrypted and unencrypted mappings of the same physical page is enforced. In such a system, it is not required for software to flush the page from all CPU caches in the system prior to changing the value of the C-bit for the page. So check that bit before flushing the cache. [ bp: Massage commit message. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200917212038.5090-3-krish.sadhukhan@oracle.com Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16  x86/cpu: Add hardware-enforced cache coherency as a CPUID feature  Krish Sadhukhan
commit 5866e9205b47a983a77ebc8654949f696342f2ab upstream. In some hardware implementations, coherency between the encrypted and unencrypted mappings of the same physical page is enforced. In such a system, it is not required for software to flush the page from all CPU caches in the system prior to changing the value of the C-bit for a page. This hardware-enforced cache coherency is indicated by EAX[10] in CPUID leaf 0x8000001f. [ bp: Use one of the free slots in word 3. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200917212038.5090-2-krish.sadhukhan@oracle.com Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
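A userspace probe for the bit in question (CPUID leaf 0x8000001f, EAX[10]), using GCC/Clang's cpuid.h:

```c
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx)) {
		puts("CPUID leaf 0x8000001f not supported");
		return 1;
	}
	printf("hardware-enforced cache coherency: %s\n",
	       (eax & (1u << 10)) ? "present" : "absent");
	return 0;
}
```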
2022-03-16  x86/cpufeatures: Mark two free bits in word 3  Borislav Petkov
commit fbd5969d1ff2598143d6a6fbc9491a9e40ab9b82 upstream. ... so that they get reused when needed. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200604104150.2056-1-bp@alien8.de Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT  Josh Poimboeuf
commit 0de05d056afdb00eca8c7bbb0c79a3438daf700c upstream. The commit 44a3918c8245 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting") added a warning for the "eIBRS + unprivileged eBPF" combination, which has been shown to be vulnerable against Spectre v2 BHB-based attacks. However, there's no warning about the "eIBRS + LFENCE retpoline + unprivileged eBPF" combo. The LFENCE adds more protection by shortening the speculation window after a mispredicted branch. That makes an attack significantly more difficult, even with unprivileged eBPF. So at least for now the logic doesn't warn about that combination. But if you then add SMT into the mix, the SMT attack angle weakens the effectiveness of the LFENCE considerably. So extend the "eIBRS + unprivileged eBPF" warning to also include the "eIBRS + LFENCE + unprivileged eBPF + SMT" case. [ bp: Massage commit message. ] Suggested-by: Alyssa Milburn <alyssa.milburn@linux.intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Warn about Spectre v2 LFENCE mitigation  Josh Poimboeuf
commit eafd987d4a82c7bb5aa12f0e3b4f8f3dea93e678 upstream. With: f8a66d608a3e ("x86,bugs: Unconditionally allow spectre_v2=retpoline,amd") it became possible to enable the LFENCE "retpoline" on Intel. However, Intel doesn't recommend it, as it has some weaknesses compared to retpoline. Now AMD doesn't recommend it either. It can still be left available as a cmdline option. It's faster than retpoline but is weaker in certain scenarios -- particularly SMT, but even non-SMT may be vulnerable in some cases. So just unconditionally warn if the user requests it on the cmdline. [ bp: Massage commit message. ] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Use generic retpoline by default on AMD  Kim Phillips
commit 244d00b5dd4755f8df892c86cab35fb2cfd4f14b upstream. AMD retpoline may be susceptible to speculation. The speculation execution window for an incorrect indirect branch prediction using LFENCE/JMP sequence may potentially be large enough to allow exploitation using Spectre V2. By default, don't use retpoline,lfence on AMD. Instead, use the generic retpoline. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting  Josh Poimboeuf
commit 44a3918c8245ab10c6c9719dd12e7a8d291980d8 upstream. With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable to Spectre v2 BHB-based attacks. When both are enabled, print a warning message and report it in the 'spectre_v2' sysfs vulnerabilities file. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [fllinden@amazon.com: backported to 5.4] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Add eIBRS + Retpoline options  Peter Zijlstra
commit 1e19da8522c81bf46b335f84137165741e0d82b7 upstream. Thanks to the chaps at VUsec it is now clear that eIBRS is not sufficient, therefore allow enabling of retpolines along with eIBRS. Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and spectre_v2=eibrs,retpoline options to explicitly pick your preferred means of mitigation. Since there are new mitigations, there are also user-visible changes in /sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these new mitigations. [ bp: Massage commit message, trim error messages, do more precise eIBRS mode checking. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE  Peter Zijlstra (Intel)
commit d45476d9832409371537013ebdd8dc1a7781f97a upstream. The RETPOLINE_AMD name is unfortunate since it isn't necessarily AMD only, in fact Hygon also uses it. Furthermore it will likely be sufficient for some Intel processors. Therefore rename the thing to RETPOLINE_LFENCE to better describe what it is. Add the spectre_v2=retpoline,lfence option as an alias to spectre_v2=retpoline,amd to preserve existing setups. However, the output of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed. [ bp: Fix typos, massage. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [fllinden@amazon.com: backported to 5.4] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86,bugs: Unconditionally allow spectre_v2=retpoline,amd  Peter Zijlstra
commit f8a66d608a3e471e1202778c2a36cbdc96bae73b upstream. Currently Linux prevents usage of retpoline,amd on !AMD hardware, this is unfriendly and gets in the way of testing. Remove this restriction. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120310.487348118@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11  x86/speculation: Merge one test in spectre_v2_user_select_mitigation()  Borislav Petkov
commit a5ce9f2bb665d1d2b31f139a02dbaa2dfbb62fa6 upstream. Merge the test whether the CPU supports STIBP into the test which determines whether STIBP is required. Thus try to simplify what is already an insane logic. Remove a superfluous newline in a comment, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Anthony Steinhauser <asteinhauser@google.com> Link: https://lkml.kernel.org/r/20200615065806.GB14668@zn.tnic [fllinden@amazon.com: fixed contextual conflict (comment) for 5.4] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02  Merge tag 'v5.4.182' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.182 stable release # gpg: Signature made Wed 02 Mar 2022 05:41:32 AM EST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-03-02  Merge tag 'v5.4.181' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.181 stable release # gpg: Signature made Wed 23 Feb 2022 06:00:05 AM EST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-03-02  Merge tag 'v5.4.180' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.180 stable release # gpg: Signature made Wed 16 Feb 2022 06:53:12 AM EST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-03-02  x86/fpu: Correct pkru/xstate inconsistency  Brian Geffon
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread, as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (i.e. get_current()) is returning an old cached value. To resolve this, reference next_p directly rather than relying on current. As written, it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. As a result, this issue only occurs when switching from a kernel thread to a userspace thread; switching from a non-kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non-kernel task, and the next fpu is passed in to switch_fpu_finish(). This behavior only exists between 5.2 and 5.13, when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()") Unfortunately backporting the fix from 5.13 is probably not realistic, as it's part of a 60+ patch series which rewrites most of the PKRU handling. Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon <bgeffon@google.com> Signed-off-by: Willis Kung <williskung@google.com> Tested-by: Willis Kung <williskung@google.com> Cc: <stable@vger.kernel.org> # v5.4.x Cc: <stable@vger.kernel.org> # v5.10.x Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
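The essence of the change, sketched (simplified from the 5.4 switch_fpu_finish(); not the verbatim patch):

```c
/* Sketch: key the PKRU decision off the incoming task (next_p), whose
 * flags are authoritative, instead of this_cpu_read_stable() on
 * current_task, which can still return the outgoing kernel thread. */
static inline void switch_fpu_finish_sketch(struct task_struct *next_p)
{
	if (next_p->flags & PF_KTHREAD)
		return;  /* kernel threads never use PKRU */

	/* ... read PKRU from next_p's xstate and restore it ... */
}
```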
2022-02-23  KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW  Jim Mattson
[ Upstream commit 710c476514313c74045c41c0571bb5178fd16e3d ] AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of a PerfEvtSeln MSR. Don't mask off the high nybble when configuring a RAW perf event. Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220203014813.2130559-2-jmattson@google.com> Reviewed-by: David Dunn <daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
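The two masks side by side (AMD64_EVENTSEL_EVENT below mirrors the kernel's definition: event-select bits 7:0 plus the high nybble at bits 35:32):

```c
#include <stdio.h>
#include <stdint.h>

#define AMD64_EVENTSEL_EVENT (0xffULL | (0x0fULL << 32))

int main(void)
{
	uint64_t raw = 0xc0ULL | (1ULL << 32);  /* event select 0x1C0 */

	printf("12-bit mask keeps %#llx\n",
	       (unsigned long long)(raw & AMD64_EVENTSEL_EVENT));
	printf("8-bit mask keeps  %#llx (high nybble lost)\n",
	       (unsigned long long)(raw & 0xffULL));
	return 0;
}
```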
2022-02-23  Revert "svm: Add warning message for AVIC IPI invalid target"  Sean Christopherson
commit dd4589eee99db8f61f7b8f7df1531cad3f74a64d upstream. Remove a WARN on an "AVIC IPI invalid target" exit, the WARN is trivial to trigger from guest as it will fail on any destination APIC ID that doesn't exist from the guest's perspective. Don't bother recording anything in the kernel log, the common tracepoint for kvm_avic_incomplete_ipi() is sufficient for debugging. This reverts commit 37ef0c4414c9743ba7f1af4392f0a27a99649f2a. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16  KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER  Vitaly Kuznetsov
[ Upstream commit 7a601e2cf61558dfd534a9ecaad09f5853ad8204 ] Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field, PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too. Note, none of the currently existing Windows/Hyper-V versions are known to enable 'save VMX-preemption timer value' when eVMCS is in use, the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-03  Merge tag 'v5.4.174' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.174 stable release # gpg: Signature made Thu 27 Jan 2022 03:20:01 AM EST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-01-27  um: registers: Rename function names to avoid conflicts and build problems  Randy Dunlap
[ Upstream commit 077b7320942b64b0da182aefd83c374462a65535 ] The function names init_registers() and restore_registers() are used in several net/ethernet/ and gpu/drm/ drivers for other purposes (not calls to UML functions), so rename them. This fixes multiple build errors. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: linux-um@lists.infradead.org Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27  x86/mce: Mark mce_read_aux() noinstr  Borislav Petkov
[ Upstream commit db6c996d6ce45dfb44891f0824a65ecec216f47a ] Fixes vmlinux.o: warning: objtool: do_machine_check()+0x681: call to mce_read_aux() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208111343.8130-10-bp@alien8.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27  x86/mce: Mark mce_end() noinstr  Borislav Petkov
[ Upstream commit b4813539d37fa31fed62cdfab7bd2dd8929c5b2e ] It is called by the #MC handler which is noinstr. Fixes vmlinux.o: warning: objtool: do_machine_check()+0xbd6: call to memset() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208111343.8130-9-bp@alien8.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27  x86/mce: Mark mce_panic() noinstr  Borislav Petkov
[ Upstream commit 3c7ce80a818fa7950be123cac80cd078e5ac1013 ] And allow instrumentation inside it because it does calls to other facilities which will not be tagged noinstr. Fixes vmlinux.o: warning: objtool: do_machine_check()+0xc73: call to mce_panic() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208111343.8130-8-bp@alien8.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27  x86/mm: Flush global TLB when switching to trampoline page-table  Joerg Roedel
[ Upstream commit 71d5049b053876afbde6c3273250b76935494ab2 ] Move the switching code into a function so that it can be re-used and add a global TLB flush. This makes sure that usage of memory which is not mapped in the trampoline page-table is reliably caught. Also move the clearing of CR4.PCIDE before the CR3 switch because the cr4_clear_bits() function will access data not mapped into the trampoline page-table. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211202153226.22946-4-joro@8bytes.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27  x86/mce/inject: Avoid out-of-bounds write when setting flags  Zhang Zixun
[ Upstream commit de768416b203ac84e02a757b782a32efb388476f ] A contrived zero-length write, for example, by using write(2): ... ret = write(fd, str, 0); ... to the "flags" file causes: BUG: KASAN: stack-out-of-bounds in flags_write Write of size 1 at addr ffff888019be7ddf by task writefile/3787 CPU: 4 PID: 3787 Comm: writefile Not tainted 5.16.0-rc7+ #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 due to accessing buf one char before its start. Prevent such out-of-bounds access. [ bp: Productize into a proper patch. Link below is the next best thing because the original mail didn't get archived on lore. ] Fixes: 0451d14d0561 ("EDAC, mce_amd_inj: Modify flags attribute to use string arguments") Signed-off-by: Zhang Zixun <zhang133010@icloud.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/linux-edac/YcnePfF1OOqoQwrX@zn.tnic/ Signed-off-by: Sasha Levin <sashal@kernel.org>
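The hardening, sketched (simplified from flags_write(); buffer size and error codes here are illustrative):

```c
/* With cnt == 0, the old newline strip (buf[cnt - 1]) touched one byte
 * before the buffer. Bound cnt on both ends before using it. */
static ssize_t flags_write_sketch(struct file *filp,
				  const char __user *ubuf,
				  size_t cnt, loff_t *ppos)
{
	char buf[16];

	if (cnt == 0 || cnt > sizeof(buf) - 1)
		return -EINVAL;
	if (copy_from_user(buf, ubuf, cnt))
		return -EFAULT;

	buf[cnt] = '\0';
	if (buf[cnt - 1] == '\n')  /* safe: cnt >= 1 here */
		buf[cnt - 1] = '\0';

	/* ... parse the injection flags from buf ... */
	return cnt;
}
```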
2022-01-27  x86/gpu: Reserve stolen memory for first integrated Intel GPU  Lucas De Marchi
commit 9c494ca4d3a535f9ca11ad6af1813983c1c6cbdd upstream. "Stolen memory" is memory set aside for use by an Intel integrated GPU. The intel_graphics_quirks() early quirk reserves this memory when it is called for a GPU that appears in the intel_early_ids[] table of integrated GPUs. Previously intel_graphics_quirks() was marked as QFLAG_APPLY_ONCE, so it was called only for the first Intel GPU found. If a discrete GPU happened to be enumerated first, intel_graphics_quirks() was called for it but not for any integrated GPU found later. Therefore, stolen memory for such an integrated GPU was never reserved. For example, this problem occurs in this Alderlake-P (integrated) + DG2 (discrete) topology where the DG2 is found first, but stolen memory is associated with the integrated GPU: - 00:01.0 Bridge `- 03:00.0 DG2 discrete GPU - 00:02.0 Integrated GPU (with stolen memory) Remove the QFLAG_APPLY_ONCE flag and call intel_graphics_quirks() for every Intel GPU. Reserve stolen memory for the first GPU that appears in intel_early_ids[]. [bhelgaas: commit log, add code comment, squash in https://lore.kernel.org/r/20220118190558.2ququ4vdfjuahicm@ldmartin-desk2] Link: https://lore.kernel.org/r/20220114002843.2083382-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-25  Merge tag 'v5.4.173' into v5.4/standard/base  Bruce Ashfield
This is the 5.4.173 stable release # gpg: Signature made Thu 20 Jan 2022 03:19:26 AM EST # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-01-20  KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all  Wei Wang
commit 9fb12fe5b93b94b9e607509ba461e17f4cc6a264 upstream. Fixed counter 3 is used for the Topdown metrics, which haven't been enabled for KVM guests. Userspace access to it will fail as it's not included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines, which have this counter. To reproduce it on ICX+ machines, ./state_test reports: ==== Test Assertion Failure ==== lib/x86_64/processor.c:1078: r == nmsrs pid=4564 tid=4564 - Argument list too long 1 0x000000000040b1b9: vcpu_save_state at processor.c:1077 2 0x0000000000402478: main at state_test.c:209 (discriminator 6) 3 0x00007fbe21ed5f92: ?? ??:0 4 0x000000000040264d: _start at ??:? Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c) With this patch, it works well. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>