summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)Author
2020-04-23KVM: s390: vsie: Fix possible race when shadowing region 3 tablesDavid Hildenbrand
[ Upstream commit 1493e0f944f3c319d11e067c185c904d01c17ae5 ] We have to properly retry again by returning -EINVAL immediately in case somebody else instantiated the table concurrently. We missed to add the goto in this function only. The code now matches the other, similar shadowing functions. We are overwriting an existing region 2 table entry. All allocated pages are added to the crst_list to be freed later, so they are not lost forever. However, when unshadowing the region 2 table, we wouldn't trigger unshadowing of the original shadowed region 3 table that we replaced. It would get unshadowed when the original region 3 table is modified. As it's not connected to the page table hierarchy anymore, it's not going to get used anymore. However, for a limited time, this page table will stick around, so it's in some sense a temporary memory leak. Identified by manual code inspection. I don't think this classifies as stable material. Fixes: 998f637cc4b9 ("s390/mm: avoid races on region/segment/page table shadowing") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-4-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23um: ubd: Prevent buffer overrun on command completionGabriel Krisman Bertazi
[ Upstream commit 6e682d53fc1ef73a169e2a5300326cb23abb32ee ] On the hypervisor side, when completing commands and the pipe is full, we retry writing only the entries that failed, by offsetting io_req_buffer, but we don't reduce the number of bytes written, which can cause a buffer overrun of io_req_buffer, and write garbage to the pipe. Cc: Martyn Welch <martyn.welch@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23s390/cpum_sf: Fix wrong page count in error messageThomas Richter
[ Upstream commit 4141b6a5e9f171325effc36a22eb92bf961e7a5c ] When perf record -e SF_CYCLES_BASIC_DIAG runs with very high frequency, the samples arrive faster than the perf process can save them to file. Eventually, for longer running processes, this leads to the siutation where the trace buffers allocated by perf slowly fills up. At one point the auxiliary trace buffer is full and the CPU Measurement sampling facility is turned off. Furthermore a warning is printed to the kernel log buffer: cpum_sf: The AUX buffer with 0 pages for the diagnostic-sampling mode is full The number of allocated pages for the auxiliary trace buffer is shown as zero pages. That is wrong. Fix this by saving the number of allocated pages before entering the work loop in the interrupt handler. When the interrupt handler processes the samples, it may detect the buffer full condition and stop sampling, reducing the buffer size to zero. Print the correct value in the error message: cpum_sf: The AUX buffer with 256 pages for the diagnostic-sampling mode is full Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23powerpc/maple: Fix declaration made after definitionNathan Chancellor
[ Upstream commit af6cf95c4d003fccd6c2ecc99a598fb854b537e7 ] When building ppc64 defconfig, Clang errors (trimmed for brevity): arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes] machine_device_initcall(maple, maple_cpc925_edac_setup); ^ machine_device_initcall expands to __define_machine_initcall, which in turn has the macro machine_is used in it, which declares mach_##name with an __attribute__((weak)). define_machine actually defines mach_##name, which in this file happens before the declaration, hence the warning. To fix this, move define_machine after machine_device_initcall so that the declaration occurs before the definition, which matches how machine_device_initcall and define_machine work throughout arch/powerpc. While we're here, remove some spaces before tabs. Fixes: 8f101a051ef0 ("edac: cpc925 MC platform device setup") Reported-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23s390/cpuinfo: fix wrong output when CPU0 is offlineAlexander Gordeev
[ Upstream commit 872f27103874a73783aeff2aac2b41a489f67d7c ] /proc/cpuinfo should not print information about CPU 0 when it is offline. Fixes: 281eaa8cb67c ("s390/cpuinfo: simplify locking and skip offline cpus early") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: shortened commit message] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23x86/Hyper-V: Report crash data in die() when panic_on_oops is setTianyu Lan
[ Upstream commit f3a99e761efa616028b255b4de58e9b5b87c5545 ] When oops happens with panic_on_oops unset, the oops thread is killed by die() and system continues to run. In such case, guest should not report crash register data to host since system still runs. Check panic_on_oops and return directly in hyperv_report_panic() when the function is called in the die() and panic_on_oops is unset. Fix it. Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-7-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23x86/Hyper-V: Report crash register data or kmsg before running crash kernelTianyu Lan
commit a11589563e96bf262767294b89b25a9d44e7303b upstream. We want to notify Hyper-V when a Linux guest VM crash occurs, so there is a record of the crash even when kdump is enabled. But crash_kexec_post_notifiers defaults to "false", so the kdump kernel runs before the notifiers and Hyper-V never gets notified. Fix this by always setting crash_kexec_post_notifiers to be true for Hyper-V VMs. Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20200406155331.2105-5-Tianyu.Lan@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23ARM: dts: imx6: Use gpc for FEC interrupt controller to fix wake on LAN.Martin Fuzzey
commit 4141f1a40fc0789f6fd4330e171e1edf155426aa upstream. In order to wake from suspend by ethernet magic packets the GPC must be used as intc does not have wakeup functionality. But the FEC DT node currently uses interrupt-extended, specificying intc, thus breaking WoL. This problem is probably fallout from the stacked domain conversion as intc used to chain to GPC. So replace "interrupts-extended" by "interrupts" to use the default parent which is GPC. Fixes: b923ff6af0d5 ("ARM: imx6: convert GPC to stacked domains") Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0Luke Nelson
commit bb9562cf5c67813034c96afb50bd21130a504441 upstream. The current arm BPF JIT does not correctly compile RSH or ARSH when the immediate shift amount is 0. This causes the "rsh64 by 0 imm" and "arsh64 by 0 imm" BPF selftests to hang the kernel by reaching an instruction the verifier determines to be unreachable. The root cause is in how immediate right shifts are encoded on arm. For LSR and ASR (logical and arithmetic right shift), a bit-pattern of 00000 in the immediate encodes a shift amount of 32. When the BPF immediate is 0, the generated code shifts by 32 instead of the expected behavior (a no-op). This patch fixes the bugs by adding an additional check if the BPF immediate is 0. After the change, the above mentioned BPF selftests pass. Fixes: 39c13c204bb11 ("arm: eBPF JIT compiler") Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200408181229.10909-1-luke.r.nels@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23arm, bpf: Fix offset overflow for BPF_MEM BPF_DWLuke Nelson
commit 4178417cc5359c329790a4a8f4a6604612338cca upstream. This patch fixes an incorrect check in how immediate memory offsets are computed for BPF_DW on arm. For BPF_LDX/ST/STX + BPF_DW, the 32-bit arm JIT breaks down an 8-byte access into two separate 4-byte accesses using off+0 and off+4. If off fits in imm12, the JIT emits a ldr/str instruction with the immediate and avoids the use of a temporary register. While the current check off <= 0xfff ensures that the first immediate off+0 doesn't overflow imm12, it's not sufficient for the second immediate off+4, which may cause the second access of BPF_DW to read/write the wrong address. This patch fixes the problem by changing the check to off <= 0xfff - 4 for BPF_DW, ensuring off+4 will never overflow. A side effect of simplifying the check is that it now allows using negative immediate offsets in ldr/str. This means that small negative offsets can also avoid the use of a temporary register. This patch introduces no new failures in test_verifier or test_bpf.c. Fixes: c5eae692571d6 ("ARM: net: bpf: improve 64-bit store implementation") Fixes: ec19e02b343db ("ARM: net: bpf: fix LDX instructions") Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200409221752.28448-1-luke.r.nels@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21x86/resctrl: Fix invalid attempt at removing the default resource groupReinette Chatre
commit b0151da52a6d4f3951ea24c083e7a95977621436 upstream. The default resource group ("rdtgroup_default") is associated with the root of the resctrl filesystem and should never be removed. New resource groups can be created as subdirectories of the resctrl filesystem and they can be removed from user space. There exists a safeguard in the directory removal code (rdtgroup_rmdir()) that ensures that only subdirectories can be removed by testing that the directory to be removed has to be a child of the root directory. A possible deadlock was recently fixed with 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference"). This fix involved associating the private data of the "mon_groups" and "mon_data" directories to the resource group to which they belong instead of NULL as before. A consequence of this change was that the original safeguard code preventing removal of "mon_groups" and "mon_data" found in the root directory failed resulting in attempts to remove the default resource group that ends in a BUG: kernel BUG at mm/slub.c:3969! invalid opcode: 0000 [#1] SMP PTI Call Trace: rdtgroup_rmdir+0x16b/0x2c0 kernfs_iop_rmdir+0x5c/0x90 vfs_rmdir+0x7a/0x160 do_rmdir+0x17d/0x1e0 do_syscall_64+0x55/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by improving the directory removal safeguard to ensure that subdirectories of the resctrl root directory can only be removed if they are a child of the resctrl filesystem's root _and_ not associated with the default resource group. Fixes: 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference") Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21x86/resctrl: Preserve CDP enable over CPU hotplugJames Morse
commit 9fe0450785abbc04b0ed5d3cf61fcdb8ab656b4b upstream. Resctrl assumes that all CPUs are online when the filesystem is mounted, and that CPUs remember their CDP-enabled state over CPU hotplug. This goes wrong when resctrl's CDP-enabled state changes while all the CPUs in a domain are offline. When a domain comes online, enable (or disable!) CDP to match resctrl's current setting. Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21x86/microcode/AMD: Increase microcode PATCH_MAX_SIZEJohn Allen
commit bdf89df3c54518eed879d8fac7577fcfb220c67e upstream. Future AMD CPUs will have microcode patches that exceed the default 4K patch size. Raise our limit. Signed-off-by: John Allen <john.allen@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v4.14.. Link: https://lkml.kernel.org/r/20200409152931.GA685273@mojo.amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBDJim Mattson
commit 396d2e878f92ec108e4293f1c77ea3bc90b414ff upstream. The host reports support for the synthetic feature X86_FEATURE_SSBD when any of the three following hardware features are set: CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] CPUID.80000008H:EBX.AMD_SSBD[bit 24] CPUID.80000008H:EBX.VIRT_SSBD[bit 25] Either of the first two hardware features implies the existence of the IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does not. Therefore, CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] should only be set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host. Fixes: 0c54914d0c52a ("KVM: x86: use Intel speculation bugs and features as derived in generic x86 code") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jacob Xu <jacobhxu@google.com> Reviewed-by: Peter Shier <pshier@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 4.x: adjust indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17efi/x86: Fix the deletion of variables in mixed modeGary Lin
[ Upstream commit a4b81ccfd4caba017d2b84720b6de4edd16911a0 ] efi_thunk_set_variable() treated the NULL "data" pointer as an invalid parameter, and this broke the deletion of variables in mixed mode. This commit fixes the check of data so that the userspace program can delete a variable in mixed mode. Fixes: 8319e9d5ad98ffcc ("efi/x86: Handle by-ref arguments covering multiple pages in mixed mode") Signed-off-by: Gary Lin <glin@suse.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200408081606.1504-1-glin@suse.com Link: https://lore.kernel.org/r/20200409130434.6736-9-ardb@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17powerpc/fsl_booke: Avoid creating duplicate tlb1 entryLaurentiu Tudor
[ Upstream commit aa4113340ae6c2811e046f08c2bc21011d20a072 ] In the current implementation, the call to loadcam_multi() is wrapped between switch_to_as1() and restore_to_as0() calls so, when it tries to create its own temporary AS=1 TLB1 entry, it ends up duplicating the existing one created by switch_to_as1(). Add a check to skip creating the temporary entry if already running in AS=1. Fixes: d9e1831a4202 ("powerpc/85xx: Load all early TLB entries at once") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200123111914.2565-1-laurentiu.tudor@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17powerpc: Make setjmp/longjmp signature standardClement Courbet
commit c17eb4dca5a353a9dbbb8ad6934fe57af7165e91 upstream. Declaring setjmp()/longjmp() as taking longs makes the signature non-standard, and makes clang complain. In the past, this has been worked around by adding -ffreestanding to the compile flags. The implementation looks like it only ever propagates the value (in longjmp) or sets it to 1 (in setjmp), and we only call longjmp with integer parameters. This allows removing -ffreestanding from the compilation flags. Fixes: c9029ef9c957 ("powerpc: Avoid clang warnings around setjmp and longjmp") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Clement Courbet <courbet@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200330080400.124803-1-courbet@google.com Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc: Add attributes for setjmp/longjmpSegher Boessenkool
commit aa497d4352414aad22e792b35d0aaaa12bbc37c5 upstream. The setjmp function should be declared as "returns_twice", or bad things can happen[1]. This does not actually change generated code in my testing. The longjmp function should be declared as "noreturn", so that the compiler can optimise calls to it better. This makes the generated code a little shorter. 1: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-returns_005ftwice-function-attribute Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c02ce4a573f3bac907e2c70957a2d1275f910013.1567605586.git.segher@kernel.crashing.org Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/kprobes: Ignore traps that happened in real modeChristophe Leroy
commit 21f8b2fa3ca5b01f7a2b51b89ce97a3705a15aa0 upstream. When a program check exception happens while MMU translation is disabled, following Oops happens in kprobe_handler() in the following code: } else if (*addr != BREAKPOINT_INSTRUCTION) { BUG: Unable to handle kernel data access on read at 0x0000e268 Faulting instruction address: 0xc000ec34 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=16K PREEMPT CMPC885 Modules linked in: CPU: 0 PID: 429 Comm: cat Not tainted 5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267 NIP: c000ec34 LR: c000ecd8 CTR: c019cab8 REGS: ca4d3b58 TRAP: 0300 Not tainted (5.6.0-rc1-s3k-dev-00824-g84195dc6c58a) MSR: 00001032 <ME,IR,DR,RI> CR: 2a4d3c52 XER: 00000000 DAR: 0000e268 DSISR: c0000000 GPR00: c000b09c ca4d3c10 c66d0620 00000000 ca4d3c60 00000000 00009032 00000000 GPR08: 00020000 00000000 c087de44 c000afe0 c66d0ad0 100d3dd6 fffffff3 00000000 GPR16: 00000000 00000041 00000000 ca4d3d70 00000000 00000000 0000416d 00000000 GPR24: 00000004 c53b6128 00000000 0000e268 00000000 c07c0000 c07bb6fc ca4d3c60 NIP [c000ec34] kprobe_handler+0x128/0x290 LR [c000ecd8] kprobe_handler+0x1cc/0x290 Call Trace: [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4 --- interrupt: 700 at 0xe268 Instruction dump: 913e0008 81220000 38600001 3929ffff 91220000 80010024 bb410008 7c0803a6 38210020 4e800020 38600000 4e800020 <813b0000> 6d2a7fe0 2f8a0008 419e0154 ---[ end trace 5b9152d4cdadd06d ]--- kprobe is not prepared to handle events in real mode and functions running in real mode should have been blacklisted, so kprobe_handler() can safely bail out telling 'this trap is not mine' for any trap that happened while in real-mode. If the trap happened with MSR_IR or MSR_DR cleared, return 0 immediately. Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/424331e2006e7291a1bfe40e7f3fa58825f565e1.1582054578.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIsCédric Le Goater
commit b1a504a6500df50e83b701b7946b34fce27ad8a3 upstream. When a CPU is brought up, an IPI number is allocated and recorded under the XIVE CPU structure. Invalid IPI numbers are tracked with interrupt number 0x0. On the PowerNV platform, the interrupt number space starts at 0x10 and this works fine. However, on the sPAPR platform, it is possible to allocate the interrupt number 0x0 and this raises an issue when CPU 0 is unplugged. The XIVE spapr driver tracks allocated interrupt numbers in a bitmask and it is not correctly updated when interrupt number 0x0 is freed. It stays allocated and it is then impossible to reallocate. Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms. Reported-by: David Gibson <david@gibson.dropbear.id.au> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-2-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE ↵Aneesh Kumar K.V
entries commit 36b78402d97a3b9aeab136feb9b00d8647ec2c20 upstream. H_PAGE_THP_HUGE is used to differentiate between a THP hugepage and hugetlb hugepage entries. The difference is WRT how we handle hash fault on these address. THP address enables MPSS in segments. We want to manage devmap hugepage entries similar to THP pt entries. Hence use H_PAGE_THP_HUGE for devmap huge PTE entries. With current code while handling hash PTE fault, we do set is_thp = true when finding devmap PTE huge PTE entries. Current code also does the below sequence we setting up huge devmap entries. entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pmd_mkdevmap(entry); In that case we would find both H_PAGE_THP_HUGE and PAGE_DEVMAP set for huge devmap PTE entries. This results in false positive error like below. kernel BUG at /home/kvaneesh/src/linux/mm/memory.c:4321! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 56 PID: 67996 Comm: t_mmap_dio Not tainted 5.6.0-rc4-59640-g371c804dedbc #128 .... NIP [c00000000044c9e4] __follow_pte_pmd+0x264/0x900 LR [c0000000005d45f8] dax_writeback_one+0x1a8/0x740 Call Trace: str_spec.74809+0x22ffb4/0x2d116c (unreliable) dax_writeback_one+0x1a8/0x740 dax_writeback_mapping_range+0x26c/0x700 ext4_dax_writepages+0x150/0x5a0 do_writepages+0x68/0x180 __filemap_fdatawrite_range+0x138/0x180 file_write_and_wait_range+0xa4/0x110 ext4_sync_file+0x370/0x6e0 vfs_fsync_range+0x70/0xf0 sys_msync+0x220/0x2e0 system_call+0x5c/0x68 This is because our pmd_trans_huge check doesn't exclude _PAGE_DEVMAP. To make this all consistent, update pmd_mkdevmap to set H_PAGE_THP_HUGE and pmd_trans_huge check now excludes _PAGE_DEVMAP correctly. Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313094842.351830-1-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/64/tm: Don't let userspace set regs->trap via sigreturnMichael Ellerman
commit c7def7fbdeaa25feaa19caf4a27c5d10bd8789e4 upstream. In restore_tm_sigcontexts() we take the trap value directly from the user sigcontext with no checking: err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]); This means we can be in the kernel with an arbitrary regs->trap value. Although that's not immediately problematic, there is a risk we could trigger one of the uses of CHECK_FULL_REGS(): #define CHECK_FULL_REGS(regs) BUG_ON(regs->trap & 1) It can also cause us to unnecessarily save non-volatile GPRs again in save_nvgprs(), which shouldn't be problematic but is still wrong. It's also possible it could trick the syscall restart machinery, which relies on regs->trap not being == 0xc00 (see 9a81c16b5275 ("powerpc: fix double syscall restarts")), though I haven't been able to make that happen. Finally it doesn't match the behaviour of the non-TM case, in restore_sigcontext() which zeroes regs->trap. So change restore_tm_sigcontexts() to zero regs->trap. This was discovered while testing Nick's upcoming rewrite of the syscall entry path. In that series the call to save_nvgprs() prior to signal handling (do_notify_resume()) is removed, which leaves the low-bit of regs->trap uncleared which can then trigger the FULL_REGS() WARNs in setup_tm_sigcontexts(). Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idleMichael Ellerman
commit 53a712bae5dd919521a58d7bad773b949358add0 upstream. In order to implement KUAP (Kernel Userspace Access Protection) on Power9 we will be using the AMR, and therefore indirectly the UAMOR/AMOR. So save/restore these regs in the idle code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [ajd: Backport to 4.19 tree, CVE-2020-11669] Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17s390/diag: fix display of diagnose call statisticsMichael Mueller
commit 6c7c851f1b666a8a455678a0b480b9162de86052 upstream. Show the full diag statistic table and not just parts of it. The issue surfaced in a KVM guest with a number of vcpus defined smaller than NR_DIAG_STAT. Fixes: 1ec2772e0c3c ("s390/diag: add a statistic for diagnose calls") Cc: stable@vger.kernel.org Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailableLibor Pechacek
commit a83836dbc53e96f13fec248ecc201d18e1e3111d upstream. In guests without hotplugagble memory drmem structure is only zero initialized. Trying to manipulate DLPAR parameters results in a crash. $ echo "memory add count 1" > /sys/kernel/dlpar Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ... NIP: c0000000000ff294 LR: c0000000000ff248 CTR: 0000000000000000 REGS: c0000000fb9d3880 TRAP: 0300 Tainted: G E (5.5.0-rc6-2-default) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242428 XER: 20000000 CFAR: c0000000009a6c10 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 ... NIP dlpar_memory+0x6e4/0xd00 LR dlpar_memory+0x698/0xd00 Call Trace: dlpar_memory+0x698/0xd00 (unreliable) handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd0/0x260 ksys_write+0xdc/0x130 system_call+0x5c/0x68 Taking closer look at the code, I can see that for_each_drmem_lmb is a macro expanding into `for (lmb = &drmem_info->lmbs[0]; lmb <= &drmem_info->lmbs[drmem_info->n_lmbs - 1]; lmb++)`. When drmem_info->lmbs is NULL, the loop would iterate through the whole address range if it weren't stopped by the NULL pointer dereference on the next line. This patch aligns for_each_drmem_lmb and for_each_drmem_lmb_in_range macro behavior with the common C semantics, where the end marker does not belong to the scanned range, and alters get_lmb_range() semantics. As a side effect, the wraparound observed in the crash is prevented. Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200131132829.10281-1-msuchanek@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17arm64: armv8_deprecated: Fix undef_hook mask for thumb setendFredrik Strupe
commit fc2266011accd5aeb8ebc335c381991f20e26e33 upstream. For thumb instructions, call_undef_hook() in traps.c first reads a u16, and if the u16 indicates a T32 instruction (u16 >= 0xe800), a second u16 is read, which then makes up the the lower half-word of a T32 instruction. For T16 instructions, the second u16 is not read, which makes the resulting u32 opcode always have the upper half set to 0. However, having the upper half of instr_mask in the undef_hook set to 0 masks out the upper half of all thumb instructions - both T16 and T32. This results in trapped T32 instructions with the lower half-word equal to the T16 encoding of setend (b650) being matched, even though the upper half-word is not 0000 and thus indicates a T32 opcode. An example of such a T32 instruction is eaa0b650, which should raise a SIGILL since T32 instructions with an eaa prefix are unallocated as per Arm ARM, but instead works as a SETEND because the second half-word is set to b650. This patch fixes the issue by extending instr_mask to include the upper u32 half, which will still match T16 instructions where the upper half is 0, but not T32 instructions. Fixes: 2d888f48e056 ("arm64: Emulate SETEND for AArch32 tasks") Cc: <stable@vger.kernel.org> # 4.0.x- Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fredrik Strupe <fredrik@strupe.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17arm64: dts: allwinner: h6: Fix PMU compatibleMaxime Ripard
commit 4c7eeb9af3e41ae7d840977119c58f3bbb3f4f59 upstream. The commit 7aa9b9eb7d6a ("arm64: dts: allwinner: H6: Add PMU mode") introduced support for the PMU found on the Allwinner H6. However, the binding only allows for a single compatible, while the patch was adding two. Make sure we follow the binding. Fixes: 7aa9b9eb7d6a ("arm64: dts: allwinner: H6: Add PMU mode") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init()YueHaibing
commit 11dd34f3eae5a468013bb161a1dcf1fecd2ca321 upstream. There is no need to have the 'struct dentry *vpa_dir' variable static since new value always be assigned before use it. Fixes: c6c26fb55e8e ("powerpc/pseries: Export raw per-CPU VPA data via debugfs") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190218125644.87448-1-yuehaibing@huawei.com Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: VMX: fix crash cleanup when KVM wasn't usedVitaly Kuznetsov
commit dbef2808af6c594922fe32833b30f55f35e9da6d upstream. If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that loaded_vmcss_on_cpu list is not yet initialized, we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialized loaded_vmcss_on_cpu list earlier, right before we assign crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: 31603d4fc2bb ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: x86: Gracefully handle __vmalloc() failure during VM allocationSean Christopherson
commit d18b2f43b9147c8005ae0844fb445d8cc6a87e31 upstream. Check the result of __vmalloc() to avoid dereferencing a NULL pointer in the event that allocation failres. Fixes: d1e5b0e98ea27 ("kvm: Make VM ioctl do valloc for some archs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec supportSean Christopherson
commit 31603d4fc2bb4f0815245d496cb970b27b4f636a upstream. VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown interrupted a KVM update of the percpu in-use VMCS list. Because NMIs are not blocked by disabling IRQs, it's possible that crash_vmclear_local_loaded_vmcss() could be called while the percpu list of VMCSes is being modified, e.g. in the middle of list_add() in vmx_vcpu_load_vmcs(). This potential corner case was called out in the original commit[*], but the analysis of its impact was wrong. Skipping the VMCLEARs is wrong because it all but guarantees that a loaded, and therefore cached, VMCS will live across kexec and corrupt memory in the new kernel. Corruption will occur because the CPU's VMCS cache is non-coherent, i.e. not snooped, and so the writeback of VMCS memory on its eviction will overwrite random memory in the new kernel. The VMCS will live because the NMI shootdown also disables VMX, i.e. the in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the VMCS cache on VMXOFF. Furthermore, interrupting list_add() and list_del() is safe due to crash_vmclear_local_loaded_vmcss() using forward iteration. list_add() ensures the new entry is not visible to forward iteration unless the entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev" pointer could be observed if the NMI shootdown interrupted list_del() or list_add(), but list_for_each_entry() does not consume ->prev. In addition to removing the temporary disabling of VMCLEAR, open code loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that the VMCS is deleted from the list only after it's been VMCLEAR'd. Deleting the VMCS before VMCLEAR would allow a race where the NMI shootdown could arrive between list_del() and vmcs_clear() and thus neither flow would execute a successful VMCLEAR. Alternatively, more code could be moved into loaded_vmcs_init(), but that gets rather silly as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb() and would need to work around the list_del(). Update the smp_*() comments related to the list manipulation, and opportunistically reword them to improve clarity. [*] https://patchwork.kernel.org/patch/1675731/#3720461 Fixes: 8f536b7697a0 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: x86: Allocate new rmap and large page tracking when moving memslotSean Christopherson
commit edd4fa37baa6ee8e44dc65523b27bd6fe44c94de upstream. Reallocate a rmap array and recalcuate large page compatibility when moving an existing memslot to correctly handle the alignment properties of the new memslot. The number of rmap entries required at each level is dependent on the alignment of the memslot's base gfn with respect to that level, e.g. moving a large-page aligned memslot so that it becomes unaligned will increase the number of rmap entries needed at the now unaligned level. Not updating the rmap array is the most obvious bug, as KVM accesses garbage data beyond the end of the rmap. KVM interprets the bad data as pointers, leading to non-canonical #GPs, unexpected #PFs, etc... general protection fault: 0000 [#1] SMP CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:rmap_get_first+0x37/0x50 [kvm] Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246 RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012 RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570 RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8 R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000 R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8 FS: 00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0 Call Trace: kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] __kvm_set_memory_region.part.64+0x559/0x960 [kvm] kvm_set_memory_region+0x45/0x60 [kvm] kvm_vm_ioctl+0x30f/0x920 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f7ed9911f47 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47 RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004 RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700 R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000 R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000 Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0c5f570b3358ca89 ]--- The disallow_lpage tracking is more subtle. Failure to update results in KVM creating large pages when it shouldn't, either due to stale data or again due to indexing beyond the end of the metadata arrays, which can lead to memory corruption and/or leaking data to guest/userspace. Note, the arrays for the old memslot are freed by the unconditional call to kvm_free_memslot() in __kvm_set_memory_region(). Fixes: 05da45583de9b ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: s390: vsie: Fix delivery of addressing exceptionsDavid Hildenbrand
commit 4d4cee96fb7a3cc53702a9be8299bf525be4ee98 upstream. Whenever we get an -EFAULT, we failed to read in guest 2 physical address space. Such addressing exceptions are reported via a program intercept to the nested hypervisor. We faked the intercept, we have to return to guest 2. Instead, right now we would be returning -EFAULT from the intercept handler, eventually crashing the VM. the correct thing to do is to return 1 as rc == 1 is the internal representation of "we have to go back into g2". Addressing exceptions can only happen if the g2->g3 page tables reference invalid g2 addresses (say, either a table or the final page is not accessible - so something that basically never happens in sane environments. Identified by manual code inspection. Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checksDavid Hildenbrand
commit a1d032a49522cb5368e5dfb945a85899b4c74f65 upstream. In case we have a region 1 the following calculation (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) results in 64. As shifts beyond the size are undefined the compiler is free to use instructions like sllg. sllg will only use 6 bits of the shift value (here 64) resulting in no shift at all. That means that ALL addresses will be rejected. The can result in endless loops, e.g. when prefix cannot get mapped. Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Tested-by: Janosch Frank <frankja@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17KVM: nVMX: Properly handle userspace interrupt window requestSean Christopherson
commit a1c77abb8d93381e25a8d2df3a917388244ba776 upstream. Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has external interrupt exiting enabled. IRQs are never blocked in hardware if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger VM-Exit. The new check percolates up to kvm_vcpu_ready_for_interrupt_injection() and thus vcpu_run(), and so KVM will exit to userspace if userspace has requested an interrupt window (to inject an IRQ into L1). Remove the @external_intr param from vmx_check_nested_events(), which is actually an indicator that userspace wants an interrupt window, e.g. it's named @req_int_win further up the stack. Injecting a VM-Exit into L1 to try and bounce out to L0 userspace is all kinds of broken and is no longer necessary. Remove the hack in nested_vmx_vmexit() that attempted to workaround the breakage in vmx_check_nested_events() by only filling interrupt info if there's an actual interrupt pending. The hack actually made things worse because it caused KVM to _never_ fill interrupt info when the LAPIC resides in userspace (kvm_cpu_has_interrupt() queries interrupt.injected, which is always cleared by prepare_vmcs12() before reaching the hack in nested_vmx_vmexit()). Fixes: 6550c4df7e50 ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17x86/entry/32: Add missing ASM_CLAC to general_protection entryThomas Gleixner
commit 3d51507f29f2153a658df4a0674ec5b592b62085 upstream. All exception entry points must have ASM_CLAC right at the beginning. The general_protection entry is missing one. Fixes: e59d1b0a2419 ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200225220216.219537887@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17MIPS: OCTEON: irq: Fix potential NULL pointer dereferenceGustavo A. R. Silva
commit 792a402c2840054533ef56279c212ef6da87d811 upstream. There is a potential NULL pointer dereference in case kzalloc() fails and returns NULL. Fix this by adding a NULL check on *cd* This bug was detected with the help of Coccinelle. Fixes: 64b139f97c01 ("MIPS: OCTEON: irq: add CIB and other fixes") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3Huacai Chen
commit d191aaffe3687d1e73e644c185f5f0550ec242b5 upstream. LDDIR/LDPTE is Loongson-3's acceleration for Page Table Walking. If BD (Base Directory, the 4th page directory) is not enabled, then GDOffset is biased by BadVAddr[63:62]. So, if GDOffset (aka. BadVAddr[47:36] for Loongson-3) is big enough, "0b11(BadVAddr[63:62])|BadVAddr[47:36]|...." can far beyond pg_swapper_dir. This means the pg_swapper_dir may NOT be accessed by LDDIR correctly, so fix it by set PWDirExt in CP0_PWCtl. Cc: <stable@vger.kernel.org> Signed-off-by: Pei Huang <huangpei@loongson.cn> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17acpi/x86: ignore unspecified bit positions in the ACPI global lock fieldJan Engelhardt
commit ecb9c790999fd6c5af0f44783bd0217f0b89ec2b upstream. The value in "new" is constructed from "old" such that all bits defined as reserved by the ACPI spec[1] are left untouched. But if those bits do not happen to be all zero, "new < 3" will not evaluate to true. The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with garbage inside the "FACS" ACPI table. The starting value is old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off the reserved bits. [1] https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17x86/boot: Use unsigned comparison for addressesArvind Sankar
[ Upstream commit 81a34892c2c7c809f9c4e22c5ac936ae673fb9a2 ] The load address is compared with LOAD_PHYSICAL_ADDR using a signed comparison currently (using jge instruction). When loading a 64-bit kernel using the new efi32_pe_entry() point added by: 97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section") using Qemu with -m 3072, the firmware actually loads us above 2Gb, resulting in a very early crash. Use the JAE instruction to perform a unsigned comparison instead, as physical addresses should be considered unsigned. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17x86: Don't let pgprot_modify() change the page encryption bitThomas Hellstrom
[ Upstream commit 6db73f17c5f155dbcfd5e48e621c706270b84df0 ] When SEV or SME is enabled and active, vm_get_page_prot() typically returns with the encryption bit set. This means that users of pgprot_modify(, vm_get_page_prot()) (mprotect_fixup(), do_mmap()) end up with a value of vma->vm_pg_prot that is not consistent with the intended protection of the PTEs. This is also important for fault handlers that rely on the VMA vm_page_prot to set the page protection. Fix this by not allowing pgprot_modify() to change the encryption bit, similar to how it's done for PAT bits. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200304114527.3636-2-thomas_os@shipmail.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltageOndrej Jirman
[ Upstream commit a40550952c000667b20082d58077bc647da6c890 ] Lowering the voltage solves the quick image degradation over time (minutes), that was probably caused by overheating. Signed-off-by: Ondrej Jirman <megous@megous.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-13arm64: Fix size of __early_cpu_boot_statusArun KS
commit 61cf61d81e326163ce1557ceccfca76e11d0e57c upstream. __early_cpu_boot_status is of type long. Use quad assembler directive to allocate proper size. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arun KS <arunks@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02arm64: dts: ls1046ardb: set RGMII interfaces to RGMII_ID modeMadalin Bucur
commit d79e9d7c1e4ba5f95f2ff3541880c40ea9722212 upstream. The correct setting for the RGMII ports on LS1046ARDB is to enable delay on both Rx and Tx so the interface mode used must be PHY_INTERFACE_MODE_RGMII_ID. Since commit 1b3047b5208a80 ("net: phy: realtek: add support for configuring the RX delay on RTL8211F") the Realtek 8211F PHY driver has control over the RGMII RX delay and it is disabling it for RGMII_TXID. The LS1046ARDB uses two such PHYs in RGMII_ID mode but in the device tree the mode was described as "rgmii". Changing the phy-connection-type to "rgmii-id" to address the issue. Fixes: 3fa395d2c48a ("arm64: dts: add LS1046A DPAA FMan nodes") Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02arm64: dts: ls1043a-rdb: correct RGMII delay mode to rgmii-idMadalin Bucur
commit 4022d808c45277693ea86478fab1f081ebf997e8 upstream. The correct setting for the RGMII ports on LS1043ARDB is to enable delay on both Rx and Tx so the interface mode used must be PHY_INTERFACE_MODE_RGMII_ID. Since commit 1b3047b5208a80 ("net: phy: realtek: add support for configuring the RX delay on RTL8211F") the Realtek 8211F PHY driver has control over the RGMII RX delay and it is disabling it for RGMII_TXID. The LS1043ARDB uses two such PHYs in RGMII_ID mode but in the device tree the mode was described as "rgmii_txid". This issue was not apparent at the time as the PHY driver took the same action for RGMII_TXID and RGMII_ID back then but it became visible (RX no longer working) after the above patch. Changing the phy-connection-type to "rgmii-id" to address the issue. Fixes: bf02f2ffe59c ("arm64: dts: add LS1043A DPAA FMan support") Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02ARM: dts: N900: fix onenand timingsArthur Demchenkov
commit 0c5220a3c1242c7a2451570ed5f5af69620aac75 upstream. Commit a758f50f10cf ("mtd: onenand: omap2: Configure driver from DT") started using DT specified timings for GPMC, and as a result the OneNAND stopped working on N900 as we had wrong values in the DT. Fix by updating the values to bootloader timings that have been tested to be working on Nokia N900 with OneNAND manufacturers: Samsung, Numonyx. Fixes: a758f50f10cf ("mtd: onenand: omap2: Configure driver from DT") Signed-off-by: Arthur Demchenkov <spinal.by@gmail.com> Tested-by: Merlijn Wajer <merlijn@wizzup.org> Reviewed-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02ARM: dts: imx6: phycore-som: fix arm and soc minimum voltageMarco Felsch
commit 636b45b8efa91db05553840b6c0120d6fa6b94fa upstream. The current set minimum voltage of 730000µV seems to be wrong. I don't know the document which specifies that but the imx6qdl datasheets says that the minimum voltage should be 0.925V for VDD_ARM (LDO bypassed, lowest opp) and 1.15V for VDD_SOC (LDO bypassed, lowest opp). Fixes: ddec5d1c0047 ("ARM: dts: imx6: Add initial support for phyCORE-i.MX 6 SOM") Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02ARM: bcm2835-rpi-zero-w: Add missing pinctrl nameNick Hudson
commit 6687c201fdc3139315c2ea7ef96c157672805cdc upstream. Define the sdhci pinctrl state as "default" so it gets applied correctly and to match all other RPis. Fixes: 2c7c040c73e9 ("ARM: dts: bcm2835: Add Raspberry Pi Zero W") Signed-off-by: Nick Hudson <skrll@netbsd.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02ARM: dts: oxnas: Fix clear-mask propertySungbo Eo
commit deeabb4c1341a12bf8b599e6a2f4cfa4fd74738c upstream. Disable all rps-irq interrupts during driver initialization to prevent an accidental interrupt on GIC. Fixes: 84316f4ef141 ("ARM: boot: dts: Add Oxford Semiconductor OX810SE dtsi") Fixes: 38d4a53733f5 ("ARM: dts: Add support for OX820 and Pogoplug V3") Signed-off-by: Sungbo Eo <mans0n@gorani.run> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02arm64: alternative: fix build with clang integrated assemblerIlie Halip
commit 6f5459da2b8736720afdbd67c4bd2d1edba7d0e3 upstream. Building an arm64 defconfig with clang's integrated assembler, this error occurs: <instantiation>:2:2: error: unrecognized instruction mnemonic _ASM_EXTABLE 9999b, 9f ^ arch/arm64/mm/cache.S:50:1: note: while in macro instantiation user_alt 9f, "dc cvau, x4", "dc civac, x4", 0 ^ While GNU as seems fine with case-sensitive macro instantiations, clang doesn't, so use the actual macro name (_asm_extable) as in the rest of the file. Also checked that the generated assembly matches the GCC output. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Fixes: 290622efc76e ("arm64: fix "dc cvau" cache operation on errata-affected core") Link: https://github.com/ClangBuiltLinux/linux/issues/924 Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>