path: root/arch
Age  Commit message  [Author]
2019-09-12  powerpc/perf: Fix MMCRA corruption by bhrb_filter  [Ravi Bangoria]
commit 3202e35ec1c8fc19cea24253ff83edf702a60a02 upstream.

Consider a scenario where the user creates two events:

1st event:
  attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
  attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
  fd = perf_event_open(attr, 0, 1, -1, 0);

This sets cpuhw->bhrb_filter to 0 and returns a valid fd.

2nd event:
  attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
  attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
  fd = perf_event_open(attr, 0, 1, -1, 0);

This overwrites cpuhw->bhrb_filter with -1 and returns an error. Now, if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1. Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
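[Ed: a minimal userspace sketch of the two-event sequence described above; only the branch-stack fields come from the commit text, the event type/config/period fields are illustrative assumptions.]

    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static int open_branch_event(unsigned long long branch_type)
    {
            struct perf_event_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.size = sizeof(attr);
            attr.type = PERF_TYPE_HARDWARE;           /* assumed */
            attr.config = PERF_COUNT_HW_CPU_CYCLES;   /* assumed */
            attr.sample_period = 100000;              /* assumed */
            attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
            attr.branch_sample_type = branch_type;

            return syscall(__NR_perf_event_open, &attr, 0, 1, -1, 0);
    }

    int main(void)
    {
            /* 1st event: supported filter, sets cpuhw->bhrb_filter to 0 */
            int fd1 = open_branch_event(PERF_SAMPLE_BRANCH_ANY);

            /* 2nd event: filter the PMU rejects; the open fails, but
             * before the fix it left cpuhw->bhrb_filter at -1 behind
             * the first event's back */
            int fd2 = open_branch_event(PERF_SAMPLE_BRANCH_CALL);

            (void)fd1; (void)fd2;
            return 0;
    }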
2019-09-12  KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts  [Cédric Le Goater]
commit ef9740204051d0e00f5402fe96cf3a43ddd2bbbf upstream.

The passthrough interrupts are defined at the host level and their IRQ data should not be cleared unless specifically deconfigured (shutdown) by the host. They differ from the IPI interrupts, which are allocated by the XIVE KVM device and reserved for guest usage only.

This fixes a host crash when destroying a VM in which a PCI adapter was passed through. In this case, the interrupt is cleared and freed by the KVM device and then shut down by vfio at the host level.

[ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
[ 1007.360285] Faulting instruction address: 0xc00000000009da34
[ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
[ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
[ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
[ 1007.360454] NIP: c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
[ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300 Not tainted (5.1.0-gad7e7d0ef)
[ 1007.360500] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002484 XER: 00000000
[ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
[ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
[ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
[ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
[ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
[ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
[ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
[ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
[ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
[ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
[ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
[ 1007.360732] Call Trace:
[ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
[ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
[ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
[ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
[ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
[ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
[ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
[ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
[ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
[ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
[ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
[ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
[ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
[ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
[ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
[ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
[ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
[ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
[ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
[ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
[ 1007.361159] Instruction dump:
[ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
[ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022

Cc: stable@vger.kernel.org # v4.12+ Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-09-12  s390/crypto: fix possible sleep while spinlock is held  [Harald Freudenberger]
commit 1c2c7029c008922d4d48902cc386250502e73d51 upstream. This patch fixes a complaint about a possible sleep while a spinlock is held, "BUG: sleeping function called from invalid context at include/crypto/algapi.h:426", for the ctr(aes) and ctr(des) s390-specific ciphers. Instead of a spinlock, this patch introduces a mutex, which is safe to hold in sleeping context. Please note a deadlock is not possible, as mutex_trylock() is used. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
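[Ed: shape of the fix as described, sketched rather than quoting the exact s390 driver code: a mutex guards the shared counter-block buffer, taken only via mutex_trylock() so the cipher path never sleeps waiting for it and never deadlocks.]

    static DEFINE_MUTEX(ctrblk_lock);      /* previously a spinlock */

    static void ctr_process(u8 *out, const u8 *in, unsigned int nbytes)
    {
            if (mutex_trylock(&ctrblk_lock)) {
                    /* fast path: batch blocks through the shared buffer */
                    /* ... */
                    mutex_unlock(&ctrblk_lock);
            } else {
                    /* contended: fall back to block-by-block processing */
                    /* ... */
            }
    }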
2019-09-12  s390/crypto: fix gcm-aes-s390 selftest failures  [Harald Freudenberger]
commit bef9f0ba300a55d79a69aa172156072182176515 upstream. The current kernel uses improved crypto selftests. These tests showed that the current implementation of gcm-aes-s390 is not able to deal with chunks of output buffers which are not a multiple of 16 bytes. This patch introduces a rework of the gcm aes s390 scatter walk handling which now is able to handle any input and output scatter list chunk sizes correctly. Code has been verified by the crypto selftests, the tcrypt kernel module and additional tests run via the af_alg interface. Cc: <stable@vger.kernel.org> Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Patrick Steuer <steuer@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-09-12  sparc64: Fix regression in non-hypervisor TLB flush xcall  [James Clarke]
commit d3c976c14ad8af421134c428b0a89ff8dd3bd8f8 upstream. Previously, %g2 would end up with the value PAGE_SIZE, but after the commit mentioned below it ends up with the value 1 due to being reused for a different purpose. We need it to be PAGE_SIZE as we use it to step through pages in our demap loop, otherwise we set different flags in the low 12 bits of the address written to, thereby doing things other than a nucleus page flush. Fixes: a74ad5e660a9 ("sparc64: Handle extremely large kernel TLB range flushes more gracefully.") Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: James Clarke <jrtc27@jrtc27.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-09-12Revert "MIPS: perf: ath79: Fix perfcount IRQ assignment"Greg Kroah-Hartman
This reverts commit ca8648816e3dcc8dadba0e79a034f61c85eb206d which is commit a1e8783db8e0d58891681bc1e6d9ada66eae8e20 upstream. Petr writes: Karl has reported to me today, that he's experiencing weird reboot hang on his devices with 4.9.180 kernel and that he has bisected it down to my backported patch. I would like to kindly ask you for removal of this patch. This patch should be reverted from all stable kernels up to 5.1, because perf counters were not broken on those kernels, and this patch won't work on the ath79 legacy IRQ code anyway, it needs new irqchip driver which was enabled on ath79 with commit 51fa4f8912c0 ("MIPS: ath79: drop legacy IRQ code"). Reported-by: Petr Štetiar <ynezz@true.cz> Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Cc: John Crispin <john@phrozen.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: add the revert found in 4.19.x-stable trees. Note that the commit ID ca8648816e is for 4.19.x; see 6dd592e476 in 4.18.x ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-09-12Revert "x86/build: Move _etext to actual end of .text"Greg Kroah-Hartman
This reverts commit 392bef709659abea614abfe53cf228e7a59876a4. It seems to cause lots of problems when using the gold linker, and no one really needs this at the moment, so just revert it from the stable trees. Cc: Sami Tolvanen <samitolvanen@google.com> Reported-by: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Alec Ari <neotheuser@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: add the revert found in 4.19.x-stable trees.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-25  Merge tag 'v4.18.42' into v4.18/standard/base  [Bruce Ashfield]
This is the 4.18.42 stable release
2019-08-17  x86/mce: Handle varying MCA bank counts  [Yazen Ghannam]
commit 006c077041dc73b9490fffc4c6af5befe0687110 upstream. Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a CPU. Currently, this number is the same for all CPUs and a warning is shown if there is a difference. The number of banks is overwritten with the MCG_CAP[Count] value of each following CPU that boots. According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives the number of banks that are available to a "processor implementation". The AMD BKDGs/PPRs further clarify that this value is per core. This value has historically been the same for every core in the system, but that is not an architectural requirement. Future AMD systems may have different MCG_CAP[Count] values per core, so the assumption that all CPUs will have the same MCG_CAP[Count] value will no longer be valid. Also, the first CPU to boot will allocate the struct mce_banks[] array using the number of banks based on its MCG_CAP[Count] value. The machine check handler and other functions use the global number of banks to iterate and index into the mce_banks[] array. So it's possible to use an out-of-bounds index on an asymmetric system where a following CPU sees a MCG_CAP[Count] value greater than its predecessors. Thus, allocate the mce_banks[] array to the maximum number of banks. This will avoid the potential out-of-bounds index since the value of mca_cfg.banks is capped to MAX_NR_BANKS. Set the value of mca_cfg.banks equal to the max of the previous value and the value for the current CPU. This way mca_cfg.banks will always represent the max number of banks detected on any CPU in the system. This will ensure that all CPUs will access all the banks that are visible to them. A CPU that can access fewer than the max number of banks will find the registers of the extra banks to be read-as-zero. Furthermore, print the resulting number of MCA banks in use. Do this in mcheck_late_init() so that the final value is printed after all CPUs have been initialized. Finally, get bank count from target CPU when doing injection with mce-inject module. [ bp: Remove out-of-bounds example, passify and cleanup commit message. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20180727214009.78289-1-Yazen.Ghannam@amd.com [PG: update file paths for older code base.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
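[Ed: a sketch of the per-CPU bank accounting described above; names approximate the mce code of this era, not an exact copy of the patch.]

    u64 cap;
    u8 nr;

    rdmsrl(MSR_IA32_MCG_CAP, cap);
    nr = cap & MCG_BANKCNT_MASK;     /* MCG_CAP[Count] on this CPU */

    /* mca_cfg.banks tracks the maximum seen on any CPU, capped at
     * MAX_NR_BANKS; mce_banks[] is sized for that maximum, so a later
     * CPU with a larger count can never index out of bounds, and a CPU
     * with fewer banks just reads the extra registers as zero. */
    if (nr > mca_cfg.banks)
            mca_cfg.banks = nr;
    if (mca_cfg.banks > MAX_NR_BANKS)
            mca_cfg.banks = MAX_NR_BANKS;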
2019-08-17  x86/mce: Fix machine_check_poll() tests for error types  [Tony Luck]
commit f19501aa07f18268ab14f458b51c1c6b7f72a134 upstream. There has been a lurking "TBD" in the machine check poll routine ever since it was first split out from the machine check handler. The potential issue is that the poll routine may have just begun a read from the STATUS register in a machine check bank when the hardware logs an error in that bank and signals a machine check. That race used to be pretty small back when machine checks were broadcast, but the addition of local machine check means that the poll code could continue running and clear the error from the bank before the local machine check handler on another CPU gets around to reading it. Fix the code to be sure to only process errors that need to be processed in the poll code, leaving other logged errors alone for the machine check handler to find and process. [ bp: Massage a bit and flip the "== 0" check to the usual !(..) test. ] Fixes: b79109c3bbcf ("x86, mce: separate correct machine check poller and fatal exception handler") Fixes: ed7290d0ee8f ("x86, mce: implement new status bits") Reported-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20190312170938.GA23035@agluck-desk [PG: update for older file paths in 4.18.x code base.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  sh: sh7786: Add explicit I/O cast to sh7786_mm_sel()  [Geert Uytterhoeven]
commit 8440bb9b944c02222c7a840d406141ed42e945cd upstream.

When compile-testing on arm:

arch/sh/include/cpu-sh4/cpu/sh7786.h: In function ‘sh7786_mm_sel’:
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:21: warning: passing argument 1 of ‘__raw_readl’ makes pointer from integer without a cast [-Wint-conversion]
  return __raw_readl(0xFC400020) & 0x7;
         ^~~~~~~~~~
In file included from include/linux/io.h:25:0, from arch/sh/include/cpu-sh4/cpu/sh7786.h:14, from drivers/pinctrl/sh-pfc/pfc-sh7786.c:15:
arch/arm/include/asm/io.h:113:21: note: expected ‘const volatile void *’ but argument is of type ‘unsigned int’
 #define __raw_readl __raw_readl
                     ^
arch/arm/include/asm/io.h:114:19: note: in expansion of macro ‘__raw_readl’
 static inline u32 __raw_readl(const volatile void __iomem *addr)
                   ^~~~~~~~~~~

__raw_readl() on SuperH is a macro that casts the passed I/O address to the correct type, while the implementations on most other architectures expect to be passed the correct pointer type. Add an explicit cast to fix this.

Note that this also gets rid of a sparse warning on SuperH:

arch/sh/include/cpu-sh4/cpu/sh7786.h:135:16: warning: incorrect type in argument 1 (different base types)
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:16: expected void const volatile [noderef] <asn:2>*<noident>
arch/sh/include/cpu-sh4/cpu/sh7786.h:135:16: got unsigned int

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
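[Ed: the resulting one-liner, per the commit description (modulo the exact cast qualifiers used upstream):]

    static inline u32 sh7786_mm_sel(void)
    {
            /* explicit cast so every architecture's __raw_readl() accepts it */
            return __raw_readl((const void __iomem *)0xFC400020) & 0x7;
    }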
2019-08-17  x86/uaccess: Fix up the fixup  [Peter Zijlstra]
commit b69656fa7ea2f75e47d7bd5b9430359fa46488af upstream. New tooling got confused about this: arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x7: return with UACCESS enabled While the code isn't wrong, it is tedious (if at all possible) to figure out what function a particular chunk of .fixup belongs to. This then confuses the objtool uaccess validation. Instead of returning directly from the .fixup, jump back into the right function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  x86/ia32: Fix ia32_restore_sigcontext() AC leak  [Peter Zijlstra]
commit 67a0514afdbb8b2fc70b771b8c77661a9cb9d3a9 upstream. Objtool spotted that we call native_load_gs_index() with AC set. Re-arrange the code to avoid that. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  x86/uaccess, signal: Fix AC=1 bloat  [Peter Zijlstra]
commit 88e4718275c1bddca6f61f300688b4553dc8584b upstream. Occasionally GCC is less aggressive with inlining and the following is observed: arch/x86/kernel/signal.o: warning: objtool: restore_sigcontext()+0x3cc: call to force_valid_ss.isra.5() with UACCESS enabled arch/x86/kernel/signal.o: warning: objtool: do_signal()+0x384: call to frame_uc_flags.isra.0() with UACCESS enabled Cure this by moving this code out of the AC=1 region, since it really isn't needed for the user access. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  arm64: cpu_ops: fix a leaked reference by adding missing of_node_put  [Wen Yang]
commit 92606ec9285fb84cd9b5943df23f07d741384bfc upstream. The call to of_get_next_child returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./arch/arm64/kernel/cpu_ops.c:102:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 69, but without a corresponding object release within this function. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
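[Ed: the general pattern of the fix, sketched; the function name here is hypothetical.]

    static int read_enable_method(struct device_node *dn)
    {
            struct device_node *child;
            int ret = 0;

            child = of_get_next_child(dn, NULL);   /* takes a reference */
            if (!child)
                    return -ENODEV;

            /* ... read properties from 'child' ... */

            of_node_put(child);   /* the fix: drop the reference when done */
            return ret;
    }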
2019-08-17  x86/build: Keep local relocations with ld.lld  [Kees Cook]
commit 7c21383f3429dd70da39c0c7f1efa12377a47ab6 upstream. The LLVM linker (ld.lld) defaults to removing local relocations, which causes KASLR boot failures. ld.bfd and ld.gold already handle this correctly. This adds the explicit instruction "--discard-none" during the link phase. There is no change in output for ld.bfd and ld.gold, but ld.lld now produces an image with all the needed relocations. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: clang-built-linux@googlegroups.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190404214027.GA7324@beast Link: https://github.com/ClangBuiltLinux/linux/issues/404 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  x86/microcode: Fix the ancient deprecated microcode loading method  [Borislav Petkov]
commit 24613a04ad1c0588c10f4b5403ca60a73d164051 upstream. Commit 2613f36ed965 ("x86/microcode: Attempt late loading only when new microcode is present") added the new define UCODE_NEW to denote that an update should happen only when newer microcode (than installed on the system) has been found. But it missed adjusting that for the old /dev/cpu/microcode loading interface. Fix it. Fixes: 2613f36ed965 ("x86/microcode: Attempt late loading only when new microcode is present") Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jann Horn <jannh@google.com> Link: https://lkml.kernel.org/r/20190405133010.24249-3-bp@alien8.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  perf/x86/intel/cstate: Add Icelake support  [Kan Liang]
commit f08c47d1f86c6dc666c7e659d94bf6d4492aa9d7 upstream. Icelake uses the same C-state residency events as Sandy Bridge. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-10-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  perf/x86/intel/rapl: Add Icelake support  [Kan Liang]
commit b3377c3acb9e54cf86efcfe25f2e792bca599ed4 upstream. Icelake supports the same RAPL counters as Skylake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-11-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  perf/x86/msr: Add Icelake support  [Kan Liang]
commit cf50d79a8cfe5adae37fec026220b009559bbeed upstream. Icelake is the same as the existing Skylake parts. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-12-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  arm64: vdso: Fix clock_getres() for CLOCK_REALTIME  [Vincenzo Frascino]
commit 81fb8736dd81da3fe94f28968dac60f392ec6746 upstream. clock_getres() in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; where 'hrtimer_resolution' depends on whether or not high resolution timers are enabled, which is a runtime decision. The vDSO incorrectly returns the constant CLOCK_REALTIME_RES. Fix this by exposing 'hrtimer_resolution' in the vDSO datapage and returning that instead. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> [will: Use WRITE_ONCE(), move adr off COARSE path, renumber labels, use 'w' reg] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
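[Ed: the semantics the vDSO must now match, sketched from the posix_get_hrtimer_res() behaviour described above.]

    static int clock_getres_realtime(struct timespec *tp)
    {
            tp->tv_sec = 0;
            /* runtime value (depends on whether high resolution timers
             * are enabled), exported through the vDSO data page by this
             * fix, instead of the compile-time CLOCK_REALTIME_RES */
            tp->tv_nsec = hrtimer_resolution;
            return 0;
    }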
2019-08-17  x86/irq/64: Limit IST stack overflow check to #DB stack  [Thomas Gleixner]
commit 7dbcf2b0b770eeb803a416ee8dcbef78e6389d40 upstream. Commit 37fe6a42b343 ("x86: Check stack overflow in detail") added a broad check for the full exception stack area, i.e. it considers the full exception stack area as valid. That's wrong in two aspects: 1) It does not check the individual areas one by one 2) #DF, NMI and #MCE are not enabling interrupts which means that a regular device interrupt cannot happen in their context. In fact if a device interrupt hits one of those IST stacks that's a bug because some code path enabled interrupts while handling the exception. Limit the check to the #DB stack and consider all other IST stacks as 'overflow' or invalid. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190414160143.682135110@linutronix.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX  [Russell Currey]
commit 56c46bba9bbfe229b4472a5be313c44c5b714a39 upstream. With STRICT_KERNEL_RWX enabled anything marked __init is placed at a 16M boundary. This is necessary so that it can be repurposed later with different permissions. However, in kernels with text larger than 16M, this pushes early_setup past 32M, incapable of being reached by the branch instruction. Fix this by setting the CTR and branching there instead. Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Fix it to work on BE by using DOTSYM()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17  powerpc/numa: improve control of topology updates  [Nathan Lynch]
commit 2d4d9b308f8f8dec68f6dbbff18c68ec7c6bd26f upstream. When booted with "topology_updates=no", or when "off" is written to /proc/powerpc/topology_updates, NUMA reassignments are inhibited for PRRN and VPHN events. However, migration and suspend unconditionally re-enable reassignments via start_topology_update(). This is incoherent. Check the topology_updates_enabled flag in start/stop_topology_update() so that callers of those APIs need not be aware of whether reassignments are enabled. This allows the administrative decision on reassignments to remain in force across migrations and suspensions. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-07  Merge tag 'v4.18.41' into v4.18/standard/base  [Bruce Ashfield]
This is the 4.18.41 stable release
2019-08-02  x86/build: Move _etext to actual end of .text  [Kees Cook]
commit 392bef709659abea614abfe53cf228e7a59876a4 upstream. When building x86 with Clang LTO and CFI, CFI jump regions are automatically added to the end of the .text section late in linking. As a result, the _etext position was being labelled before the appended jump regions, causing confusion about where the boundaries of the executable region actually are in the running kernel, and broke at least the fault injection code. This moves the _etext mark to outside (and immediately after) the .text area, as is already the case on other architectures (e.g. arm64, arm). Reported-and-tested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190423183827.GA4012@beast Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  s390/kexec_file: Fix detection of text segment in ELF loader  [Philipp Rudo]
commit 729829d775c9a5217abc784b2f16087d79c4eec8 upstream. To register data for the next kernel (command line, oldmem_base, etc.) the current kernel needs to find the ELF segment that contains head.S. This is currently done by checking for 'phdr->p_paddr == 0'. This works fine for the current kernel build but in theory the first few pages could be skipped. Make the detection more robust by checking if the entry point lies within the segment. Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  x86/modules: Avoid breaking W^X while loading modules  [Nadav Amit]
commit f2c65fb3221adc6b73b0549fc7ba892022db9797 upstream. When modules and BPF filters are loaded, there is a time window in which some memory is both writable and executable. An attacker that has already found another vulnerability (e.g., a dangling pointer) might be able to exploit this behavior to overwrite kernel code. Prevent having writable executable PTEs in this stage. In addition, avoiding having W+X mappings can also slightly simplify the patching of modules code on initialization (e.g., by alternatives and static-key), as would be done in the next patch. This was actually the main motivation for this patch. To avoid having W+X mappings, set them initially as RW (NX) and after they are set as RO set them as X as well. Setting them as executable is done as a separate step to avoid one core in which the old PTE is cached (hence writable), and another which sees the updated PTE (executable), which would break the W^X protection. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
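[Ed: the order of operations described above, sketched with the set_memory_* helpers; the wrapper name is hypothetical.]

    static void module_protect_text(unsigned long addr, int npages)
    {
            /* module text was allocated RW+NX and written/patched while
             * still non-executable; drop write before granting execute,
             * so no mapping is ever writable and executable at once */
            set_memory_ro(addr, npages);
            set_memory_x(addr, npages);
    }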
2019-08-02  x86/alternatives, jumplabel: Use text_poke_early() before mm_init()  [Pavel Tatashin]
commit 6fffacb30349e0903602d664f7ab6fc87e85162e upstream.

It is supposed to be safe to modify static branches after jump_label_init(). But because static-key modifying code eventually calls text_poke(), it can end up accessing a struct page which has not been initialized yet.

Here is how to quickly reproduce the problem. Insert code like this into init/main.c:

| +static DEFINE_STATIC_KEY_FALSE(__test);
| asmlinkage __visible void __init start_kernel(void)
| {
|     char *command_line;
|@@ -587,6 +609,10 @@ asmlinkage __visible void __init start_kernel(void)
|     vfs_caches_init_early();
|     sort_main_extable();
|     trap_init();
|+    {
|+        static_branch_enable(&__test);
|+        WARN_ON(!static_branch_likely(&__test));
|+    }
|     mm_init();

The following warnings show up:

WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:701 text_poke+0x20d/0x230
RIP: 0010:text_poke+0x20d/0x230
Call Trace:
 ? text_poke_bp+0x50/0xda
 ? arch_jump_label_transform+0x89/0xe0
 ? __jump_label_update+0x78/0xb0
 ? static_key_enable_cpuslocked+0x4d/0x80
 ? static_key_enable+0x11/0x20
 ? start_kernel+0x23e/0x4c8
 ? secondary_startup_64+0xa5/0xb0
---[ end trace abdc99c031b8a90a ]---

If the code above is moved after mm_init(), no warning is shown, as struct pages are initialized during handover from memblock.

Use text_poke_early() in static branching until early boot IRQs are enabled, and from there switch to text_poke(). Also, ensure text_poke() is never invoked when uninitialized memory access may happen, by adding a !after_bootmem assertion.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-9-pasha.tatashin@oracle.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  powerpc/watchdog: Use hrtimers for per-CPU heartbeat  [Nicholas Piggin]
commit 7ae3f6e130e8dc6188b59e3b4ebc2f16e9c8d053 upstream. Using a jiffies timer creates a dependency on the tick_do_timer_cpu incrementing jiffies. If that CPU has locked up and jiffies is not incrementing, the watchdog heartbeat timer for all CPUs stops and creates false positives and confusing warnings on local CPUs, and also causes the SMP detector to stop, so the root cause is never detected. Fix this by using hrtimer based timers for the watchdog heartbeat, like the generic kernel hardlockup detector. Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable  [Qian Cai]
commit 74dd022f9e6260c3b5b8d15901d27ebcc5f21eda upstream.

When building with -Wunused-but-set-variable, the compiler shouts about a number of pte_unmap() users, since this expands to an empty macro on arm64:

| mm/gup.c: In function 'gup_pte_range':
| mm/gup.c:1727:16: warning: variable 'ptem' set but not used
| [-Wunused-but-set-variable]
| mm/gup.c: At top level:
| mm/memory.c: In function 'copy_pte_range':
| mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used
| [-Wunused-but-set-variable]
| mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used
| [-Wunused-but-set-variable]
| mm/swap_state.c: In function 'swap_ra_info':
| mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used
| [-Wunused-but-set-variable]
| mm/madvise.c: In function 'madvise_free_pte_range':
| mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used
| [-Wunused-but-set-variable]

Rewrite pte_unmap() as a static inline function, which silences the warnings.

Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
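[Ed: the shape of the fix: an empty macro never "uses" its argument, a static inline does.]

    /* before: expands to nothing, so callers' pte pointers look unused */
    #define pte_unmap(pte)          do { } while (0)

    /* after: the argument is consumed, silencing
     * -Wunused-but-set-variable at every call site */
    static inline void pte_unmap(pte_t *pte) { }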
2019-08-02  ARM: vdso: Remove dependency with the arch_timer driver internals  [Marc Zyngier]
commit 1f5b62f09f6b314c8d70b9de5182dae4de1f94da upstream. The VDSO code uses the kernel helper that was originally designed to abstract the access between 32 and 64bit systems. It worked so far because this function is declared as 'inline'. As we're about to revamp that part of the code, the VDSO would break. Let's fix it by doing what should have been done from the start, a proper system register access. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  powerpc/perf: Fix loop exit condition in nest_imc_event_init  [Anju T Sudhakar]
commit 860b7d2286236170a36f94946d03ca9888d32571 upstream. The data structure (i.e. struct imc_mem_info) that holds the memory address information for nest imc units is allocated based on the number of nodes in the system. nest_imc_event_init() traverses this struct array to calculate the memory base address for the event CPU. If we fail to find a match for the event CPU's chip-id in the imc_mem_info struct array, then the do-while loop will iterate until we crash. Fix this by changing the loop exit condition to be based on the number of non-zero vbase elements in the array, since the allocation is done for nr_chips + 1. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  powerpc/boot: Fix missing check of lseek() return value  [Bo YU]
commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b upstream. This is detected by Coverity scan: CID: 1440481 Signed-off-by: Bo YU <tsu.yubo@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  powerpc/perf: Return accordingly on invalid chip-id in  [Anju T Sudhakar]
commit a913e5e8b43be1d3897a141ce61c1ec071cad89c upstream. Nest hardware counter memory resides in a per-chip reserved memory area. During nest_imc_event_init(), the chip-id of the event CPU is used to calculate the base memory address for that CPU. Return a proper error condition if the calculated chip-id is invalid. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable  [Christoph Hellwig]
commit a98d9ae937d256ed679a935fc82d9deaa710d98e upstream. DMA allocations that can't sleep may return non-remapped addresses, but we do not properly handle them in the mmap and get_sgtable methods. Resolve non-vmalloc addresses using virt_to_page to handle this corner case. Cc: <stable@vger.kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  kvm: svm/avic: fix off-by-one in checking host APIC ID  [Suthikulpanit, Suravee]
commit c9bcd3e3335d0a29d89fabd2c385e1b989e6f1b0 upstream. Current logic does not allow VCPU to be loaded onto CPU with APIC ID 255. This should be allowed since the host physical APIC ID field in the AVIC Physical APIC table entry is an 8-bit value, and APIC ID 255 is valid in system with x2APIC enabled. Instead, do not allow VCPU load if the host APIC ID cannot be represented by an 8-bit value. Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK instead of AVIC_MAX_PHYSICAL_ID_COUNT. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
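[Ed: the check as described, sketched:]

    /* The host physical APIC ID field in the AVIC physical APIC table
     * is 8 bits wide, so ID 255 itself is valid with x2APIC enabled;
     * only IDs that cannot be encoded in the mask must refuse the
     * VCPU load. */
    if (WARN_ON(h_physical_id > AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK))
            return;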
2019-08-02  x86: Hide the int3_emulate_call/jmp functions from UML  [Steven Rostedt (VMware)]
commit 693713cbdb3a4bda5a8a678c31f06560bbb14657 upstream. User Mode Linux does not have access to the ip or sp fields of the pt_regs, and accessing them causes UML to fail to build. Hide the int3_emulate_jmp() and int3_emulate_call() instructions from UML, as it doesn't need them anyway. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  x86/mm/mem_encrypt: Disable all instrumentation for early SME setup  [Gary Hook]
commit b51ce3744f115850166f3d6c292b9c8cb849ad4f upstream. Enablement of AMD's Secure Memory Encryption feature is determined very early after start_kernel() is entered. Part of this procedure involves scanning the command line for the parameter 'mem_encrypt'. To determine intended state, the function sme_enable() uses library functions cmdline_find_option() and strncmp(). Their use occurs early enough such that it cannot be assumed that any instrumentation subsystem is initialized. For example, making calls to a KASAN-instrumented function before KASAN is set up will result in the use of uninitialized memory and a boot failure. When AMD's SME support is enabled, conditionally disable instrumentation of these dependent functions in lib/string.c and arch/x86/lib/cmdline.c. [ bp: Get rid of intermediary nostackp var and cleanup whitespace. ] Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Reported-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boris Brezillon <bbrezillon@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: "luto@kernel.org" <luto@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "mingo@redhat.com" <mingo@redhat.com> Cc: "peterz@infradead.org" <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/155657657552.7116.18363762932464011367.stgit@sosrh3.amd.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012  [Vitaly Kuznetsov]
commit da66761c2d93a46270d69001abb5692717495a68 upstream.

It was reported that with some special Multi Processor Group configuration, e.g.:

bcdedit.exe /set groupsize 1
bcdedit.exe /set maxgroup on
bcdedit.exe /set groupaware on

for a 16-vCPU guest, WS2012 shows a BSOD on boot when the PV TLB flush mechanism is in use.

Tracing kvm_hv_flush_tlb immediately reveals the issue:

kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2

The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES; however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified. We don't flush anything, and apparently that's not what Windows expects.

TLFS doesn't say anything about such requests and newer Windows versions seem to be unaffected. This all feels like a WS2012 bug, which is, however, easy to work around in KVM: let's flush everything when we see an empty flush request; over-flushing doesn't hurt.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
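[Ed: a sketch of the workaround, not necessarily the exact diff: an empty processor_mask without HV_FLUSH_ALL_PROCESSORS is promoted to a flush-all, since over-flushing is always safe.]

    all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) ||
               flush.processor_mask == 0;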
2019-08-02  MIPS: perf: Fix build with CONFIG_CPU_BMIPS5000 enabled  [Florian Fainelli]
commit 1b1f01b653b408ebe58fec78c566d1075d285c64 upstream.

arch/mips/kernel/perf_event_mipsxx.c: In function 'mipsxx_pmu_enable_event':
arch/mips/kernel/perf_event_mipsxx.c:326:21: error: unused variable 'event' [-Werror=unused-variable]
  struct perf_event *event = container_of(evt, struct perf_event, hw);
                     ^~~~~

Fix this by making use of IS_ENABLED() to simplify the code and avoid unnecessary ifdefery.

Fixes: 84002c88599d ("MIPS: perf: Fix perf with MT counting other threads") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@linux-mips.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
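[Ed: the IS_ENABLED() pattern named above, sketched:]

    if (IS_ENABLED(CONFIG_CPU_BMIPS5000)) {
            /* always compiled, so 'event' is always used; the block
             * is simply dead code when the option is off */
            struct perf_event *event =
                    container_of(evt, struct perf_event, hw);
            /* ... BMIPS5000-specific counter setup using 'event' ... */
    }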
2019-08-02  ftrace/x86_64: Emulate call function while updating in breakpoint handler  [Peter Zijlstra]
commit 9e298e8604088a600d8100a111a532a9d342af09 upstream. Nicolai Stange discovered[1] that if live kernel patching is enabled, and the function tracer started tracing the same function that was patched, the conversion of the fentry call site during the translation of going from calling the live kernel patch trampoline to the iterator trampoline would have a slight window where it didn't call anything. Since live kernel patching depends on ftrace to always call its code (to prevent the function being traced from being called, as it will redirect it), this small window would allow the old buggy function to be called, and this can cause undesirable results. Nicolai submitted new patches[2] but these were controversial. As this is similar to the static call emulation issues that came up a while ago[3]. But after some debate[4][5] adding a gap in the stack when entering the breakpoint handler allows for pushing the return address onto the stack to easily emulate a call. [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_w@mail.gmail.com [ Live kernel patching is not implemented on x86_32, thus the emulate calls are only for x86_64. ] Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolai Stange <nstange@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: the arch/x86 maintainers <x86@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Changed to only implement emulated calls for x86_64 ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  x86_64: Allow breakpoints to emulate call instructions  [Peter Zijlstra]
commit 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 upstream. In order to allow breakpoints to emulate call instructions, they need to push the return address onto the stack. The x86_64 int3 handler adds a small gap to allow the stack to grow some. Use this gap to add the return address to be able to emulate a call instruction at the breakpoint location. These helper functions are added: int3_emulate_jmp(): changes the location of the regs->ip to return there. (The next two are only for x86_64) int3_emulate_push(): to push the address onto the gap in the stack int3_emulate_call(): push the return address and change regs->ip Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolai Stange <nstange@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: the arch/x86 maintainers <x86@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
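[Ed: the helpers as described, sketched for x86_64; INT3_INSN_SIZE is 1 and CALL_INSN_SIZE is 5, and the push relies on the stack gap added by the int3 gap patch below.]

    static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
    {
            regs->ip = ip;
    }

    static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
    {
            /* grows into the gap the int3 entry code left below regs */
            regs->sp -= sizeof(unsigned long);
            *(unsigned long *)regs->sp = val;
    }

    static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
    {
            /* return address: end of the 5-byte call the int3 replaced */
            int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
            int3_emulate_jmp(regs, func);
    }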
2019-08-02  x86_64: Add gap to int3 to allow for call emulation  [Josh Poimboeuf]
commit 2700fefdb2d9751c416ad56897e27d41e409324a upstream. To allow an int3 handler to emulate a call instruction, it must be able to push a return address onto the stack. Add a gap to the stack to allow the int3 handler to push the return address and change the return from int3 to jump straight to the emulated called function target. Link: http://lkml.kernel.org/r/20181130183917.hxmti5josgq4clti@treble Link: http://lkml.kernel.org/r/20190502162133.GX2623@hirez.programming.kicks-ass.net [ Note, this is needed to allow Live Kernel Patching to not miss calling a patched function when tracing is enabled. -- Steven Rostedt ] Cc: stable@vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  parisc: Rename LEVEL to PA_ASM_LEVEL to avoid name clash with DRBD code  [Helge Deller]
commit 1829dda0e87f4462782ca81be474c7890efe31ce upstream. LEVEL is a very common word, and now after many years it suddenly clashed with another LEVEL define in the DRBD code. Rename it to PA_ASM_LEVEL instead. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  parisc: Use PA_ASM_LEVEL in boot code  [Helge Deller]
commit bdca5d64ee92abeacd6dada0bc6f6f8e6350dd67 upstream. The LEVEL define clashed with the DRBD code. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02  parisc: Export running_on_qemu symbol for modules  [Helge Deller]
commit 3e1120f4b57bc12437048494ab56648edaa5b57d upstream. Signed-off-by: Helge Deller <deller@gmx.de> CC: stable@vger.kernel.org # v4.9+ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-06-03  Merge tag 'v4.18.40' into v4.18/standard/base  [Bruce Ashfield]
This is the 4.18.40 stable release
2019-05-30  perf/x86/intel: Allow PEBS multi-entry in watermark mode  [Stephane Eranian]
commit c7a286577d7592720c2f179aadfb325a1ff48c95 upstream. This patch fixes a restriction/bug introduced by: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") The original patch prevented using multi-entry PEBS when wakeup_events != 0. However, given that wakeup_events is part of a union with wakeup_watermark, this meant that in watermark mode PEBS multi-entry was also disabled, which is not the intent. This patch fixes that by checking whether watermark mode is enabled. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: vincent.weaver@maine.edu Fixes: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") Link: http://lkml.kernel.org/r/20190514003400.224340-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
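[Ed: a sketch of the corrected condition, not necessarily the exact diff: wakeup_events only carries an event count when attr.watermark is 0, since it shares a union with wakeup_watermark.]

    if (!(event->attr.freq ||
          (event->attr.wakeup_events && !event->attr.watermark)))
            event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;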
2019-05-30  KVM: x86: fix return value for reserved EFER  [Paolo Bonzini]
commit 66f61c92889ff3ca365161fb29dd36d6354682ba upstream. Commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes", 2019-04-02) introduced a "return false" in a function returning int, and anyway set_efer has a "nonzero on error" convention, so it should be returning 1. Reported-by: Pavel Machek <pavel@denx.de> Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes") Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
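[Ed: the fix in one line, per the convention named above:]

    if (efer & efer_reserved_bits)
            return 1;   /* was: return false; in an int-returning function */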