summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)Author
2020-05-27x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasksJosh Poimboeuf
commit 187b96db5ca79423618dfa29a05c438c34f9e1f0 upstream. Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with unwind_next_frame(). If the stack address matches the pointer returned by unwind_get_return_address_ptr() for the current frame, the text address is printed normally without a question mark. Otherwise it's considered a breadcrumb (potentially from a previous call path) and it's printed with a question mark to indicate that the address is unreliable and typically can be ignored. Since the following commit: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") ... for inactive tasks, show_trace_log_lvl() prints *only* unreliable addresses (prepended with '?'). That happens because, for the first frame of an inactive task, unwind_get_return_address_ptr() returns the wrong return address pointer: one word *below* the task stack pointer. show_trace_log_lvl() starts scanning at the stack pointer itself, so it never finds the first 'reliable' address, causing only guesses to being printed. The first frame of an inactive task isn't a normal stack frame. It's actually just an instance of 'struct inactive_task_frame' which is left behind by __switch_to_asm(). Now that this inactive frame is actually exposed to callers, fix unwind_get_return_address_ptr() to interpret it properly. Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27powerpc/64s: Disable STRICT_KERNEL_RWXMichael Ellerman
[ Upstream commit 8659a0e0efdd975c73355dbc033f79ba3b31e82c ] Several strange crashes have been eventually traced back to STRICT_KERNEL_RWX and its interaction with code patching. Various paths in our ftrace, kprobes and other patching code need to be hardened against patching failures, otherwise we can end up running with partially/incorrectly patched ftrace paths, kprobes or jump labels, which can then cause strange crashes. Although fixes for those are in development, they're not -rc material. There also seem to be problems with the underlying strict RWX logic, which needs further debugging. So for now disable STRICT_KERNEL_RWX on 64-bit to prevent people from enabling the option and tripping over the bugs. Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200520133605.972649-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLERussell Currey
[ Upstream commit c55d7b5e64265fdca45c85b639013e770bde2d0e ] I have tested this with the Radix MMU and everything seems to work, and the previous patch for Hash seems to fix everything too. STRICT_KERNEL_RWX should still be disabled by default for now. Please test STRICT_KERNEL_RWX + RELOCATABLE! Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191224064126.183670-2-ruscur@russell.cc Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27ARM: futex: Address build warningThomas Gleixner
[ Upstream commit 8101b5a1531f3390b3a69fa7934c70a8fd6566ad ] Stephen reported the following build warning on a ARM multi_v7_defconfig build with GCC 9.2.1: kernel/futex.c: In function 'do_futex': kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] 1676 | return oldval == cmparg; | ~~~~~~~^~~~~~~~~ kernel/futex.c:1652:6: note: 'oldval' was declared here 1652 | int oldval, ret; | ^~~~~~ introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser() calling conventions change"). While that change should not make any difference it confuses GCC which fails to work out that oldval is not referenced when the return value is not zero. GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the early return, the issue is with the assembly macros. GCC fails to detect that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT which makes oldval uninteresting. The store to the callsite supplied oldval pointer is conditional on ret == 0. The straight forward way to solve this is to make the store unconditional. Aside of addressing the build warning this makes sense anyway because it removes the conditional from the fastpath. In the error case the stored value is uninteresting and the extra store does not matter at all. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27x86/apic: Move TSC deadline timer debug printkThomas Gleixner
[ Upstream commit c84cb3735fd53c91101ccdb191f2e3331a9262cb ] Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a lockdep splat due to a lock order violation between hrtimer_base::lock and console_sem, when the 'once' condition is reset via /sys/kernel/debug/clear_warn_once after boot. The initial printk cannot trigger this because that happens during boot when the local APIC timer is set up on the boot CPU. Prevent it by moving the printk to a place which is guaranteed to be only called once during boot. Mark the deadline timer check related functions and data __init while at it. Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27riscv: set max_pfn to the PFN of the last pageVincent Chen
commit c749bb2d554825e007cbc43b791f54e124dadfce upstream. The current max_pfn equals to zero. In this case, I found it caused users cannot get some page information through /proc such as kpagecount in v5.6 kernel because of new sanity checks. The following message is displayed by stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t 1" on HiFive unleashed board. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [109] 4 processors online, 4 processors configured stress-ng: info: [109] dispatching hogs: 1 physpage stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0 stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0 stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found stress-ng: debug: [109] cache allocate: default cache size: 2048K stress-ng: debug: [109] starting stressors stress-ng: debug: [109] 1 stressor spawned stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0) stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success) stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success) ... stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success) stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0) stress-ng: debug: [109] process [110] terminated stress-ng: info: [109] successful run completed in 1.00s # After applying this patch, the kernel can pass the test. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [104] 4 processors online, 4 processors configured stress-ng: info: [104] dispatching hogs: 1 physpage stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs stress-ng: debug: [104] cache allocate: default cache size: 2048K stress-ng: debug: [104] starting stressors stress-ng: debug: [104] 1 stressor spawned stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0) stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0) stress-ng: debug: [104] process [105] terminated stress-ng: info: [104] successful run completed in 1.01s # Cc: stable@vger.kernel.org Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Yash Shah <yash.shah@sifive.com> Tested-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> [Palmer: back-ported to 4.19] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27KVM: SVM: Fix potential memory leak in svm_cpu_init()Miaohe Lin
commit d80b64ff297e40c2b6f7d7abc1b3eba70d22a068 upstream. When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mceJim Mattson
commit c4e0e4ab4cf3ec2b3f0b628ead108d677644ebd9 upstream. Bank_num is a one-based count of banks, not a zero-based index. It overflows the allocated space only when strictly greater than KVM_MAX_MCE_BANKS. Fixes: a9e38c3e01ad ("KVM: x86: Catch potential overrun in MCE setup") Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20200511225616.19557-1-jmattson@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20ARM: dts: r8a7740: Add missing extal2 to CPG nodeGeert Uytterhoeven
commit e47cb97f153193d4b41ca8d48127da14513d54c7 upstream. The Clock Pulse Generator (CPG) device node lacks the extal2 clock. This may lead to a failure registering the "r" clock, or to a wrong parent for the "usb24s" clock, depending on MD_CK2 pin configuration and boot loader CPG_USBCKCR register configuration. This went unnoticed, as this does not affect the single upstream board configuration, which relies on the first clock input only. Fixes: d9ffd583bf345e2e ("ARM: shmobile: r8a7740: add SoC clocks to DTS") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu> Link: https://lore.kernel.org/r/20200508095918.6061-1-geert+renesas@glider.be Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20arm64: dts: renesas: r8a77980: Fix IPMMU VIP[01] nodesYoshihiro Shimoda
commit f4d71c6ea9e58c07dd4d02d09c5dd9bb780ec4b1 upstream. Missing the renesas,ipmmu-main property on ipmmu_vip[01] nodes. Fixes: 55697cbb44e4 ("arm64: dts: renesas: r8a779{65,80,90}: Add IPMMU devices nodes) Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/1587108543-23786-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20ARM: dts: r8a73a4: Add missing CMT1 interruptsGeert Uytterhoeven
commit 0f739fdfe9e5ce668bd6d3210f310df282321837 upstream. The R-Mobile APE6 Compare Match Timer 1 generates 8 interrupts, one for each channel, but currently only 1 is described. Fix this by adding the missing interrupts. Fixes: f7b65230019b9dac ("ARM: shmobile: r8a73a4: Add CMT1 node") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200408090926.25201-1-geert+renesas@glider.be Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20arm64: dts: rockchip: Rename dwc3 device nodes on rk3399 to make dtc happyChen-Yu Tsai
commit 190c7f6fd43a776d4a6da1dac44408104649e9b7 upstream. The device tree compiler complains that the dwc3 nodes have regs properties but no matching unit addresses. Add the unit addresses to the device node name. While at it, also rename the nodes from "dwc3" to "usb", as guidelines require device nodes have generic names. Fixes: 7144224f2c2b ("arm64: dts: rockchip: support dwc3 USB for rk3399") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20200327030414.5903-7-wens@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20arm64: dts: rockchip: Replace RK805 PMIC node name with "pmic" on rk3328 boardsChen-Yu Tsai
commit 83b994129fb4c18a8460fd395864a28740e5e7fb upstream. In some board device tree files, "rk805" was used for the RK805 PMIC's node name. However the policy for device trees is that generic names should be used. Replace the "rk805" node name with the generic "pmic" name. Fixes: 1e28037ec88e ("arm64: dts: rockchip: add rk805 node for rk3328-evb") Fixes: 955bebde057e ("arm64: dts: rockchip: add rk3328-rock64 board") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20200327030414.5903-3-wens@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20x86/unwind/orc: Fix error handling in __unwind_start()Josh Poimboeuf
commit 71c95825289f585014fe9741b051d32a7a916680 upstream. The unwind_state 'error' field is used to inform the reliable unwinding code that the stack trace can't be trusted. Set this field for all errors in __unwind_start(). Also, move the zeroing out of the unwind_state struct to before the ORC table initialization check, to prevent the caller from reading uninitialized data if the ORC table is corrupted. Fixes: af085d9084b4 ("stacktrace/x86: add function for detecting reliable stack traces") Fixes: d3a09104018c ("x86/unwinder/orc: Dont bail on stack overflow") Fixes: 98d0c8ebf77e ("x86/unwind/orc: Prevent unwinding before ORC initialization") Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20x86: Fix early boot crash on gcc-10, third tryBorislav Petkov
commit a9a3ed1eff3601b63aea4fb462d8b3b92c7c1e7e upstream. ... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entriesFabio Estevam
commit 0caf34350a25907515d929a9c77b9b206aac6d1e upstream. The I2C2 pins are already used and the following errors are seen: imx27-pinctrl 10015000.iomuxc: pin MX27_PAD_I2C2_SDA already requested by 10012000.i2c; cannot claim for 1001d000.i2c imx27-pinctrl 10015000.iomuxc: pin-69 (1001d000.i2c) status -22 imx27-pinctrl 10015000.iomuxc: could not request pin 69 (MX27_PAD_I2C2_SDA) from group i2c2grp on device 10015000.iomuxc imx-i2c 1001d000.i2c: Error applying setting, reverse things back imx-i2c: probe of 1001d000.i2c failed with error -22 Fix it by adding the correct I2C1 IOMUX entries for the pinctrl_i2c1 group. Cc: <stable@vger.kernel.org> Fixes: 61664d0b432a ("ARM: dts: imx27 phyCARD-S pinctrl") Signed-off-by: Fabio Estevam <festevam@gmail.com> Reviewed-by: Stefan Riedmueller <s.riedmueller@phytec.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20ARM: dts: dra7: Fix bus_dma_limit for PCIeKishon Vijay Abraham I
commit 90d4d3f4ea45370d482fa609dbae4d2281b4074f upstream. Even though commit cfb5d65f2595 ("ARM: dts: dra7: Add bus_dma_limit for L3 bus") added bus_dma_limit for L3 bus, the PCIe controller gets incorrect value of bus_dma_limit. Fix it by adding empty dma-ranges property to axi@0 and axi@1 (parent device tree node of PCIe controller). Cc: stable@kernel.org Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20arm64: fix the flush_icache_range arguments in machine_kexecChristoph Hellwig
[ Upstream commit d51c214541c5154dda3037289ee895ea3ded5ebd ] The second argument is the end "pointer", not the length. Fixes: d28f6df1305a ("arm64/kexec: Add core kexec support") Cc: <stable@vger.kernel.org> # 4.8.x- Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20riscv: fix vdso build with lldIlie Halip
[ Upstream commit 3c1918c8f54166598195d938564072664a8275b1 ] When building with the LLVM linker this error occurrs: LD arch/riscv/kernel/vdso/vdso-syms.o ld.lld: error: no input files This happens because the lld treats -R as an alias to -rpath, as opposed to ld where -R means --just-symbols. Use the long option name for compatibility between the two. Link: https://github.com/ClangBuiltLinux/linux/issues/805 Reported-by: Dmitry Golovin <dima@golovin.in> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14x86/unwind/orc: Fix premature unwind stoppage due to IRET framesJosh Poimboeuf
commit 81b67439d147677d844d492fcbd03712ea438f42 upstream. The following execution path is possible: fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only IRET regs saved ] common_interrupt() interrupt_entry() <NMI> [ full pt_regs saved ] ... [ unwind stack ] When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely. However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI. Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values. Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/unwind/orc: Fix error path for bad ORC entry typeJosh Poimboeuf
commit a0f81bf26888048100bf017fadf438a5bdffa8d8 upstream. If the ORC entry type is unknown, nothing else can be done other than reporting an error. Exit the function instead of breaking out of the switch statement. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/unwind/orc: Prevent unwinding before ORC initializationJosh Poimboeuf
commit 98d0c8ebf77e0ba7c54a9ae05ea588f0e9e3f46e upstream. If the unwinder is called before the ORC data has been initialized, orc_find() returns NULL, and it tries to fall back to using frame pointers. This can cause some unexpected warnings during boot. Move the 'orc_init' check from orc_find() to __unwind_init(), so that it doesn't even try to unwind from an uninitialized state. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/unwind/orc: Don't skip the first frame for inactive tasksMiroslav Benes
commit f1d9a2abff66aa8156fbc1493abed468db63ea48 upstream. When unwinding an inactive task, the ORC unwinder skips the first frame by default. If both the 'regs' and 'first_frame' parameters of unwind_start() are NULL, 'state->sp' and 'first_frame' are later initialized to the same value for an inactive task. Given there is a "less than or equal to" comparison used at the end of __unwind_start() for skipping stack frames, the first frame is skipped. Drop the equal part of the comparison and make the behavior equivalent to the frame pointer unwinder. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/entry/64: Fix unwind hints in rewind_stack_do_exit()Jann Horn
commit f977df7b7ca45a4ac4b66d30a8931d0434c394b1 upstream. The LEAQ instruction in rewind_stack_do_exit() moves the stack pointer directly below the pt_regs at the top of the task stack before calling do_exit(). Tell the unwinder to expect pt_regs. Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/68c33e17ae5963854916a46f522624f8e1d264f2.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/entry/64: Fix unwind hints in kernel exit pathJosh Poimboeuf
commit 1fb143634a38095b641a3a21220774799772dc4c upstream. In swapgs_restore_regs_and_return_to_usermode, after the stack is switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is no longer valid, which can result in the following ORC unwinder warning: WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0 For full correctness, we could try to add complicated unwind hints so the unwinder could continue to find the registers, but when when it's this close to kernel exit, unwind hints aren't really needed anymore and it's fine to just use an empty hint which tells the unwinder to stop. For consistency, also move the UNWIND_HINT_EMPTY in entry_SYSCALL_64_after_hwframe to a similar location. Fixes: 3e3b9293d392 ("x86/entry/64: Return to userspace from the trampoline stack") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Dave Jones <dsj@fb.com> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Reported-by: Jann Horn <jannh@google.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/60ea8f562987ed2d9ace2977502fe481c0d7c9a0.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/entry/64: Fix unwind hints in register clearing codeJosh Poimboeuf
commit 06a9750edcffa808494d56da939085c35904e618 upstream. The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after pushing it. If an NMI or exception hits after a register is cleared, but before the UNWIND_HINT_REGS annotation, the ORC unwinder will wrongly think the previous value of the register was zero. This can confuse the unwinding process and cause it to exit early. Because ORC is simpler than DWARF, there are a limited number of unwind annotation states, so it's not possible to add an individual unwind hint after each push/clear combination. Instead, the register clearing instructions need to be consolidated and moved to after the UNWIND_HINT_REGS annotation. Fixes: 3f01daecd545 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/68fd3d0bc92ae2d62ff7879d15d3684217d51f08.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14KVM: VMX: Mark RCX, RDX and RSI as clobbered in vmx_vcpu_run()'s asm blobSean Christopherson
Based on upstream commit f3689e3f17f064fd4cd5f0cb01ae2395c94f39d9. Save RCX, RDX and RSI to fake outputs to coerce the compiler into treating them as clobbered. RCX in particular is likely to be reused by the compiler to dereference the 'struct vcpu_vmx' pointer, which will result in a null pointer dereference now that RCX is zeroed by the asm blob. Tag the asm() blob as volatile to prevent GCC from dropping the blob, which is possible now that the blob has output values, all of which are unused. Upstream commit f3689e3f17f06 ("KVM: VMX: Save RSI to an unused output in the vCPU-run asm blob") is not a direct equivalent of this patch. As its shortlog states, it only tagged RSI as clobbered, whereas here RCX and RDX are also clobbered. In upstream at the time of the offending commit (b4be98039a92 in 4.19, 0e0ab73c9a024 upstream), the inline asm blob had previously been moved to a dedicated helper, __vmx_vcpu_run(). For unrelated reasons, __vmx_vcpu_run() was put into its own optimization unit, which for all intents and purposes made it impossible to consume clobbered registers because RCX, RDX and RSI are volatile and __vmx_vcpu_run() couldn't itself be inlined. In other words, the bug existed but couldn't be hit. Similarly, the lack of "volatile" was also a bug in upstream that was hidden by an unrelated change that exists in upstream but not in 4.19. In this case, the asm blob also uses ASM_CALL_CONSTRAINT (marks RSP as being an input/output constraint) in upstream to play nice with objtool due the blob making a CALL. In 4.19, there is no CALL and thus no ASM_CALL_CONSTRAINT. Furthermore, both of the lurking bugs were blasted away in upstream by commits 5e0781df1899 ("KVM: VMX: Move vCPU-run code to a proper assembly routine") and fc2ba5a27a1a ("KVM: VMX: Call vCPU-run asm sub-routine from C and remove clobbering"), i.e. these bugs will never be directly fixed in upstream. Reported-by: Tobias Urdin <tobias.urdin@binero.com> Fixes: b4be98039a92 ("KVM: VMX: Zero out *all* general purpose registers after VM-Exit") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobsSean Christopherson
commit 051a2d3e59e51ae49fd56aef34e472832897ce46 upstream. Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the 'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run() and nested_vmx_check_vmentry_hw(). Using the symbolic name means that adding/removing an output parameter(s) requires "rewriting" almost all of the asm blob, which makes it nearly impossible to understand what's being changed in even the most minor patches. Opportunistically improve the code comments. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14arm64: hugetlb: avoid potential NULL dereferenceMark Rutland
commit 027d0c7101f50cf03aeea9eebf484afd4920c8d3 upstream. The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL: | CC arch/arm64/mm/pageattr.o | CC arch/arm64/mm/hugetlbpage.o | from arch/arm64/mm/hugetlbpage.c:10: | arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’: | ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference] | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’ | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’ | |arch/arm64/mm/hugetlbpage.c:232:10: | |./arch/arm64/include/asm/pgtable-types.h:28:24: | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’ | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’ This can only occur when the kernel cannot allocate a page, and so is unlikely to happen in practice before other systems start failing. We can avoid this by bailing out if pmd_alloc() fails, as we do earlier in the function if pud_alloc() fails. Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com> Cc: <stable@vger.kernel.org> # 4.5.x- Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14KVM: arm64: Fix 32bit PC wrap-aroundMarc Zyngier
commit 0225fd5e0a6a32af7af0aefac45c8ebf19dc5183 upstream. In the unlikely event that a 32bit vcpu traps into the hypervisor on an instruction that is located right at the end of the 32bit range, the emulation of that instruction is going to increment PC past the 32bit range. This isn't great, as userspace can then observe this value and get a bit confused. Conversly, userspace can do things like (in the context of a 64bit guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0, set PC to a 64bit value, change PSTATE to AArch32-USR, and observe that PC hasn't been truncated. More confusion. Fix both by: - truncating PC increments for 32bit guests - sanitizing all 32bit regs every time a core reg is changed by userspace, and that PSTATE indicates a 32bit mode. Cc: stable@vger.kernel.org Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10hexagon: define ioremap_ucNick Desaulniers
commit 7312b70699252074d753c5005fc67266c547bbe3 upstream. Similar to commit 38e45d81d14e ("sparc64: implement ioremap_uc") define ioremap_uc for hexagon to avoid errors from -Wimplicit-function-definition. Link: http://lkml.kernel.org/r/20191209222956.239798-2-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/797 Fixes: e537654b7039 ("lib: devres: add a helper function for ioremap_uc") Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Tuowen Zhao <ztuowen@gmail.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Will Deacon <will@kernel.org> Cc: Richard Fontana <rfontana@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10hexagon: clean up ioremapChristoph Hellwig
commit ac32292c8552f7e8517be184e65dd09786e991f9 upstream. Use ioremap as the main implemented function, and defined ioremap_nocache to it as a deprecated alias. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10s390/ftrace: fix potential crashes when switching tracersPhilipp Rudo
[ Upstream commit 8ebf6da9db1b2a20bb86cc1bee2552e894d03308 ] Switching tracers include instruction patching. To prevent that a instruction is patched while it's read the instruction patching is done in stop_machine 'context'. This also means that any function called during stop_machine must not be traced. Thus add 'notrace' to all functions called within stop_machine. Fixes: 1ec2772e0c3c ("s390/diag: add a statistic for diagnose calls") Fixes: 38f2c691a4b3 ("s390: improve wait logic of stop_machine") Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield") Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-10powerpc/pci/of: Parse unassigned resourcesAlexey Kardashevskiy
commit dead1c845dbe97e0061dae2017eaf3bd8f8f06ee upstream. The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing which reads "assigned-addresses" of every PCI device and initializes the device resources. However if the property is missing or zero sized, then there is no fallback of any kind and the PCI resources remain undiscovered, i.e. pdev->resource[] array remains empty. This adds a fallback which parses the "reg" property in pretty much same way except it marks resources as "unset" which later make Linux assign those resources proper addresses. This has an effect when: 1. a hypervisor failed to assign any resource for a device; 2. /chosen/linux,pci-probe-only=0 is in the DT so the system may try assigning a resource. Neither is likely to happen under PowerVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02bpf, x86_32: Fix clobbering of dst for BPF_JSETLuke Nelson
commit 50fe7ebb6475711c15b3397467e6424e20026d94 upstream. The current JIT clobbers the destination register for BPF_JSET BPF_X and BPF_K by using "and" and "or" instructions. This is fine when the destination register is a temporary loaded from a register stored on the stack but not otherwise. This patch fixes the problem (for both BPF_K and BPF_X) by always loading the destination register into temporaries since BPF_JSET should not modify the destination register. This bug may not be currently triggerable as BPF_REG_AX is the only register not stored on the stack and the verifier uses it in a limited way. Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-2-luke.r.nels@gmail.com Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02arm64: Delete the space separator in __emit_instFangrui Song
[ Upstream commit c9a4ef66450145a356a626c833d3d7b1668b3ded ] In assembly, many instances of __emit_inst(x) expand to a directive. In a few places __emit_inst(x) is used as an assembler macro argument. For example, in arch/arm64/kvm/hyp/entry.S ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) expands to the following by the C preprocessor: alternative_insn nop, .inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 Both comma and space are separators, with an exception that content inside a pair of parentheses/quotes is not split, so the clang integrated assembler splits the arguments to: nop, .inst, (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1 GNU as preprocesses the input with do_scrub_chars(). Its arm64 backend (along with many other non-x86 backends) sees: alternative_insn nop,.inst(0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 # .inst(...) is parsed as one argument while its x86 backend sees: alternative_insn nop,.inst (0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1 # The extra space before '(' makes the whole .inst (...) parsed as two arguments The non-x86 backend's behavior is considered unintentional (https://sourceware.org/bugzilla/show_bug.cgi?id=25750). So drop the space separator inside `.inst (...)` to make the clang integrated assembler work. Suggested-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Fangrui Song <maskray@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://github.com/ClangBuiltLinux/linux/issues/939 Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02x86: hyperv: report value of misc_featuresOlaf Hering
[ Upstream commit 97d9f1c43bedd400301d6f1eff54d46e8c636e47 ] A few kernel features depend on ms_hyperv.misc_features, but unlike its siblings ->features and ->hints, the value was never reported during boot. Signed-off-by: Olaf Hering <olaf@aepfle.de> Link: https://lore.kernel.org/r/20200407172739.31371-1-olaf@aepfle.de Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_BLuke Nelson
[ Upstream commit aee194b14dd2b2bde6252b3acf57d36dccfc743a ] This patch fixes an encoding bug in emit_stx for BPF_B when the source register is BPF_REG_FP. The current implementation for BPF_STX BPF_B in emit_stx saves one REX byte when the operands can be encoded using Mod-R/M alone. The lower 8 bits of registers %rax, %rbx, %rcx, and %rdx can be accessed without using a REX prefix via %al, %bl, %cl, and %dl, respectively. Other registers, (e.g., %rsi, %rdi, %rbp, %rsp) require a REX prefix to use their 8-bit equivalents (%sil, %dil, %bpl, %spl). The current code checks if the source for BPF_STX BPF_B is BPF_REG_1 or BPF_REG_2 (which map to %rdi and %rsi), in which case it emits the required REX prefix. However, it misses the case when the source is BPF_REG_FP (mapped to %rbp). The result is that BPF_STX BPF_B with BPF_REG_FP as the source operand will read from register %ch instead of the correct %bpl. This patch fixes the problem by fixing and refactoring the check on which registers need the extra REX byte. Since no BPF registers map to %rsp, there is no need to handle %spl. Fixes: 622582786c9e0 ("net: filter: x86: internal BPF JIT") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200418232655.23870-1-luke.r.nels@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extensionLuke Nelson
commit 5fa9a98fb10380e48a398998cd36a85e4ef711d6 upstream. The current JIT uses the following sequence to zero-extend into the upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W}, when the destination register is not on the stack: EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0); The problem is that C7 /0 encodes a MOV instruction that requires a 4-byte immediate; the current code emits only 1 byte of the immediate. This means that the first 3 bytes of the next instruction will be treated as the rest of the immediate, breaking the stream of instructions. This patch fixes the problem by instead emitting "xor dst_hi,dst_hi" to clear the upper 32 bits. This fixes the problem and is more efficient than using MOV to load a zero immediate. This bug may not be currently triggerable as BPF_REG_AX is the only register not stored on the stack and the verifier uses it in a limited way, and the verifier implements a zero-extension optimization. But the JIT should avoid emitting incorrect encodings regardless. Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-1-luke.r.nels@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02ARM: dts: bcm283x: Disable dsi0 nodeNicolas Saenz Julienne
commit 90444b958461a5f8fc299ece0fe17eab15cba1e1 upstream. Since its inception the module was meant to be disabled by default, but the original commit failed to add the relevant property. Fixes: 4aba4cf82054 ("ARM: dts: bcm2835: Add the DSI module nodes and clocks") Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29s390/mm: fix page table upgrade vs 2ndary address mode accessesChristian Borntraeger
commit 316ec154810960052d4586b634156c54d0778f74 upstream. A page table upgrade in a kernel section that uses secondary address mode will mess up the kernel instructions as follows: Consider the following scenario: two threads are sharing memory. On CPU1 thread 1 does e.g. strnlen_user(). That gets to old_fs = enable_sacf_uaccess(); len = strnlen_user_srst(src, size); and " la %2,0(%1)\n" " la %3,0(%0,%1)\n" " slgr %0,%0\n" " sacf 256\n" "0: srst %3,%2\n" in strnlen_user_srst(). At that point we are in secondary space mode, control register 1 points to kernel page table and instruction fetching happens via c1, rather than usual c13. Interrupts are not disabled, for obvious reasons. On CPU2 thread 2 does MAP_FIXED mmap(), forcing the upgrade of page table from 3-level to e.g. 4-level one. We'd allocated new top-level table, set it up and now we hit this: notify = 1; spin_unlock_bh(&mm->page_table_lock); } if (notify) on_each_cpu(__crst_table_upgrade, mm, 0); OK, we need to actually change over to use of new page table and we need that to happen in all threads that are currently running. Which happens to include the thread 1. IPI is delivered and we have static void __crst_table_upgrade(void *arg) { struct mm_struct *mm = arg; if (current->active_mm == mm) set_user_asce(mm); __tlb_flush_local(); } run on CPU1. That does static inline void set_user_asce(struct mm_struct *mm) { S390_lowcore.user_asce = mm->context.asce; OK, user page table address updated... __ctl_load(S390_lowcore.user_asce, 1, 1); ... and control register 1 set to it. clear_cpu_flag(CIF_ASCE_PRIMARY); } IPI is run in home space mode, so it's fine - insns are fetched using c13, which always points to kernel page table. But as soon as we return from the interrupt, previous PSW is restored, putting CPU1 back into secondary space mode, at which point we no longer get the kernel instructions from the kernel mapping. The fix is to only fixup the control registers that are currently in use for user processes during the page table update. We must also disable interrupts in enable_sacf_uaccess to synchronize the cr and thread.mm_segment updates against the on_each-cpu. Fixes: 0aaba41b58bc ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # 4.15+ Reported-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> References: CVE-2020-11884 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29powerpc/setup_64: Set cache-line-size based on cache-block-sizeChris Packham
commit 94c0b013c98583614e1ad911e8795ca36da34a85 upstream. If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use the block-size value for both. Per the devicetree spec cache-line-size is only needed if it differs from the block size. Originally the code would fallback from block size to line size. An error message was printed if both properties were missing. Later the code was refactored to use clearer names and logic but it inadvertently made line size a required property, meaning on systems without a line size property we fall back to the default from the cputable. On powernv (OPAL) platforms, since the introduction of device tree CPU features (5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")), that has led to the wrong value being used, as the fallback value is incorrect for Power8/Power9 CPUs. The incorrect values flow through to the VDSO and also to the sysconf values, SC_LEVEL1_ICACHE_LINESIZE etc. Fixes: bd067f83b084 ("powerpc/64: Fix naming of cache block vs. cache line") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reported-by: Qian Cai <cai@lca.pw> [mpe: Add even more detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200416221908.7886-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=yAhmad Fatoum
commit f1baca8896ae18e12c45552a4c4ae2086aa7e02c upstream. 512a928affd5 ("ARM: imx: build v7_cpu_resume() unconditionally") introduced an unintended linker error for i.MX6 configurations that have ARM_CPU_SUSPEND=n which can happen if neither CONFIG_PM, CONFIG_CPU_IDLE, nor ARM_PSCI_FW are selected. Fix this by having v7_cpu_resume() compiled only when cpu_resume() it calls is available as well. The C declaration for the function remains unguarded to avoid future code inadvertently using a stub and introducing a regression to the bug the original commit fixed. Cc: <stable@vger.kernel.org> Fixes: 512a928affd5 ("ARM: imx: build v7_cpu_resume() unconditionally") Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Tested-by: Roland Hieber <rhi@pengutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29KVM: VMX: Enable machine check support for 32bit targetsUros Bizjak
commit fb56baae5ea509e63c2a068d66a4d8ea91969fca upstream. There is no reason to limit the use of do_machine_check to 64bit targets. MCE handling works for both target familes. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Fixes: a0861c02a981 ("KVM: Add VT-x machine check support") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414071414.45636-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29KVM: s390: Return last valid slot if approx index is out-of-boundsSean Christopherson
commit 97daa028f3f621adff2c4f7b15fe0874e5b5bd6c upstream. Return the index of the last valid slot from gfn_to_memslot_approx() if its binary search loop yielded an out-of-bounds index. The index can be out-of-bounds if the specified gfn is less than the base of the lowest memslot (which is also the last valid memslot). Note, the sole caller, kvm_s390_get_cmma(), ensures used_slots is non-zero. Fixes: afdad61615cc3 ("KVM: s390: Fix storage attributes migration with memory slots") Cc: stable@vger.kernel.org # 4.19.x: 0774a964ef56: KVM: Fix out of range accesses to memslots Cc: stable@vger.kernel.org # 4.19.x Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200408064059.8957-3-sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29x86/KVM: Clean up host's steal time structureBoris Ostrovsky
commit a6bd811f1209fe1c64c9f6fd578101d6436c6b6e upstream. Now that we are mapping kvm_steal_time from the guest directly we don't need keep a copy of it in kvm_vcpu_arch.st. The same is true for the stime field. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missedBoris Ostrovsky
commit b043138246a41064527cf019a3d51d9f015e9796 upstream. There is a potential race in record_steal_time() between setting host-local vcpu->arch.st.steal.preempted to zero (i.e. clearing KVM_VCPU_PREEMPTED) and propagating this value to the guest with kvm_write_guest_cached(). Between those two events the guest may still see KVM_VCPU_PREEMPTED in its copy of kvm_steal_time, set KVM_VCPU_FLUSH_TLB and assume that hypervisor will do the right thing. Which it won't. Instad of copying, we should map kvm_steal_time and that will guarantee atomicity of accesses to @preempted. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 4.19: No tracepoint in record_steal_time().] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29x86/kvm: Cache gfn to pfn translationBoris Ostrovsky
commit 917248144db5d7320655dbb41d3af0b8a0f3d589 upstream. __kvm_map_gfn()'s call to gfn_to_pfn_memslot() is * relatively expensive * in certain cases (such as when done from atomic context) cannot be called Stashing gfn-to-pfn mapping should help with both cases. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01Sean Christopherson
commit 3b013a2972d5bc344d6eaa8f24fdfe268211e45f upstream. If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS when MPX is supported. Because the value effectively comes from vmcs01, vmcs02 must be updated even if vmcs12 is clean. Fixes: 62cf9bd8118c4 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS") Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 4.19: adjust filename, context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29KVM: VMX: Zero out *all* general purpose registers after VM-ExitSean Christopherson
commit 0e0ab73c9a0243736bcd779b30b717e23ba9a56d upstream. ...except RSP, which is restored by hardware as part of VM-Exit. Paolo theorized that restoring registers from the stack after a VM-Exit in lieu of zeroing them could lead to speculative execution with the guest's values, e.g. if the stack accesses miss the L1 cache[1]. Zeroing XORs are dirt cheap, so just be ultra-paranoid. Note that the scratch register (currently RCX) used to save/restore the guest state is also zeroed as its host-defined value is loaded via the stack, just with a MOV instead of a POP. [1] https://patchwork.kernel.org/patch/10771539/#22441255 Fixes: 0cb5b30698fd ("kvm: vmx: Scrub hardware GPRs at VM-exit") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 4.19: adjust filename, context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>