summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)Author
2016-02-19parisc: Fix __ARCH_SI_PREAMBLE_SIZEHelge Deller
commit e60fc5aa608eb38b47ba4ee058f306f739eb70a0 upstream. On a 64bit kernel build the compiler aligns the _sifields union in the struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define compensates for this alignment and thus fixes the wait testcase of the strace package. The symptoms of a wrong __ARCH_SI_PREAMBLE_SIZE value is that _sigchld.si_stime variable is missed to be copied and thus after a copy_siginfo() will have uninitialized values. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19parisc: Fix syscall restartsHelge Deller
commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream. On parisc syscalls which are interrupted by signals sometimes failed to restart and instead returned -ENOSYS which in the worst case lead to userspace crashes. A similiar problem existed on MIPS and was fixed by commit e967ef02 ("MIPS: Fix restart of indirect syscalls"). On parisc the current syscall restart code assumes that all syscall callers load the syscall number in the delay slot of the ble instruction. That's how it is e.g. done in the unistd.h header file: ble 0x100(%sr2, %r0) ldi #syscall_nr, %r20 Because of that assumption the current code never restored %r20 before returning to userspace. This assumption is at least not true for code which uses the glibc syscall() function, which instead uses this syntax: ble 0x100(%sr2, %r0) copy regX, %r20 where regX depend on how the compiler optimizes the code and register usage. This patch fixes this problem by adding code to analyze how the syscall number is loaded in the delay branch and - if needed - copy the syscall number to regX prior returning to userspace for the syscall restart. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.hHelge Deller
commit dcbf0d299c00ed4f82ea8d6e359ad88a5182f9b8 upstream. Drop the MADV_xxK_PAGES flags, which were never used and were from a proposed API which was never integrated into the generic Linux kernel code. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19sh64: fix __NR_fgetxattrDmitry V. Levin
commit 2d33fa1059da4c8e816627a688d950b613ec0474 upstream. According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr has to be defined to 259, but it doesn't. Instead, it's defined to 269, which is of course used by another syscall, __NR_sched_setaffinity in this case. This bug was found by strace test suite. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28mn10300: Select CONFIG_HAVE_UID16 to fix build failureGuenter Roeck
commit c86576ea114a9a881cf7328dc7181052070ca311 upstream. mn10300 builds fail with fs/stat.c: In function 'cp_old_stat': fs/stat.c:163:2: error: 'old_uid_t' undeclared ipc/util.c: In function 'ipc64_perm_to_ipc_perm': ipc/util.c:540:2: error: 'old_uid_t' undeclared Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16 to fix the problem. Fixes: fbc416ff8618 ("arm64: fix building without CONFIG_UID16") Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28openrisc: fix CONFIG_UID16 settingAndrew Morton
commit 04ea1e91f85615318ea91ce8ab50cb6a01ee4005 upstream. openrisc-allnoconfig: kernel/uid16.c: In function 'SYSC_setgroups16': kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc' kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n. Fixes: 2813893f8b197a1 ("kernel: conditionally support non-root users, groups and capabilities") Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28arm64: mm: ensure that the zero page is visible to the page table walkerWill Deacon
commit 32d6397805d00573ce1fa55f408ce2bca15b0ad3 upstream. In paging_init, we allocate the zero page, memset it to zero and then point TTBR0 to it in order to avoid speculative fetches through the identity mapping. In order to guarantee that the freshly zeroed page is indeed visible to the page table walker, we need to execute a dsb instruction prior to writing the TTBR. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28arm64: Clear out any singlestep state on a ptrace detach operationJohn Blackwood
commit 5db4fd8c52810bd9740c1240ebf89223b171aa70 upstream. Make sure to clear out any ptrace singlestep state when a ptrace(2) PTRACE_DETACH call is made on arm64 systems. Otherwise, the previously ptraced task will die off with a SIGTRAP signal if the debugger just previously singlestepped the ptraced task. Signed-off-by: John Blackwood <john.blackwood@ccur.com> [will: added comment to justify why this is in the arch code] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28powerpc: Make {cmp}xchg* and their atomic_ versions fully orderedBoqun Feng
commit 81d7a3294de7e9828310bbf986a67246b13fa01e upstream. According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_ versions all need to be fully ordered, however they are now just RELEASE+ACQUIRE, which are not fully ordered. So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") This patch depends on patch "powerpc: Make value-returning atomics fully ordered" for PPC_ATOMIC_ENTRY_BARRIER definition. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28powerpc: Make value-returning atomics fully orderedBoqun Feng
commit 49e9cf3f0c04bf76ffa59242254110309554861d upstream. According to memory-barriers.txt: > Any atomic operation that modifies some state in memory and returns > information about the state (old or new) implies an SMP-conditional > general memory barrier (smp_mb()) on each side of the actual > operation ... Which mean these operations should be fully ordered. However on PPC, PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation, which is currently "lwsync" if SMP=y. The leading "lwsync" can not guarantee fully ordered atomics, according to Paul Mckenney: https://lkml.org/lkml/2015/10/14/970 To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee the fully-ordered semantics. This also makes futex atomics fully ordered, which can avoid possible memory ordering problems if userspace code relies on futex system call for fully ordered semantics. Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28powerpc/tm: Block signal return setting invalid MSR stateMichael Neuling
commit d2b9d2a5ad5ef04ff978c9923d19730cb05efd55 upstream. Currently we allow both the MSR T and S bits to be set by userspace on a signal return. Unfortunately this is a reserved configuration and will cause a TM Bad Thing exception if attempted (via rfid). This patch checks for this case in both the 32 and 64 bit signals code. If both T and S are set, we mark the context as invalid. Found using a syscall fuzzer. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28x86/boot: Double BOOT_HEAP_SIZE to 64KBH.J. Lu
commit 8c31902cffc4d716450be549c66a67a8a3dd479c upstream. When decompressing kernel image during x86 bootup, malloc memory for ELF program headers may run out of heap space, which leads to system halt. This patch doubles BOOT_HEAP_SIZE to 64KB. Tested with 32-bit kernel which failed to boot without this patch. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]Mario Kleiner
commit 2f0c0b2d96b1205efb14347009748d786c2d9ba5 upstream. Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSRPaul Mackerras
commit c20875a3e638e4a03e099b343ec798edd1af5cc6 upstream. Currently it is possible for userspace (e.g. QEMU) to set a value for the MSR for a guest VCPU which has both of the TS bits set, which is an illegal combination. The result of this is that when we execute a hrfid (hypervisor return from interrupt doubleword) instruction to enter the guest, the CPU will take a TM Bad Thing type of program interrupt (vector 0x700). Now, if PR KVM is configured in the kernel along with HV KVM, we actually handle this without crashing the host or giving hypervisor privilege to the guest; instead what happens is that we deliver a program interrupt to the guest, with SRR0 reflecting the address of the hrfid instruction and SRR1 containing the MSR value at that point. If PR KVM is not configured in the kernel, then we try to run the host's program interrupt handler with the MMU set to the guest context, which almost certainly causes a host crash. This closes the hole by making kvmppc_set_msr_hv() check for the illegal combination and force the TS field to a safe value (00, meaning non-transactional). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28x86/xen: don't reset vcpu_info on a cancelled suspendOuyang Zhaowei (Charles)
commit 6a1f513776b78c994045287073e55bae44ed9f8c upstream. On a cancelled suspend the vcpu_info location does not change (it's still in the per-cpu area registered by xen_vcpu_setup()). So do not call xen_hvm_init_shared_info() which would make the kernel think its back in the shared info. With the wrong vcpu_info, events cannot be received and the domain will hang after a cancelled suspend. Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28x86/signal: Fix restart_syscall number for x32 tasksDmitry V. Levin
commit 22eab1108781eff09961ae7001704f7bd8fb1dce upstream. When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK, regs->ax is assigned to a restart_syscall number. For x32 tasks, this syscall number must have __X32_SYSCALL_BIT set, otherwise it will be an x86_64 syscall number instead of a valid x32 syscall number. This issue has been there since the introduction of x32. Reported-by: strace/tests/restart_syscall.test Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Cc: Elvira Khabirova <lineprinter0@gmail.com> Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09arm64: Fix compat register mappingsRobin Murphy
commit 5accd17d0eb523350c9ef754d655e379c9bb93b3 upstream. For reasons not entirely apparent, but now enshrined in history, the architectural mapping of AArch32 banked registers to AArch64 registers actually orders SP_<mode> and LR_<mode> backwards compared to the intuitive r13/r14 order, for all modes except FIQ. Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding subtle bugs with KVM and AArch32 guests. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09x86/cpu: Fix SMAP check in PVOPS environmentsAndrew Cooper
commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream. There appears to be no formal statement of what pv_irq_ops.save_fl() is supposed to return precisely. Native returns the full flags, while lguest and Xen only return the Interrupt Flag, and both have comments by the implementations stating that only the Interrupt Flag is looked at. This may have been true when initially implemented, but no longer is. To make matters worse, the Xen PVOP leaves the upper bits undefined, making the BUG_ON() undefined behaviour. Experimentally, this now trips for 32bit PV guests on Broadwell hardware. The BUG_ON() is consistent for an individual build, but not consistent for all builds. It has also been a sitting timebomb since SMAP support was introduced. Use native_save_fl() instead, which will obtain an accurate view of the AC flag. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Tested-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <lguest@lists.ozlabs.org> Cc: Xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09x86/cpu: Call verify_cpu() after having entered long mode tooBorislav Petkov
commit 04633df0c43d710e5f696b06539c100898678235 upstream. When we get loaded by a 64-bit bootloader, kernel entry point is startup_64 in head_64.S. We don't trust any and all bootloaders because some will fiddle with CPU configuration so we go ahead and massage each CPU into sanity again. For example, some dell BIOSes have this XD disable feature which set IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround for other OSes but Linux sure doesn't need it. A similar thing is present in the Surface 3 firmware - see https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit only on the BSP: # rdmsr -a 0x1a0 400850089 850089 850089 850089 I know, right?! There's not even an off switch in there. So fix all those cases by sanitizing the 64-bit entry point too. For that, make verify_cpu() callable in 64-bit mode also. Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com> Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09x86/setup: Fix low identity map for >= 2GB kernel rangeKrzysztof Mazur
commit 68accac392d859d24adcf1be3a90e41f978bd54c upstream. The commit f5f3497cad8c extended the low identity mapping. However, if the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory split), the normal memory mapping is overwritten by the low identity mapping causing a crash. To avoid overwritting, limit the low identity map to cover only memory before kernel range (PAGE_OFFSET). Fixes: f5f3497cad8c "x86/setup: Extend low identity map to cover whole kernel range Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09x86/setup: Extend low identity map to cover whole kernel rangePaolo Bonzini
commit f5f3497cad8c8416a74b9aaceb127908755d020a upstream. On 32-bit systems, the initial_page_table is reused by efi_call_phys_prolog as an identity map to call SetVirtualAddressMap. efi_call_phys_prolog takes care of converting the current CPU's GDT to a physical address too. For PAE kernels the identity mapping is achieved by aliasing the first PDPE for the kernel memory mapping into the first PDPE of initial_page_table. This makes the EFI stub's trick "just work". However, for non-PAE kernels there is no guarantee that the identity mapping in the initial_page_table extends as far as the GDT; in this case, accesses to the GDT will cause a page fault (which quickly becomes a triple fault). Fix this by copying the kernel mappings from swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at identity mapping. For some reason, this is only reproducible with QEMU's dynamic translation mode, and not for example with KVM. However, even under KVM one can clearly see that the page table is bogus: $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize $ gdb (gdb) target remote localhost:1234 (gdb) hb *0x02858f6f Hardware assisted breakpoint 1 at 0x2858f6f (gdb) c Continuing. Breakpoint 1, 0x02858f6f in ?? () (gdb) monitor info registers ... GDT= 0724e000 000000ff IDT= fffbb000 000007ff CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690 ... The page directory is sane: (gdb) x/4wx 0x32b7000 0x32b7000: 0x03398063 0x03399063 0x0339a063 0x0339b063 (gdb) x/4wx 0x3398000 0x3398000: 0x00000163 0x00001163 0x00002163 0x00003163 (gdb) x/4wx 0x3399000 0x3399000: 0x00400003 0x00401003 0x00402003 0x00403003 but our particular page directory entry is empty: (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4 0x32b7070: 0x00000000 [ It appears that you can skate past this issue if you don't receive any interrupts while the bogus GDT pointer is loaded, or if you avoid reloading the segment registers in general. Andy Lutomirski provides some additional insight: "AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to point to unmapped memory. Any attempt to use them (segment loads, interrupts, IRET, etc) will try to access that memory as if the access came from CPL 0 and, if the access fails, will generate a valid page fault with CR2 pointing into the GDT or LDT." Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI calls, not in the epilog/prolog calls") interrupts were disabled around the prolog and epilog calls, and the functional GDT was re-installed before interrupts were re-enabled. Which explains why no one has hit this issue until now. ] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [ Updated changelog. ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ARM: orion: Fix DSA platform device after mvmdio conversionFlorian Fainelli
commit d836ace65ee98d7079bc3c5afdbcc0e27dca20a3 upstream. DSA expects the host_dev pointer to be the device structure associated with the MDIO bus controller driver. First commit breaking that was c3a07134e6aa ("mv643xx_eth: convert to use the Marvell Orion MDIO driver"), and then, it got completely under the radar for a while. Reported-by: Frans van de Wiel <fvdw@fvdw.eu> Fixes: c3a07134e6aa ("mv643xx_eth: convert to use the Marvell Orion MDIO driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ARM: 8427/1: dma-mapping: add support for offset parameter in dma_mmap()Marek Szyprowski
commit 7e31210349e9e03a9a4dff31ab5f2bc83e8e84f5 upstream. IOMMU-based dma_mmap() implementation lacked proper support for offset parameter used in mmap call (it always assumed that mapping starts from offset zero). This patch adds support for offset parameter to IOMMU-based implementation. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ARM: 8426/1: dma-mapping: add missing range check in dma_mmap()Marek Szyprowski
commit 371f0f085f629fc0f66695f572373ca4445a67ad upstream. dma_mmap() function in IOMMU-based dma-mapping implementation lacked a check for valid range of mmap parameters (offset and buffer size), what might have caused access beyond the allocated buffer. This patch fixes this issue. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09xen: fix backport of previous kexec patchGreg Kroah-Hartman
Fixes the backport of 0b34a166f291d255755be46e43ed5497cdd194f2 upstream Commit 0b34a166f291d255755be46e43ed5497cdd194f2 "x86/xen: Support kexec/kdump in HVM guests by doing a soft reset" has been added to the 4.2-stable tree" needed to correct the CONFIG variable, as CONFIG_KEXEC_CORE only showed up in 4.3. Reported-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09Revert "ARM64: unwind: Fix PC calculation"Will Deacon
commit 9702970c7bd3e2d6fecb642a190269131d4ac16c upstream. This reverts commit e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63. With this patch applied, we were the only architecture making this sort of adjustment to the PC calculation in the unwinder. This causes problems for ftrace, where the PC values are matched against the contents of the stack frames in the callchain and fail to match any records after the address adjustment. Whilst there has been some effort to change ftrace to workaround this, those patches are not yet ready for mainline and, since we're the odd architecture in this regard, let's just step in line with other architectures (like arch/arm/) for now. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09powerpc/rtas: Validate rtas.entry before calling enter_rtas()Vasant Hegde
commit 8832317f662c06f5c06e638f57bfe89a71c9b266 upstream. Currently we do not validate rtas.entry before calling enter_rtas(). This leads to a kernel oops when user space calls rtas system call on a powernv platform (see below). This patch adds code to validate rtas.entry before making enter_rtas() call. Oops: Exception in kernel mode, sig: 4 [#1] SMP NR_CPUS=1024 NUMA PowerNV task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000 NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140 REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le) MSR: 1000000000081000 <HV,ME> CR: 00000000 XER: 00000000 CFAR: c000000000009c0c SOFTE: 0 NIP [0000000000000000] (null) LR [0000000000009c14] 0x9c14 Call Trace: [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable) [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0 [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98 Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform") Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Reword change log, trim oops, and add stable + fixes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27crypto: sparc - initialize blkcipher.ivsizeDave Kleikamp
commit a66d7f724a96d6fd279bfbd2ee488def6b081bea upstream. Some of the crypto algorithms write to the initialization vector, but no space has been allocated for it. This clobbers adjacent memory. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27m68k/uaccess: Fix asm constraints for userspace accessGeert Uytterhoeven
commit 631d8b674f5f8235e9cb7e628b0fe9e5200e3158 upstream. When compiling a MMU kernel with CPU_HAS_ADDRESS_SPACES=n (e.g. "MMU=y allnoconfig": "echo CONFIG_MMU=y > allno.config && make KCONFIG_ALLCONFIG=1 allnoconfig"), we use plain "move" instead of "moves", and I got: CC arch/m68k/lib/uaccess.o {standard input}: Assembler messages: {standard input}:47: Error: operands mismatch -- statement `move.b %a0,(%a1)' ignored This happens because plain "move" doesn't support byte transfers between memory and address registers, while "moves" does. Fix the asm constraints for __generic_copy_from_user(), __generic_copy_to_user(), and __clear_user() to only use data registers when accessing userspace. Also, relax the asm constraints for 16-bit userspace accesses in __put_user() and __get_user(), as both "move" and "moves" do support such transfers between memory and address registers. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomicAndi Kleen
commit ff47ab4ff3cddfa7bc1b25b990e24abe2ae474ff upstream. The 64bit __copy_{from,to}_user_inatomic always called copy_from_user_generic, but skipped the special optimizations for 1/2/4/8 byte accesses. This especially hurts the futex call, which accesses the 4 byte futex user value with a complicated fast string operation in a function call, instead of a single movl. Use __copy_{from,to}_user for _inatomic instead to get the same optimizations. The only problem was the might_fault() in those functions. So move that to new wrapper and call __copy_{f,t}_user_nocheck() from *_inatomic directly. 32bit already did this correctly by duplicating the code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22m68k: Define asmlinkage_protectAndreas Schwab
commit 8474ba74193d302e8340dddd1e16c85cc4b98caf upstream. Make sure the compiler does not modify arguments of syscall functions. This can happen if the compiler generates a tailcall to another function. For example, without asmlinkage_protect sys_openat is compiled into this function: sys_openat: clr.l %d0 move.w 18(%sp),%d0 move.l %d0,16(%sp) jbra do_sys_open Note how the fourth argument is modified in place, modifying the register %d4 that gets restored from this stack slot when the function returns to user-space. The caller may expect the register to be unmodified across system calls. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22arm64: readahead: fault retry breaks mmap file read random detectionMark Salyzyn
commit 569ba74a7ba69f46ce2950bf085b37fea2408385 upstream. This is the arm64 portion of commit 45cac65b0fcd ("readahead: fault retry breaks mmap file read random detection"), which was absent from the initial port and has since gone unnoticed. The original commit says: > .fault now can retry. The retry can break state machine of .fault. In > filemap_fault, if page is miss, ra->mmap_miss is increased. In the second > try, since the page is in page cache now, ra->mmap_miss is decreased. And > these are done in one fault, so we can't detect random mmap file access. > > Add a new flag to indicate .fault is tried once. In the second try, skip > ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. With this change, Mark reports that: > Random read improves by 250%, sequential read improves by 40%, and > random write by 400% to an eMMC device with dm crypto wrapped around it. Cc: Shaohua Li <shli@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Riley Andrews <riandrews@android.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22powerpc/MSI: Fix race condition in tearing down MSI interruptsPaul Mackerras
commit e297c939b745e420ef0b9dc989cb87bda617b399 upstream. This fixes a race which can result in the same virtual IRQ number being assigned to two different MSI interrupts. The most visible consequence of that is usually a warning and stack trace from the sysfs code about an attempt to create a duplicate entry in sysfs. The race happens when one CPU (say CPU 0) is disposing of an MSI while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls (for example) pnv_teardown_msi_irqs(), which calls msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its hardware IRQ number) is no longer in use. Then, before CPU 0 gets to calling irq_dispose_mapping() to free up the virtal IRQ number, CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an MSI, and gets the same hardware IRQ number that CPU 0 just freed. CPU 1 then calls irq_create_mapping() to get a virtual IRQ number, which sees that there is currently a mapping for that hardware IRQ number and returns the corresponding virtual IRQ number (which is the same virtual IRQ number that CPU 0 was using). CPU 0 then calls irq_dispose_mapping() and frees that virtual IRQ number. Now, if another CPU comes along and calls irq_create_mapping(), it is likely to get the virtual IRQ number that was just freed, resulting in the same virtual IRQ number apparently being used for two different hardware interrupts. To fix this race, we just move the call to msi_bitmap_free_hwirqs() to after the call to irq_dispose_mapping(). Since virq_to_hw() doesn't work for the virtual IRQ number after irq_dispose_mapping() has been called, we need to call it before irq_dispose_mapping() and remember the result for the msi_bitmap_free_hwirqs() call. The pattern of calling msi_bitmap_free_hwirqs() before irq_dispose_mapping() appears in 5 places under arch/powerpc, and appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC U3/U4 MSI backend") from 2007. Fixes: 05af7bd2d75e ("[POWERPC] MPIC U3/U4 MSI backend") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22MIPS: dma-default: Fix 32-bit fall back to GFP_DMAJames Hogan
commit 53960059d56ecef67d4ddd546731623641a3d2d1 upstream. If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32 zone, as is the case for some 32-bit kernels, then massage_gfp_flags() will cause DMA memory allocated for devices with a 32..63-bit coherent_dma_mask to fall back to using __GFP_DMA, even though there may only be 32-bits of physical address available anyway. Correct that case to compare against a mask the size of phys_addr_t instead of always using a 64-bit mask. Signed-off-by: James Hogan <james.hogan@imgtec.com> Fixes: a2e715a86c6d ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.") Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/9610/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/xen: Support kexec/kdump in HVM guests by doing a soft resetVitaly Kuznetsov
commit 0b34a166f291d255755be46e43ed5497cdd194f2 upstream. Currently there is a number of issues preventing PVHVM Xen guests from doing successful kexec/kdump: - Bound event channels. - Registered vcpu_info. - PIRQ/emuirq mappings. - shared_info frame after XENMAPSPACE_shared_info operation. - Active grant mappings. Basically, newly booted kernel stumbles upon already set up Xen interfaces and there is no way to reestablish them. In Xen-4.7 a new feature called 'soft reset' is coming. A guest performing kexec/kdump operation is supposed to call SCHEDOP_shutdown hypercall with SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor (with some help from toolstack) will do full domain cleanup (but keeping its memory and vCPU contexts intact) returning the guest to the state it had when it was first booted and thus allowing it to start over. Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is probably OK as by default all unknown shutdown reasons cause domain destroy with a message in toolstack log: 'Unknown shutdown reason code 5. Destroying domain.' which gives a clue to what the problem is and eliminates false expectations. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/mm: Set NX on gap between __ex_table and rodataStephen Smalley
commit ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 upstream. Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22Use WARN_ON_ONCE for missing X86_FEATURE_NRIPSDirk Müller
commit d2922422c48df93f3edff7d872ee4f3191fefb08 upstream. The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/platform: Fix Geode LX timekeeping in the generic x86 buildDavid Woodhouse
commit 03da3ff1cfcd7774c8780d2547ba0d995f7dc03d upstream. In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable") bypassed verification of the TSC on Geode LX. However, this code (now in the check_system_tsc_reliable() function in arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was set. OpenWRT has recently started building its generic Geode target for Geode GX, not LX, to include support for additional platforms. This broke the timekeeping on LX-based devices, because the TSC wasn't marked as reliable: https://dev.openwrt.org/ticket/20531 By adding a runtime check on is_geode_lx(), we can also include the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus fixing the problem. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Andres Salomon <dilinger@queued.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/apic: Serialize LVTT and TSC_DEADLINE writesShaohua Li
commit 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 upstream. The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not guaranteed that the write to LVTT has reached the APIC before the TSC_DEADLINE MSR is written. In such a case the write to the MSR is ignored and as a consequence the local timer interrupt never fires. The SDM decribes this issue for xAPIC and x2APIC modes. The serialization methods recommended by the SDM differ. xAPIC: "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b. 2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter. 3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2. 4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline." x2APIC: "To allow for efficient access to the APIC registers in x2APIC mode, the serializing semantics of WRMSR are relaxed when writing to the APIC registers. Thus, system software should not use 'WRMSR to APIC registers in x2APIC mode' as a serializing instruction. Read and write accesses to the APIC registers will occur in program order. A WRMSR to an APIC register may complete before all preceding stores are globally visible; software can prevent this by inserting a serializing instruction, an SFENCE, or an MFENCE before the WRMSR." The xAPIC method is to just wait for the memory mapped write to hit the LVTT by checking whether the MSR write has reached the hardware. There is no reason why a proper MFENCE after the memory mapped write would not do the same. Andi Kleen confirmed that MFENCE is sufficient for the xAPIC case as well. Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE support. [ tglx: Massaged the changelog ] Signed-off-by: Shaohua Li <shli@fb.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: <Kernel-team@fb.com> Cc: <lenb@kernel.org> Cc: <fenghua.yu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22ARM: 8429/1: disable GCC SRA optimizationArd Biesheuvel
commit a077224fd35b2f7fbc93f14cf67074fc792fbac2 upstream. While working on the 32-bit ARM port of UEFI, I noticed a strange corruption in the kernel log. The following snprintf() statement (in drivers/firmware/efi/efi.c:efi_md_typeattr_format()) snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]", was producing the following output in the log: | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | | | | |UC] |RUN| | | | | | | |UC] As it turns out, this is caused by incorrect code being emitted for the string() function in lib/vsprintf.c. The following code if (!(spec.flags & LEFT)) { while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } } for (i = 0; i < len; ++i) { if (buf < end) *buf = *s; ++buf; ++s; } while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } when called with len == 0, triggers an issue in the GCC SRA optimization pass (Scalar Replacement of Aggregates), which handles promotion of signed struct members incorrectly. This is a known but as yet unresolved issue. (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular case, it is causing the second while loop to be executed erroneously a single time, causing the additional space characters to be printed. So disable the optimization by passing -fno-ipa-sra. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01x86: bpf_jit: fix compilation of large bpf programsAlexei Starovoitov
commit 3f7352bf21f8fd7ba3e2fcef9488756f188e12be upstream. x86 has variable length encoding. x86 JIT compiler is trying to pick the shortest encoding for given bpf instruction. While doing so the jump targets are changing, so JIT is doing multiple passes over the program. Typical program needs 3 passes. Some very short programs converge with 2 passes. Large programs may need 4 or 5. But specially crafted bpf programs may hit the pass limit and if the program converges on the last iteration the JIT compiler will be producing an image full of 'int 3' insns. Fix this corner case by doing final iteration over bpf program. Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01parisc: Filter out spurious interrupts in PA-RISC irq handlerHelge Deller
commit b1b4e435e4ef7de77f07bf2a42c8380b960c2d44 upstream. When detecting a serial port on newer PA-RISC machines (with iosapic) we have a long way to go to find the right IRQ line, registering it, then registering the serial port and the irq handler for the serial port. During this phase spurious interrupts for the serial port may happen which then crashes the kernel because the action handler might not have been set up yet. So, basically it's a race condition between the serial port hardware and the CPU which sets up the necessary fields in the irq sructs. The main reason for this race is, that we unmask the serial port irqs too early without having set up everything properly before (which isn't easily possible because we need the IRQ number to register the serial ports). This patch is a work-around for this problem. It adds checks to the CPU irq handler to verify if the IRQ action field has been initialized already. If not, we just skip this interrupt (which isn't critical for a serial port at bootup). The real fix would probably involve rewriting all PA-RISC specific IRQ code (for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of the irq chips and proper irq enabling along this line. This bug has been in the PA-RISC port since the beginning, but the crashes happened very rarely with currently used hardware. But on the latest machine which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900, 1GHz) and which has the largest possible L1 cache size (64MB each), the kernel crashed at every boot because of this race. So, without this patch the machine would currently be unuseable. For the record, here is the flow logic: 1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq(). 2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq. 3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq 4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!) 5. serial_init_chip() then registers the 8250 port. Problems: - In step 4 the CPU irq shouldn't have been registered yet, but after step 5 - If serial irq happens between 4 and 5 have finished, the kernel will crash Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01x86/mm: Initialize pmd_idx in page_table_range_init_count()Minfei Huang
commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream. The variable pmd_idx is not initialized for the first iteration of the for loop. Assign the proper value which indexes the start address. Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once' Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: tony.luck@intel.com Cc: wangnan0@huawei.com Cc: david.vrabel@citrix.com Reviewed-by: yinghai@kernel.org Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlersThomas Huth
commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream. The EPOW interrupt handler uses rtas_get_sensor(), which in turn uses rtas_busy_delay() to wait for RTAS becoming ready in case it is necessary. But rtas_busy_delay() is annotated with might_sleep() and thus may not be used by interrupts handlers like the EPOW handler! This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6 Call Trace: [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable) [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180 Fix this issue by introducing a new rtas_get_sensor_fast() function that does not use rtas_busy_delay() - and thus can only be used for sensors that do not cause a BUSY condition - known as "fast" sensors. The EPOW sensor is defined to be "fast" in sPAPR - mpe. Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hashMichael Ellerman
commit 74b5037baa2011a2799e2c43adde7d171b072f9e upstream. The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K PAGE_SIZE. However when built with a 4K PAGE_SIZE there is an additional config option which can be enabled, PPC_HAS_HASH_64K, which means the kernel also knows how to hash a 64K page even though the base PAGE_SIZE is 4K. This is used in one obscure configuration, to support 64K pages for SPU local store on the Cell processor when the rest of the kernel is using 4K pages. In this configuration, pte_pagesize_index() is defined to just pass through its arguments to get_slice_psize(). However pte_pagesize_index() is called for both user and kernel addresses, whereas get_slice_psize() only knows how to handle user addresses. This has been broken forever, however until recently it happened to work. That was because in get_slice_psize() the large kernel address would cause the right shift of the slice mask to return zero. However in commit 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB"), the get_slice_psize() code was changed so that instead of a right shift we do an array lookup based on the address. When passed a kernel address this means we index way off the end of the slice array and return random junk. That is only fatal if we happen to hit something non-zero, but when we do return a non-zero value we confuse the MMU code and eventually cause a check stop. This fix is ugly, but simple. When we're called for a kernel address we return 4K, which is always correct in this configuration, otherwise we use the slice mask. Fixes: 7aa0727f3302 ("powerpc/mm: Increase the slice range to 64TB") Reported-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01arm64: head.S: initialise mdcr_el2 in el2_setupWill Deacon
commit d10bcd473301888f957ec4b6b12aa3621be78d59 upstream. When entering the kernel at EL2, we fail to initialise the MDCR_EL2 register which controls debug access and PMU capabilities at EL1. This patch ensures that the register is initialised so that all traps are disabled and all the PMU counters are available to the host. When a guest is scheduled, KVM takes care to configure trapping appropriately. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01arm64: compat: fix vfp save/restore across signal handlers in big-endianWill Deacon
commit bdec97a855ef1e239f130f7a11584721c9a1bf04 upstream. When saving/restoring the VFP registers from a compat (AArch32) signal frame, we rely on the compat registers forming a prefix of the native register file and therefore make use of copy_{to,from}_user to transfer between the native fpsimd_state and the compat_vfp_sigframe. Unfortunately, this doesn't work so well in a big-endian environment. Our fpsimd save/restore code operates directly on 128-bit quantities (Q registers) whereas the compat_vfp_sigframe represents the registers as an array of 64-bit (D) registers. The architecture packs the compat D registers into the Q registers, with the least significant bytes holding the lower register. Consequently, we need to swap the 64-bit halves when converting between these two representations on a big-endian machine. This patch replaces the __copy_{to,from}_user invocations in our compat VFP signal handling code with explicit __put_user loops that operate on 64-bit values and swap them accordingly. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01arm64: kconfig: Move LIST_POISON to a safe valueJeff Vander Stoep
commit bf0c4e04732479f650ff59d1ee82de761c0071f0 upstream. Move the poison pointer offset to 0xdead000000000000, a recognized value that is not mappable by user-space exploits. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Thierry Strudel <tstrudel@google.com> Signed-off-by: Jeff Vander Stoep <jeffv@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21xtensa: don't use echo -e needlesslyMax Filippov
commit 123f15e669d5a5a2e2f260ba4a5fc2efd93df20e upstream. -e is not needed to output strings without escape sequences. This breaks big endian FSF build when the shell is dash, because its builtin echo doesn't understand '-e' switch and outputs it in the echoed string. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21xtensa: fix kernel register spillingMax Filippov
commit 77d6273e79e3a86552fcf10cdd31a69b46ed2ce6 upstream. call12 can't be safely used as the first call in the inline function, because the compiler does not extend the stack frame of the bounding function accordingly, which may result in corruption of local variables. If a call needs to be done, do call8 first followed by call12. For pure assembly code in _switch_to increase stack frame size of the bounding function. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>