aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2018-02-16NFS: Fix nfsstat breakage due to LOOKUPPTrond Myklebust
commit 8634ef5e05311f32d7f2aee06f6b27a8834a3bd6 upstream. The LOOKUPP operation was inserted into the nfs4_procedures array rather than being appended, which put /proc/net/rpc/nfs out of whack, and broke the nfsstat utility. Fix by moving the LOOKUPP operation to the end of the array, and by ensuring that it keeps the same length whether or not NFSV4.1 and NFSv4.2 are compiled in. Fixes: 5b5faaf6df734 ("nfs4: add NFSv4 LOOKUPP handlers") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16mtd: cfi: convert inline functions to macrosArnd Bergmann
commit 9e343e87d2c4c707ef8fae2844864d4dde3a2d13 upstream. The map_word_() functions, dating back to linux-2.6.8, try to perform bitwise operations on a 'map_word' structure. This may have worked with compilers that were current then (gcc-3.4 or earlier), but end up being rather inefficient on any version I could try now (gcc-4.4 or higher). Specifically we hit a problem analyzed in gcc PR81715 where we fail to reuse the stack space for local variables. This can be seen immediately in the stack consumption for cfi_staa_erase_varsize() and other functions that (with CONFIG_KASAN) can be up to 2200 bytes. Changing the inline functions into macros brings this down to 1280 bytes. Without KASAN, the same problem exists, but the stack consumption is lower to start with, my patch shrinks it from 920 to 496 bytes on with arm-linux-gnueabi-gcc-5.4, and saves around 1KB in .text size for cfi_cmdset_0020.c, as it avoids copying map_word structures for each call to one of these helpers. With the latest gcc-8 snapshot, the problem is fixed in upstream gcc, but nobody uses that yet, so we should still work around it in mainline kernels and probably backport the workaround to stable kernels as well. We had a couple of other functions that suffered from the same gcc bug, and all of those had a simpler workaround involving dummy variables in the inline function. Unfortunately that did not work here, the macro hack was the best I could come up with. It would also be helpful to have someone to a little performance testing on the patch, to see how much it helps in terms of CPU utilitzation. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16arm/arm64: smccc: Implement SMCCC v1.1 inline primitiveMarc Zyngier
Commit f2d3b2e8759a upstream. One of the major improvement of SMCCC v1.1 is that it only clobbers the first 4 registers, both on 32 and 64bit. This means that it becomes very easy to provide an inline version of the SMC call primitive, and avoid performing a function call to stash the registers that would otherwise be clobbered by SMCCC v1.0. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16arm/arm64: smccc: Make function identifiers an unsigned quantityMarc Zyngier
Commit ded4c39e93f3 upstream. Function identifiers are a 32bit, unsigned quantity. But we never tell so to the compiler, resulting in the following: 4ac: b26187e0 mov x0, #0xffffffff80000001 We thus rely on the firmware narrowing it for us, which is not always a reasonable expectation. Cc: stable@vger.kernel.org Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16firmware/psci: Expose SMCCC version through psci_opsMarc Zyngier
Commit e78eef554a91 upstream. Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed, let's do that at boot time, and expose the version of the calling convention as part of the psci_ops structure. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16firmware/psci: Expose PSCI conduitMarc Zyngier
Commit 09a8d6d48499 upstream. In order to call into the firmware to apply workarounds, it is useful to find out whether we're using HVC or SMC. Let's expose this through the psci_ops. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening supportMarc Zyngier
Commit 6167ec5c9145 upstream. A new feature of SMCCC 1.1 is that it offers firmware-based CPU workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides BP hardening for CVE-2017-5715. If the host has some mitigation for this issue, report that we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the host workaround on every guest exit. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Conflicts: arch/arm/include/asm/kvm_host.h arch/arm64/include/asm/kvm_host.h Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16arm/arm64: KVM: Advertise SMCCC v1.1Marc Zyngier
Commit 09e6be12effd upstream. The new SMC Calling Convention (v1.1) allows for a reduced overhead when calling into the firmware, and provides a new feature discovery mechanism. Make it visible to KVM guests. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16drivers/firmware: Expose psci_get_version through psci_ops structureWill Deacon
Commit d68e3ba5303f upstream. Entry into recent versions of ARM Trusted Firmware will invalidate the CPU branch predictor state in order to protect against aliasing attacks. This patch exposes the PSCI "VERSION" function via psci_ops, so that it can be invoked outside of the PSCI driver where necessary. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-13Merge tag 'v4.15.2' into v4.15/standard/baseBruce Ashfield
This is the 4.15.2 stable release
2018-02-13Merge tag 'v4.15.1' into v4.15/standard/baseBruce Ashfield
This is the 4.15.1 stable release
2018-02-07x86/retpoline: Avoid retpolines for built-in __init functionsDavid Woodhouse
commit 66f793099a636862a71c59d4a6ba91387b155e0c There's no point in building init code with retpolines, since it runs before any potentially hostile userspace does. And before the retpoline is actually ALTERNATIVEd into place, for much of it. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: karahmed@amazon.de Cc: peterz@infradead.org Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1517484441-1420-2-git-send-email-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-07vfs, fdtable: Prevent bounds-check bypass via speculative executionDan Williams
commit 56c30ba7b348b90484969054d561f711ba196507 'fd' is a user controlled value that is used as a data dependency to read from the 'fdt->fd' array. In order to avoid potential leaks of kernel memory values, block speculative execution of the instruction stream that could issue reads based on an invalid 'file *' returned from __fcheck_files. Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727418500.33451.17392199002892248656.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-07array_index_nospec: Sanitize speculative array de-referencesDan Williams
commit f3804203306e098dae9ca51540fcd5eb700d7f40 array_index_nospec() is proposed as a generic mechanism to mitigate against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks via speculative execution. The array_index_nospec() implementation is expected to be safe for current generation CPUs across multiple architectures (ARM, x86). Based on an original implementation by Linus Torvalds, tweaked to remove speculative flows by Alexei Starovoitov, and tweaked again by Linus to introduce an x86 assembly implementation for the mask generation. Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Co-developed-by: Alexei Starovoitov <ast@kernel.org> Suggested-by: Cyril Novikov <cnovikov@lynx.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-07module/retpoline: Warn about missing retpoline in moduleAndi Kleen
commit caf7501a1b4ec964190f31f9c3f163de252273b8 There's a risk that a kernel which has full retpoline mitigations becomes vulnerable when a module gets loaded that hasn't been compiled with the right compiler or the right option. To enable detection of that mismatch at module load time, add a module info string "retpoline" at build time when the module was compiled with retpoline support. This only covers compiled C source, but assembler source or prebuilt object files are not checked. If a retpoline enabled kernel detects a non retpoline protected module at load time, print a warning and report it in the sysfs vulnerability file. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: jeyu@kernel.org Cc: arjan@linux.intel.com Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03tty: fix data race between tty_init_dev and flush of bufGaurav Kohli
commit b027e2298bd588d6fa36ed2eda97447fb3eac078 upstream. There can be a race, if receive_buf call comes before tty initialization completes in n_tty_open and tty->disc_data may be NULL. CPU0 CPU1 ---- ---- 000|n_tty_receive_buf_common() n_tty_open() -001|n_tty_receive_buf2() tty_ldisc_open.isra.3() -002|tty_ldisc_receive_buf(inline) tty_ldisc_setup() Using ldisc semaphore lock in tty_init_dev till disc_data initializes completely. Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org> Reviewed-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-02aufs4: mmap supportBruce Ashfield
aufs4-mmap.patch from git://github.com/sfjro/aufs4-standalone.git Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2018-02-02aufs4: base supportBruce Ashfield
aufs4-base.patch from git://github.com/sfjro/aufs4-standalone.git Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2018-02-02nfs: Allow default io size to be configured.Jim Somerville
Some boards need a smaller size here for performance reasons, so let's make the value configurable for those boards. Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
2018-02-02NFS: allow nfs root mount to use alternate rpc portsJason Wessel
Allow an nfs root mount to use alternate RPC ports for mountd and nfsd. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> [forward port to 2.6.33+] Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2018-01-28Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single bug fix to prevent a subtle deadlock in the scheduler core code vs cpu hotplug" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix cpu.max vs. cpuhotplug deadlock
2018-01-24Merge tag 'trace-v4.15-rc9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "With the new ORC unwinder, ftrace stack tracing became disfunctional. One was that ORC didn't know how to handle the ftrace callbacks in general (which Josh fixed). The other was that ORC would just bail if it hit a dynamically allocated trampoline. Which means all ftrace stack tracing that happens from the function tracer would produce no results (that includes killing the max stack size tracer). I added a check to the ORC unwinder to see if the trampoline belonged to ftrace, and if it did, use the orc entry of the static trampoline that was used to create the dynamic one (it would be identical). Finally, I noticed that the skip values of the stack tracing were out of whack. I went through and fixed them up" * tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Update stack trace skipping for ORC unwinder ftrace, orc, x86: Handle ftrace dynamically allocated trampolines x86/ftrace: Fix ORC unwinding from ftrace handlers
2018-01-24Revert "module: Add retpoline tag to VERMAGIC"Greg Kroah-Hartman
This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12. Turns out distros do not want to make retpoline as part of their "ABI", so this patch should not have been merged. Sorry Andi, this was my fault, I suggested it when your original patch was the "correct" way of doing this instead. Reported-by: Jiri Kosina <jikos@kernel.org> Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC") Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: rusty@rustcorp.com.au Cc: arjan.van.de.ven@intel.com Cc: jeyu@kernel.org Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-24sched/core: Fix cpu.max vs. cpuhotplug deadlockPeter Zijlstra
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion: tg_set_cfs_bandwidth() get_online_cpus() cpus_read_lock() cfs_bandwidth_usage_inc() static_key_slow_inc() cpus_read_lock() Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-23ftrace, orc, x86: Handle ftrace dynamically allocated trampolinesSteven Rostedt (VMware)
The function tracer can create a dynamically allocated trampoline that is called by the function mcount or fentry hook that is used to call the function callback that is registered. The problem is that the orc undwinder will bail if it encounters one of these trampolines. This breaks the stack trace of function callbacks, which include the stack tracer and setting the stack trace for individual functions. Since these dynamic trampolines are basically copies of the static ftrace trampolines defined in ftrace_*.S, we do not need to create new orc entries for the dynamic trampolines. Finding the return address on the stack will be identical as the functions that were copied to create the dynamic trampolines. When encountering a ftrace dynamic trampoline, we can just use the orc entry of the ftrace static function that was copied for that trampoline. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-21mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()Kirill A. Shutemov
Tetsuo reported random crashes under memory pressure on 32-bit x86 system and tracked down to change that introduced page_vma_mapped_walk(). The root cause of the issue is the faulty pointer math in check_pte(). As ->pte may point to an arbitrary page we have to check that they are belong to the section before doing math. Otherwise it may lead to weird results. It wasn't noticed until now as mem_map[] is virtually contiguous on flatmem or vmemmap sparsemem. Pointer arithmetic just works against all 'struct page' pointers. But with classic sparsemem, it doesn't because each section memap is allocated separately and so consecutive pfns crossing two sections might have struct pages at completely unrelated addresses. Let's restructure code a bit and replace pointer arithmetic with operations on pfns. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-and-tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-19sparse doesn't support struct randomizationMatthew Wilcox
Without this patch, I drown in a sea of unknown attribute warnings Link: http://lkml.kernel.org/r/20180117024539.27354-1-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-17Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "A delayacct statistics correctness fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: delayacct: Account blkio completion on the correct task
2018-01-17Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti bits and fixes from Thomas Gleixner: "This last update contains: - An objtool fix to prevent a segfault with the gold linker by changing the invocation order. That's not just for gold, it's a general robustness improvement. - An improved error message for objtool which spares tearing hairs. - Make KASAN fail loudly if there is not enough memory instead of oopsing at some random place later - RSB fill on context switch to prevent RSB underflow and speculation through other units. - Make the retpoline/RSB functionality work reliably for both Intel and AMD - Add retpoline to the module version magic so mismatch can be detected - A small (non-fix) update for cpufeatures which prevents cpu feature clashing for the upcoming extra mitigation bits to ease backporting" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: module: Add retpoline tag to VERMAGIC x86/cpufeature: Move processor tracing out of scattered features objtool: Improve error message for bad file argument objtool: Fix seg fault with gold linker x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros x86/retpoline: Fill RSB on context switch for affected CPUs x86/kasan: Panic if there is not enough memory to boot
2018-01-17module: Add retpoline tag to VERMAGICAndi Kleen
Add a marker for retpoline to the module VERMAGIC. This catches the case when a non RETPOLINE compiled module gets loaded into a retpoline kernel, making it insecure. It doesn't handle the case when retpoline has been runtime disabled. Even in this case the match of the retcompile status will be enforced. This implies that even with retpoline run time disabled all modules loaded need to be recompiled. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: rusty@rustcorp.com.au Cc: arjan.van.de.ven@intel.com Cc: jeyu@kernel.org Cc: torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
2018-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers. 2) Memory leak in key_notify_policy(), from Steffen Klassert. 3) Fix overflow with bpf arrays, from Daniel Borkmann. 4) Fix RDMA regression with mlx5 due to mlx5 no longer using pci_irq_get_affinity(), from Saeed Mahameed. 5) Missing RCU read locking in nl80211_send_iface() when it calls ieee80211_bss_get_ie(), from Dominik Brodowski. 6) cfg80211 should check dev_set_name()'s return value, from Johannes Berg. 7) Missing module license tag in 9p protocol, from Stephen Hemminger. 8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike Maloney. 9) Fix endless loop in netlink extack code, from David Ahern. 10) TLS socket layer sets inverted error codes, resulting in an endless loop. From Robert Hering. 11) Revert openvswitch erspan tunnel support, it's mis-designed and we need to kill it before it goes into a real release. From William Tu. 12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) net, sched: fix panic when updating miniq {b,q}stats qed: Fix potential use-after-free in qed_spq_post() nfp: use the correct index for link speed table lan78xx: Fix failure in USB Full Speed sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf sctp: reinit stream if stream outcnt has been change by sinit in sendmsg ibmvnic: Fix pending MAC address changes netlink: extack: avoid parenthesized string constant warning ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY net: Allow neigh contructor functions ability to modify the primary_key sh_eth: fix dumping ARSTR Revert "openvswitch: Add erspan tunnel support." net/tls: Fix inverted error codes to avoid endless loop ipv6: ip6_make_skb() needs to clear cork.base.dst sctp: avoid compiler warning on implicit fallthru net: ipv4: Make "ip route get" match iif lo rules again. netlink: extack needs to be reset each time through loop tipc: fix a memory leak in tipc_nl_node_get_link() ipv6: fix udpv6 sendmsg crash caused by too small MTU ...
2018-01-16delayacct: Account blkio completion on the correct taskJosh Snyder
Before commit: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler") delayacct_blkio_end() was called after context-switching into the task which completed I/O. This resulted in double counting: the task would account a delay both waiting for I/O and for time spent in the runqueue. With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up(). In ttwu, we have not yet context-switched. This is more correct, in that the delay accounting ends when the I/O is complete. But delayacct_blkio_end() relies on 'get_current()', and we have not yet context-switched into the task whose I/O completed. This results in the wrong task having its delay accounting statistics updated. Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(), so that it can update the statistics of the correct task. Signed-off-by: Josh Snyder <joshs@netflix.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: <stable@vger.kernel.org> Cc: Brendan Gregg <bgregg@netflix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-block@vger.kernel.org Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler") Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-15netlink: extack: avoid parenthesized string constant warningJohannes Berg
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning in newer versions of gcc: warning: array initialized from parenthesized string constant Just remove the parentheses, they're not needed in this context since anyway since there can be no operator precendence issues or similar. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ptr_ring: document usage around __ptr_ring_peekMichael S. Tsirkin
This explains why is the net usage of __ptr_ring_peek actually ok without locks. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "This contains: - a PTI bugfix to avoid setting reserved CR3 bits when PCID is disabled. This seems to cause issues on a virtual machine at least and is incorrect according to the AMD manual. - a PTI bugfix which disables the perf BTS facility if PTI is enabled. The BTS AUX buffer is not globally visible and causes the CPU to fault when the mapping disappears on switching CR3 to user space. A full fix which restores BTS on PTI is non trivial and will be worked on. - PTI bugfixes for EFI and trusted boot which make sure that the user space visible page table entries have the NX bit cleared - removal of dead code in the PTI pagetable setup functions - add PTI documentation - add a selftest for vsyscall to verify that the kernel actually implements what it advertises. - a sysfs interface to expose vulnerability and mitigation information so there is a coherent way for users to retrieve the status. - the initial spectre_v2 mitigations, aka retpoline: + The necessary ASM thunk and compiler support + The ASM variants of retpoline and the conversion of affected ASM code + Make LFENCE serializing on AMD so it can be used as speculation trap + The RSB fill after vmexit - initial objtool support for retpoline As I said in the status mail this is the most of the set of patches which should go into 4.15 except two straight forward patches still on hold: - the retpoline add on of LFENCE which waits for ACKs - the RSB fill after context switch Both should be ready to go early next week and with that we'll have covered the major holes of spectre_v2 and go back to normality" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86,perf: Disable intel_bts when PTI security/Kconfig: Correct the Documentation reference for PTI x86/pti: Fix !PCID and sanitize defines selftests/x86: Add test_vsyscall x86/retpoline: Fill return stack buffer on vmexit x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline: Add initial retpoline support objtool: Allow alternatives to be ignored objtool: Detect jumps to retpoline thunks x86/pti: Make unpoison of pgd for trusted boot work for real x86/alternatives: Fix optimize_nops() checking sysfs/cpu: Fix typos in vulnerability documentation x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC ...
2018-01-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixlets from Andrew Morton: "4 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: tools/objtool/Makefile: don't assume sync-check.sh is executable kdump: write correct address of mem_section into vmcoreinfo kmemleak: allow to coexist with fault injection MAINTAINERS, nilfs2: change project home URLs
2018-01-13kdump: write correct address of mem_section into vmcoreinfoKirill A. Shutemov
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is. But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer". We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section) writes down address of pointer into vmcoreinfo, not array as we wanted. Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the situation correctly for both cases. Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-12Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "No functional effects intended: removes leftovers from recent lockdep and refcounts work" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry locking/lockdep: Remove cross-release leftovers locking/Documentation: Remove stale crossrelease_fullstack parameter
2018-01-12net/mlx5: Fix get vector affinity helper functionSaeed Mahameed
mlx5_get_vector_affinity used to call pci_irq_get_affinity and after reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY API, calling pci_irq_get_affinity becomes useless and it breaks RDMA mlx5 users. To fix this, this patch provides an alternative way to retrieve IRQ vector affinity using legacy IRQ API, following smp_affinity read procfs implementation. Fixes: 231243c82793 ("Revert mlx5: move affinity hints assignments to generic code") Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12{net,ib}/mlx5: Don't disable local loopback multicast traffic when neededEran Ben Elisha
There are systems platform information management interfaces (such as HOST2BMC) for which we cannot disable local loopback multicast traffic. Separate disable_local_lb_mc and disable_local_lb_uc capability bits so driver will not disable multicast loopback traffic if not supported. (It is expected that Firmware will not set disable_local_lb_mc if HOST2BMC is running for example.) Function mlx5_nic_vport_update_local_lb will do best effort to disable/enable UC/MC loopback traffic and return success only in case it succeeded to changed all allowed by Firmware. Adapt mlx5_ib and mlx5e to support the new cap bits. Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest") Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support") Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-01-09 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Prevent out-of-bounds speculation in BPF maps by masking the index after bounds checks in order to fix spectre v1, and add an option BPF_JIT_ALWAYS_ON into Kconfig that allows for removing the BPF interpreter from the kernel in favor of JIT-only mode to make spectre v2 harder, from Alexei. 2) Remove false sharing of map refcount with max_entries which was used in spectre v1, from Daniel. 3) Add a missing NULL psock check in sockmap in order to fix a race, from John. 4) Fix test_align BPF selftest case since a recent change in verifier rejects the bit-wise arithmetic on pointers earlier but test_align update was missing, from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09bpf: avoid false sharing of map refcount with max_entriesDaniel Borkmann
In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds speculation") also change the layout of struct bpf_map such that false sharing of fast-path members like max_entries is avoided when the maps reference counter is altered. Therefore enforce them to be placed into separate cachelines. pahole dump after change: struct bpf_map { const struct bpf_map_ops * ops; /* 0 8 */ struct bpf_map * inner_map_meta; /* 8 8 */ void * security; /* 16 8 */ enum bpf_map_type map_type; /* 24 4 */ u32 key_size; /* 28 4 */ u32 value_size; /* 32 4 */ u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ u32 pages; /* 44 4 */ u32 id; /* 48 4 */ int numa_node; /* 52 4 */ bool unpriv_array; /* 56 1 */ /* XXX 7 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ struct user_struct * user; /* 64 8 */ atomic_t refcnt; /* 72 4 */ atomic_t usercnt; /* 76 4 */ struct work_struct work; /* 80 32 */ char name[16]; /* 112 16 */ /* --- cacheline 2 boundary (128 bytes) --- */ /* size: 128, cachelines: 2, members: 17 */ /* sum members: 121, holes: 1, sum holes: 7 */ }; Now all entries in the first cacheline are read only throughout the life time of the map, set up once during map creation. Overall struct size and number of cachelines doesn't change from the reordering. struct bpf_map is usually first member and embedded in map structs in specific map implementations, so also avoid those members to sit at the end where it could potentially share the cacheline with first map values e.g. in the array since remote CPUs could trigger map updates just as well for those (easily dirtying members like max_entries intentionally as well) while having subsequent values in cache. Quoting from Google's Project Zero blog [1]: Additionally, at least on the Intel machine on which this was tested, bouncing modified cache lines between cores is slow, apparently because the MESI protocol is used for cache coherence [8]. Changing the reference counter of an eBPF array on one physical CPU core causes the cache line containing the reference counter to be bounced over to that CPU core, making reads of the reference counter on all other CPU cores slow until the changed reference counter has been written back to memory. Because the length and the reference counter of an eBPF array are stored in the same cache line, this also means that changing the reference counter on one physical CPU core causes reads of the eBPF array's length to be slow on other physical CPU cores (intentional false sharing). While this doesn't 'control' the out-of-bounds speculation through masking the index as in commit b2157399cc98, triggering a manipulation of the map's reference counter is really trivial, so lets not allow to easily affect max_entries from it. Splitting to separate cachelines also generally makes sense from a performance perspective anyway in that fast-path won't have a cache miss if the map gets pinned, reused in other progs, etc out of control path, thus also avoids unintentional false sharing. [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and Alexander Duyck. 2) Undo unintentional UAPI change in netfilter conntrack, from Florian Westphal. 3) Revert a change to how error codes are returned from dev_get_valid_name(), it broke some apps. 4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6 dual-stack. From Eli Cooper. 5) Fix missed PMTU updates in geneve, from Xin Long. 6) Cure double free in macvlan, from Gao Feng. 7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from Mohamed Ghannam. 8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed deferral of probe when the regulator is not ready yet). 9) Missing DMA mapping error checks in 3c59x, from Neil Horman. 10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli. 11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB() isn't initialized. From Andrei Vagin. 12) Fix crashes in fib6_add(), from Wei Wang. 13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits) sh_eth: fix TXALCR1 offsets mdio-sun4i: Fix a memory leak phylink: mark expected switch fall-throughs in phylink_mii_ioctl sctp: fix the handling of ICMP Frag Needed for too small MTUs sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled xen-netfront: enable device after manual module load bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine. bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc() sh_eth: fix SH7757 GEther initialization net: fec: free/restore resource in related probe error pathes uapi/if_ether.h: prevent redefinition of struct ethhdr ipv6: fix general protection fault in fib6_add() RDS: null pointer dereference in rds_atomic_free_op sh_eth: fix TSU resource handling net: stmmac: enable EEE in MII, GMII or RGMII only rtnetlink: give a user socket to get_target_net() MAINTAINERS: Update my email address. can: ems_usb: improve error reporting for error warning and error passive can: flex_can: Correct the checking for frame length in flexcan_start_xmit() can: gs_usb: fix return value of the "set_bittiming" callback ...
2018-01-09bpf: prevent out-of-bounds speculationAlexei Starovoitov
Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel. To avoid leaking kernel data round up array-based maps and mask the index after bounds check, so speculated load with out of bounds index will load either valid value from the array or zero from the padded area. Unconditionally mask index for all array types even when max_entries are not rounded to power of 2 for root user. When map is created by unpriv user generate a sequence of bpf insns that includes AND operation to make sure that JITed code includes the same 'index & index_mask' operation. If prog_array map is created by unpriv user replace bpf_tail_call(ctx, map, index); with if (index >= max_entries) { index &= map->index_mask; bpf_tail_call(ctx, map, index); } (along with roundup to power 2) to prevent out-of-bounds speculation. There is secondary redundant 'if (index >= max_entries)' in the interpreter and in all JITs, but they can be optimized later if necessary. Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there. That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT. v2->v3: Daniel noticed that attack potentially can be crafted via syscall commands without loading the program, so add masking to those paths as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-08locking/lockdep: Remove cross-release leftoversIngo Molnar
There's two cross-release leftover facilities: - the crossrelease_hist_*() irq-tracing callbacks (NOPs currently) - the complete_release_commit() callback (NOP as well) Remove them. Cc: David Sterba <dsterba@suse.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-08sysfs/cpu: Add vulnerability folderThomas Gleixner
As the meltdown/spectre problem affects several CPU architectures, it makes sense to have common way to express whether a system is affected by a particular vulnerability or not. If affected the way to express the mitigation should be common as well. Create /sys/devices/system/cpu/vulnerabilities folder and files for meltdown, spectre_v1 and spectre_v2. Allow architectures to override the show function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
2018-01-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - untangle sys_close() abuses in xt_bpf - deal with register_shrinker() failures in sget() * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'" sget(): handle failures of register_shrinker() mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.
2018-01-05Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Thomas Gleixner: - A fix for a add_efi_memmap parameter regression which ensures that the parameter is parsed before it is used. - Reinstate the virtual capsule mapping as the cached copy turned out to break Quark and other things - Remove Matt Fleming as EFI co-maintainer. He stepped back a few days ago. Thanks Matt for all your great work! * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove Matt Fleming as EFI co-maintainer efi/capsule-loader: Reinstate virtual capsule mapping x86/efi: Fix kernel param add_efi_memmap regression
2018-01-05sh_eth: fix SH7757 GEther initializationSergei Shtylyov
Renesas SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the 'sh_eth' driver can only reset and initialize TSU of the first controller pair. Shimoda-san tried to solve that adding the 'needs_init' member to the 'struct sh_eth_plat_data', however the platform code still never sets this flag. I think that we can infer this information from the 'devno' variable (set to 'platform_device::id') and reset/init the Ether controller pair only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be removed... Fixes: 150647fb2c31 ("net: sh_eth: change the condition of initialization") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"Al Viro
Descriptor table is a shared object; it's not a place where you can stick temporary references to files, especially when we don't need an opened file at all. Cc: stable@vger.kernel.org # v4.14 Fixes: 98589a0998b8 ("netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>