aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2015-01-29vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-27Merge tag 'powerpc-3.19-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: "Two powerpc fixes" * tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
2015-01-24Merge tag 'pci-v3.19-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for: - a resource management problem that causes a Radeon "Fatal error during GPU init" on machines where the BIOS programmed an invalid Root Port window. This was a regression in v3.16. - an Atheros AR93xx device that doesn't handle PCI bus resets correctly. This was a regression in v3.14. - an out-of-date email address" * tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Update Richard Zhu's email address sparc/PCI: Clip bridge windows to fit in upstream windows powerpc/PCI: Clip bridge windows to fit in upstream windows parisc/PCI: Clip bridge windows to fit in upstream windows mn10300/PCI: Clip bridge windows to fit in upstream windows microblaze/PCI: Clip bridge windows to fit in upstream windows ia64/PCI: Clip bridge windows to fit in upstream windows frv/PCI: Clip bridge windows to fit in upstream windows alpha/PCI: Clip bridge windows to fit in upstream windows x86/PCI: Clip bridge windows to fit in upstream windows PCI: Add pci_claim_bridge_resource() to clip window if necessary PCI: Add pci_bus_clip_resource() to clip to fit upstream window PCI: Pass bridge device, not bus, when updating bridge windows PCI: Mark Atheros AR93xx to avoid bus reset PCI: Add flag for devices where we can't use bus reset
2015-01-23Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module and param fixes from Rusty Russell: "Surprising number of fixes this merge window :( The first two are minor fallout from the param rework which went in this merge window. The next three are a series which fixes a longstanding (but never previously reported and unlikely , so no CC stable) race between kallsyms and freeing the init section. Finally, a minor cleanup as our module refcount will now be -1 during unload" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: make module_refcount() a signed integer. module: fix race in kallsyms resolution during module load success. module: remove mod arg from module_free, rename module_memfree(). module_arch_freeing_init(): new hook for archs before module->module_init freed. param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC param: initialize store function to NULL if not available.
2015-01-22powerpc/powernv: Restore LPCR with LPCR_PECE1 clearedShreyas B. Prabhu
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to cause exit from power-saving mode. While waking up from winkle, restoring LPCR with LPCR_PECE1 set (i.e Decrementer interrupts allowed) can cause issue in the following scenario: - All the threads in a core are offlined. The core enters deep winkle. - Spurious interrupt wakes up a thread in the core. Here LPCR is restored with LPCR_PECE1 bit set. - Since it was a spurious interrupt on a offline thread, the thread clears the interrupt and goes back to winkle. - Here before the thread executes winkle and puts the core into deep winkle, if a decrementer interrupt occurs on any of the sibling threads in the core that thread wakes up. - Since in offline loop we are flushing interrupt only in case of external interrupt, the decrementer interrupt does not get flushed. So at this stage the thread is stuck in this is loop of waking up at 0x100 due to decrementer interrupt, not flushing the interrupt as only external interrupts get flushed, entering winkle, waking up at 0x100 again. Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit cleared when waking up from winkle. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto fix from Herbert Xu: "This fixes a regression that arose from the change to add a crypto prefix to module names which was done to prevent the loading of arbitrary modules through the Crypto API. In particular, a number of modules were missing the crypto prefix which meant that they could no longer be autoloaded" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: add missing crypto module aliases
2015-01-20module: remove mod arg from module_free, rename module_memfree().Rusty Russell
Nothing needs the module pointer any more, and the next patch will call it from RCU, where the module itself might no longer exist. Removing the arg is the safest approach. This just codifies the use of the module_alloc/module_free pattern which ftrace and bpf use. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: x86@kernel.org Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: linux-cris-kernel@axis.com Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: nios2-dev@lists.rocketboards.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: netdev@vger.kernel.org
2015-01-19powerpc/xmon: Fix another endiannes issue in RTAS call from xmonLaurent Dufour
The commit 3b8a3c010969 ("powerpc/pseries: Fix endiannes issue in RTAS call from xmon") was fixing an endianness issue in the call made from xmon to RTAS. However, as Michael Ellerman noticed, this fix was not complete, the token value was not byte swapped. This lead to call an unexpected and most of the time unexisting RTAS function, which is silently ignored by RTAS. This fix addresses this hole. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-16powerpc/PCI: Clip bridge windows to fit in upstream windowsYinghai Lu
Every PCI-PCI bridge window should fit inside an upstream bridge window because orphaned address space is unreachable from the primary side of the upstream bridge. If we inherit invalid bridge windows that overlap an upstream window from firmware, clip them to fit and update the bridge accordingly. [bhelgaas: changelog] Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491 Reported-by: Marek Kordik <kordikmarek@gmail.com> Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources") Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Michael Ellerman <mpe@ellerman.id.au> CC: Gavin Shan <gwshan@linux.vnet.ibm.com> CC: Anton Blanchard <anton@samba.org> CC: Sebastian Ott <sebott@linux.vnet.ibm.com> CC: Wei Yang <weiyang@linux.vnet.ibm.com> CC: Andrew Murray <amurray@embedded-bits.co.uk> CC: linuxppc-dev@lists.ozlabs.org
2015-01-13crypto: add missing crypto module aliasesMathias Krause
Commit 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"") changed the automatic module loading when requesting crypto algorithms to prefix all module requests with "crypto-". This requires all crypto modules to have a crypto specific module alias even if their file name would otherwise match the requested crypto algorithm. Even though commit 5d26a105b5a7 added those aliases for a vast amount of modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO annotations to those files to make them get loaded automatically, again. This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work with kernels v3.18 and below. Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former won't work for crypto modules any more. Fixes: 5d26a105b5a7 ("crypto: prefix module autoloading with "crypto-"") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-12powerpc: Work around gcc bug in current_thread_info()Michael Ellerman
In commit a3e5b356b3ab "powerpc: Don't use local named register variable in current_thread_info" Anton changed the way we did current_thread_info() to accommodate LLVM, and it was not meant to have any effect elsewhere. Unfortunately it has exposed a gcc bug, where r1 gets copied into another register and then gcc uses that register to restore the toc after a function call, even when that register is volatile and has been clobbered by the function call. We could revert Anton's patch, but it's not clear the original code is safe either, we may just have been lucky. The cleanest solution is to just use the existing CURRENT_THREAD_INFO() asm macro, and call it using inline asm. Segher points out we don't need volatile on the asm, if the result of the shift is unused it's fine for the compiler to elide it. Fixes: a3e5b356b3ab ("powerpc: Don't use local named register variable in current_thread_info") Reported-by: Alexander Graf <agraf@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-12powernv: Fix OPAL tracepoint codeAnton Blanchard
Patch c49f63530bb6 ("powernv: Add OPAL tracepoints") has a spurious store to the stack: ld r12,opal_tracepoint_refcount@toc(r2); \ std r12,32(r1); \ The store was originally used to save the current tracepoint status so the entry and the exit tracepoints were always balanced. In the end I just created a separate path when tracepoints are enabled. The offset on the stack used for this store is not valid for ABIv2 and it causes strange issues. I noticed it because OPAL console input was broken. Fixes: c49f63530bb6 ("powernv: Add OPAL tracepoints") Cc: <stable@vger.kernel.org> # v3.17+ Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active ↵Michael Ellerman
and online" This reverts commit 7c5c92ed56d932b2c19c3f8aea86369509407d33. Although this did fix the bug it was aimed at, it also broke secondary startup on platforms that use give/take_timebase(). Unfortunately we didn't detect that while it was in next. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29powerpc/kdump: Ignore failure in enabling big endian exception during crashHari Bathini
In LE kernel, we currently have a hack for kexec that resets the exception endian before starting a new kernel as the kernel that is loaded could be a big endian or a little endian kernel. In kdump case, resetting exception endian fails when one or more cpus is disabled. But we can ignore the failure and still go ahead, as in most cases crashkernel will be of same endianess as primary kernel and reseting endianess is not even needed in those cases. This patch adds a new inline function to say if this is kdump path. This function is used at places where such a check is needed. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Rename to kdump_in_progress(), use bool, and edit comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29powerpc: Wire up sys_execveat() syscallPranith Kumar
Wire up sys_execveat(). This passes the selftests for the system call. Check success of execveat(3, '../execveat', 0)... [OK] Check success of execveat(5, 'execveat', 0)... [OK] Check success of execveat(6, 'execveat', 0)... [OK] Check success of execveat(-100, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK] Check success of execveat(99, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK] Check success of execveat(8, '', 4096)... [OK] Check success of execveat(17, '', 4096)... [OK] Check success of execveat(9, '', 4096)... [OK] Check success of execveat(14, '', 4096)... [OK] Check success of execveat(14, '', 4096)... [OK] Check success of execveat(15, '', 4096)... [OK] Check failure of execveat(8, '', 0) with ENOENT... [OK] Check failure of execveat(8, '(null)', 4096) with EFAULT... [OK] Check success of execveat(5, 'execveat.symlink', 0)... [OK] Check success of execveat(6, 'execveat.symlink', 0)... [OK] Check success of execveat(-100, '/home/pranith/linux/...xec/execveat.symlink', 0)... [OK] Check success of execveat(10, '', 4096)... [OK] Check success of execveat(10, '', 4352)... [OK] Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP... [OK] Check failure of execveat(6, 'execveat.symlink', 256) with ELOOP... [OK] Check failure of execveat(-100, '/home/pranith/linux/tools/testing/selftests/exec/execveat.symlink', 256) with ELOOP... [OK] Check success of execveat(3, '../script', 0)... [OK] Check success of execveat(5, 'script', 0)... [OK] Check success of execveat(6, 'script', 0)... [OK] Check success of execveat(-100, '/home/pranith/linux/...elftests/exec/script', 0)... [OK] Check success of execveat(13, '', 4096)... [OK] Check success of execveat(13, '', 4352)... [OK] Check failure of execveat(18, '', 4096) with ENOENT... [OK] Check failure of execveat(7, 'script', 0) with ENOENT... [OK] Check success of execveat(16, '', 4096)... [OK] Check success of execveat(16, '', 4096)... [OK] Check success of execveat(4, '../script', 0)... [OK] Check success of execveat(4, 'script', 0)... [OK] Check success of execveat(4, '../script', 0)... [OK] Check failure of execveat(4, 'script', 0) with ENOENT... [OK] Check failure of execveat(5, 'execveat', 65535) with EINVAL... [OK] Check failure of execveat(5, 'no-such-file', 0) with ENOENT... [OK] Check failure of execveat(6, 'no-such-file', 0) with ENOENT... [OK] Check failure of execveat(-100, 'no-such-file', 0) with ENOENT... [OK] Check failure of execveat(5, '', 4096) with EACCES... [OK] Check failure of execveat(5, 'Makefile', 0) with EACCES... [OK] Check failure of execveat(11, '', 4096) with EACCES... [OK] Check failure of execveat(12, '', 4096) with EACCES... [OK] Check failure of execveat(99, '', 4096) with EBADF... [OK] Check failure of execveat(99, 'execveat', 0) with EBADF... [OK] Check failure of execveat(8, 'execveat', 0) with ENOTDIR... [OK] Invoke copy of 'execveat' via filename of length 4093: Check success of execveat(19, '', 4096)... [OK] Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK] Invoke copy of 'script' via filename of length 4093: Check success of execveat(20, '', 4096)... [OK] /bin/sh: 0: Can't open /dev/fd/5/xxxxxxx(... a long line of x's and y's, 0)... [OK] Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK] Tested on a 32-bit powerpc system. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-20Merge tag 'pm-config-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki: "This removes the last few uses of CONFIG_PM_RUNTIME introduced recently and makes that config option finally go away. CONFIG_PM will be available directly from the menu now and also it will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION is set" * tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Eliminate CONFIG_PM_RUNTIME tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
2014-12-19PM: Eliminate CONFIG_PM_RUNTIMERafael J. Wysocki
Having switched over all of the users of CONFIG_PM_RUNTIME to use CONFIG_PM directly, turn the latter into a user-selectable option and drop the former entirely from the tree. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org>
2014-12-19Merge tag 'powerpc-3.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull second batch of powerpc updates from Michael Ellerman: "The highlight is the series that reworks the idle management on powernv, which allows us to use deeper idle states on those machines. There's the fix from Anton for the "BUG at kernel/smpboot.c:134!" problem. An i2c driver for powernv. This is acked by Wolfram Sang, and he asked that we take it through the powerpc tree. A fix for audit from rgb at Red Hat, acked by Paul Moore who is one of the audit maintainers. A patch from Ben to export the symbol map of our OPAL firmware as a sysfs file, so that tools can use it. Also some CXL fixes, a couple of powerpc perf fixes, a fix for smt-enabled, and the patch to add __force to get_user() so we can use bitwise types" * tag 'powerpc-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/powernv: Ignore smt-enabled on Power8 and later powerpc/uaccess: Allow get_user() with bitwise types powerpc/powernv: Expose OPAL firmware symbol map powernv/powerpc: Add winkle support for offline cpus powernv/cpuidle: Redesign idle states management powerpc/powernv: Enable Offline CPUs to enter deep idle states powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode i2c: Driver to expose PowerNV platform i2c busses powerpc: add little endian flag to syscall_get_arch() power/perf/hv-24x7: Use kmem_cache_free() instead of kfree powerpc/perf/hv-24x7: Use per-cpu page buffer cxl: Unmap MMIO regions when detaching a context cxl: Add timeout to process element commands cxl: Change contexts_lock to a mutex to fix sleep while atomic bug powerpc: Secondary CPUs must set cpu_callin_map after setting active and online
2014-12-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
2014-12-18KVM: PPC: E500: Compile fix in this_cpu_writeAlexander Graf
Commit 69111bac42f5 ("powerpc: Replace __get_cpu_var uses") introduced compile breakage to the e500 target by introducing invalid automatically created C syntax. Fix up the breakage and make the code compile again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18powerpc/powernv: Ignore smt-enabled on Power8 and laterGreg Kurz
Starting with POWER8, the subcore logic relies on all threads of a core being booted so that they can participate in split mode switches. So on those machines we ignore the smt_enabled_at_boot setting (smt-enabled on the kernel command line). Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> [mpe: Update comment and change log to be more precise] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-18powerpc/uaccess: Allow get_user() with bitwise typesMichael S. Tsirkin
At the moment, if p and x are both of the same bitwise type (eg. __le32), get_user(x, p) produces a sparse warning. This is because *p is loaded into a long then cast back to typeof(*p). When typeof(*p) is a bitwise type (which is uncommon), such a cast needs __force, otherwise sparse produces a warning. For non-bitwise types __force should have no effect, and should not hide any legitimate errors. Note that we are casting to typeof(*p) not typeof(x). Even with the cast, if x and *p are of different types we should get the warning, so I think we are not loosing the ability to detect any actual errors. virtio would like to use bitwise types with get_user() so fix these spurious warnings by adding __force. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [mpe: Fill in changelog with more details] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-17KVM: PPC: Book3S: Enable in-kernel XICS emulation by defaultAnton Blanchard
The in-kernel XICS emulation is faster than doing it all in QEMU and it has got a lot of testing, so enable it by default. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Improve H_CONFER implementationSam Bobroff
Currently the H_CONFER hcall is implemented in kernel virtual mode, meaning that whenever a guest thread does an H_CONFER, all the threads in that virtual core have to exit the guest. This is bad for performance because it interrupts the other threads even if they are doing useful work. The H_CONFER hcall is called by a guest VCPU when it is spinning on a spinlock and it detects that the spinlock is held by a guest VCPU that is currently not running on a physical CPU. The idea is to give this VCPU's time slice to the holder VCPU so that it can make progress towards releasing the lock. To avoid having the other threads exit the guest unnecessarily, we add a real-mode implementation of H_CONFER that checks whether the other threads are doing anything. If all the other threads are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER), it returns H_TOO_HARD which causes a guest exit and allows the H_CONFER to be handled in virtual mode. Otherwise it spins for a short time (up to 10 microseconds) to give other threads the chance to observe that this thread is trying to confer. The spin loop also terminates when any thread exits the guest or when all other threads are idle or trying to confer. If the timeout is reached, the H_CONFER returns H_SUCCESS. In this case the guest VCPU will recheck the spinlock word and most likely call H_CONFER again. This also improves the implementation of the H_CONFER virtual mode handler. If the VCPU is part of a virtual core (vcore) which is runnable, there will be a 'runner' VCPU which has taken responsibility for running the vcore. In this case we yield to the runner VCPU rather than the target VCPU. We also introduce a check on the target VCPU's yield count: if it differs from the yield count passed to H_CONFER, the target VCPU has run since H_CONFER was called and may have already released the lock. This check is required by PAPR. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR registerPaul Mackerras
There are two ways in which a guest instruction can be obtained from the guest in the guest exit code in book3s_hv_rmhandlers.S. If the exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal instruction), the offending instruction is in the HEIR register (Hypervisor Emulation Instruction Register). If the exit was caused by a load or store to an emulated MMIO device, we load the instruction from the guest by turning data relocation on and loading the instruction with an lwz instruction. Unfortunately, in the case where the guest has opposite endianness to the host, these two methods give results of different endianness, but both get put into vcpu->arch.last_inst. The HEIR value has been loaded using guest endianness, whereas the lwz will load the instruction using host endianness. The rest of the code that uses vcpu->arch.last_inst assumes it was loaded using host endianness. To fix this, we define a new vcpu field to store the HEIR value. Then, in kvmppc_handle_exit_hv(), we transfer the value from this new field to vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness differ. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Remove code for PPC970 processorsPaul Mackerras
This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactionsSuresh E. Warrier
This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Simplify locking around stolen time calculationsPaul Mackerras
Currently the calculations of stolen time for PPC Book3S HV guests uses fields in both the vcpu struct and the kvmppc_vcore struct. The fields in the kvmppc_vcore struct are protected by the vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for running the virtual core. This works correctly but confuses lockdep, because it sees that the code takes the tbacct_lock for a vcpu in kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in vcore_stolen_time(), and it thinks there is a possibility of deadlock, causing it to print reports like this: ============================================= [ INFO: possible recursive locking detected ] 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted --------------------------------------------- qemu-system-ppc/6188 is trying to acquire lock: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv] but task is already holding lock: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&vcpu->arch.tbacct_lock)->rlock); lock(&(&vcpu->arch.tbacct_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by qemu-system-ppc/6188: #0: (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm] #1: (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv] #2: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv] stack backtrace: CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Call Trace: [c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable) [c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190 [c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0 [c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70 [c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv] [c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv] [c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv] [c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm] [c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm] [c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm] [c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770 [c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0 [c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98 In order to make the locking easier to analyse, we change the code to use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and preempt_tb fields. This lock needs to be an irq-safe lock since it is used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv() functions, which are called with the scheduler rq lock held, which is an irq-safe lock. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s_paired_singles.c: Remove unused functionRickard Strandqvist
Remove the function inst_set_field() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s_pr.c: Remove unused functionRickard Strandqvist
Remove the function get_fpr_index() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s.c: Remove some unused functionsRickard Strandqvist
Removes some functions that are not used anywhere: kvmppc_core_load_guest_debugstate() kvmppc_core_load_host_debugstate() This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s_32_mmu.c: Remove unused functionRickard Strandqvist
Remove the function sr_nx() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Check wait conditions before sleeping in ↵Suresh E. Warrier
kvmppc_vcore_blocked The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: ptes are big endianCédric Le Goater
When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg00008.html Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPISuresh E. Warrier
This fixes some inaccuracies in the state machine for the virtualized ICP when implementing the H_IPI hcall (Set_MFFR and related states): 1. The old code wipes out any pending interrupts when the new MFRR is more favored than the CPPR but less favored than a pending interrupt (by always modifying xisr and the pending_pri). This can cause us to lose a pending external interrupt. The correct code here is to only modify the pending_pri and xisr in the ICP if the MFRR is equal to or more favored than the current pending pri (since in this case, it is guaranteed that that there cannot be a pending external interrupt). The code changes are required in both kvmppc_rm_h_ipi and kvmppc_h_ipi. 2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check for whether MFRR is being made less favored AND further if new MFFR is also less favored than the current CPPR, we check for any resends pending in the ICP. These checks look like they are designed to cover the case where if the MFRR is being made less favored, we opportunistically trigger a resend of any interrupts that had been previously rejected. Although, this is not a state described by PAPR, this is an action we actually need to do especially if the CPPR is already at 0xFF. Because in this case, the resend bit will stay on until another ICP state change which may be a long time coming and the interrupt stays pending until then. The current code which checks for MFRR < CPPR is broken when CPPR is 0xFF since it will not get triggered in that case. Ideally, we would want to do a resend only if prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr where pending interrupt is the one that was rejected. But we don't have the priority of the pending interrupt state saved, so we simply trigger a resend whenever the MFRR is made less favored. 3. In kvmppc_rm_h_ipi, where we save state to pass resends to the virtual mode, we also need to save the ICP whose need_resend we reset since this does not need to be my ICP (vcpu->arch.icp) as is incorrectly assumed by the current code. A new field rm_resend_icp is added to the kvmppc_icp structure for this purpose. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix KSM memory corruptionPaul Mackerras
Testing with KSM active in the host showed occasional corruption of guest memory. Typically a page that should have contained zeroes would contain values that look like the contents of a user process stack (values such as 0x0000_3fff_xxxx_xxx). Code inspection in kvmppc_h_protect revealed that there was a race condition with the possibility of granting write access to a page which is read-only in the host page tables. The code attempts to keep the host mapping read-only if the host userspace PTE is read-only, but if that PTE had been temporarily made invalid for any reason, the read-only check would not trigger and the host HPTE could end up read-write. Examination of the guest HPT in the failure situation revealed that there were indeed shared pages which should have been read-only that were mapped read-write. To close this race, we don't let a page go from being read-only to being read-write, as far as the real HPTE mapping the page is concerned (the guest view can go to read-write, but the actual mapping stays read-only). When the guest tries to write to the page, we take an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a writable HPTE for the page. This eliminates the occasional corruption of shared pages that was previously seen with KSM active. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMIMahesh Salgaonkar
When we get an HMI (hypervisor maintenance interrupt) while in a guest, we see that guest enters into paused state. The reason is, in kvmppc_handle_exit_hv it falls through default path and returns to host instead of resuming guest. This causes guest to enter into paused state. HMI is a hypervisor only interrupt and it is safe to resume the guest since the host has handled it already. This patch adds a switch case to resume the guest. Without this patch we see guest entering into paused state with following console messages: [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329356] Error detail: Timer facility experienced an error [ 3003.329359] HMER: 0840000000000000 [ 3003.329360] TFMR: 4a12000980a84000 [ 3003.329366] vcpu c0000007c35094c0 (40): [ 3003.329368] pc = c0000000000c2ba0 msr = 8000000000009032 trap = e60 [ 3003.329370] r 0 = c00000000021ddc0 r16 = 0000000000000046 [ 3003.329372] r 1 = c00000007a02bbd0 r17 = 00003ffff27d5d98 [ 3003.329375] r 2 = c0000000010980b8 r18 = 00001fffffc9a0b0 [ 3003.329377] r 3 = c00000000142d6b8 r19 = c00000000142d6b8 [ 3003.329379] r 4 = 0000000000000002 r20 = 0000000000000000 [ 3003.329381] r 5 = c00000000524a110 r21 = 0000000000000000 [ 3003.329383] r 6 = 0000000000000001 r22 = 0000000000000000 [ 3003.329386] r 7 = 0000000000000000 r23 = c00000000524a110 [ 3003.329388] r 8 = 0000000000000000 r24 = 0000000000000001 [ 3003.329391] r 9 = 0000000000000001 r25 = c00000007c31da38 [ 3003.329393] r10 = c0000000014280b8 r26 = 0000000000000002 [ 3003.329395] r11 = 746f6f6c2f68656c r27 = c00000000524a110 [ 3003.329397] r12 = 0000000028004484 r28 = c00000007c31da38 [ 3003.329399] r13 = c00000000fe01400 r29 = 0000000000000002 [ 3003.329401] r14 = 0000000000000046 r30 = c000000003011e00 [ 3003.329403] r15 = ffffffffffffffba r31 = 0000000000000002 [ 3003.329404] ctr = c00000000041a670 lr = c000000000272520 [ 3003.329405] srr0 = c00000000007e8d8 srr1 = 9000000000001002 [ 3003.329406] sprg0 = 0000000000000000 sprg1 = c00000000fe01400 [ 3003.329407] sprg2 = c00000000fe01400 sprg3 = 0000000000000005 [ 3003.329408] cr = 48004482 xer = 2000000000000000 dsisr = 42000000 [ 3003.329409] dar = 0000010015020048 [ 3003.329410] fault dar = 0000010015020048 dsisr = 42000000 [ 3003.329411] SLB (8 entries): [ 3003.329412] ESID = c000000008000000 VSID = 40016e7779000510 [ 3003.329413] ESID = d000000008000001 VSID = 400142add1000510 [ 3003.329414] ESID = f000000008000004 VSID = 4000eb1a81000510 [ 3003.329415] ESID = 00001f000800000b VSID = 40004fda0a000d90 [ 3003.329416] ESID = 00003f000800000c VSID = 400039f536000d90 [ 3003.329417] ESID = 000000001800000d VSID = 0001251b35150d90 [ 3003.329417] ESID = 000001000800000e VSID = 4001e46090000d90 [ 3003.329418] ESID = d000080008000019 VSID = 40013d349c000400 [ 3003.329419] lpcr = c048800001847001 sdr1 = 0000001b19000006 last_inst = ffffffff [ 3003.329421] trap=0xe60 | pc=0xc0000000000c2ba0 | msr=0x8000000000009032 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329526] Error detail: Timer facility experienced an error [ 3003.329527] HMER: 0840000000000000 [ 3003.329527] TFMR: 4a12000980a94000 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered] [ 3006.359792] Error detail: Timer facility experienced an error [ 3006.359795] HMER: 0840000000000000 [ 3006.359797] TFMR: 4a12000980a84000 Id Name State ---------------------------------------------------- 2 guest2 running 3 guest3 paused 4 guest4 running Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix computation of tlbie operandPaul Mackerras
The B (segment size) field in the RB operand for the tlbie instruction is two bits, which we get from the top two bits of the first doubleword of the HPT entry to be invalidated. These bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM bit numbering). The compute_tlbie_rb() function gets these bits as v >> (62 - 8), which is not correct as it will bring in the top 10 bits, not just the top two. These extra bits could corrupt the AP, AVAL and L fields in the RB value. To fix this we shift right 62 bits and then shift left 8 bits, so we only get the two bits of the B field. The first doubleword of the HPT entry is under the control of the guest kernel. In fact, Linux guests will always put zeroes in bits 54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing this. Signed-off-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Add missing HPTE unlockAneesh Kumar K.V
In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: BookE: Improve irq inject tracepointAlexander Graf
When injecting an IRQ, we only document which IRQ priority (which translates to IRQ type) gets injected. However, when reading traces you don't necessarily have all the numbers in your head to know which IRQ really is meant. This patch converts the IRQ number field to a symbolic name that is in sync with the respective define. That way it's a lot easier for readers to figure out what interrupt gets injected. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15powerpc/powernv: Expose OPAL firmware symbol mapBenjamin Herrenschmidt
Newer versions of OPAL will provide this, so let's expose it to user space so tools like perf can use it to properly decode samples in firmware space. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-14Merge tag 'driver-core-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg KH: "Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while" * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits) Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries" fs: debugfs: add forward declaration for struct device type firmware class: Deletion of an unnecessary check before the function call "vunmap" firmware loader: fix hung task warning dump devcoredump: provide a one-way disable function device: Add dev_<level>_once variants ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries ath: use seq_file api for ath9k debugfs files debugfs: add helper function to create device related seq_file drivers/base: cacheinfo: remove noisy error boot message Revert "core: platform: add warning if driver has no owner" drivers: base: support cpu cache information interface to userspace via sysfs drivers: base: add cpu_device_create to support per-cpu devices topology: replace custom attribute macros with standard DEVICE_ATTR* cpumask: factor out show_cpumap into separate helper function driver core: Fix unbalanced device reference in drivers_probe driver core: fix race with userland in device_add() sysfs/kernfs: make read requests on pre-alloc files use the buffer. sysfs/kernfs: allow attributes to request write buffer be pre-allocated. fs: sysfs: return EGBIG on write if offset is larger than file size ...
2014-12-15powernv/powerpc: Add winkle support for offline cpusShreyas B. Prabhu
Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state power supply to the entire chiplet i.e core, private L2 and private L3 is turned off. As a result it gives higher powersavings compared to sleep. But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up. Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contests of a few specific registers. This patch uses PORE to restore register state wherever possible and uses stack to save and restore rest of the necessary registers. With hypervisor state restore things fall under three categories- per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distingush first thread in core and subcore. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powernv/cpuidle: Redesign idle states managementShreyas B. Prabhu
Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these state. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores and kvm is involved. This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powerpc/powernv: Enable Offline CPUs to enter deep idle statesShreyas B. Prabhu
The secondary threads should enter deep idle states so as to gain maximum powersavings when the entire core is offline. To do so the offline path must be made aware of the available deepest idle state. Hence probe the device tree for the possible idle states in powernv core code and expose the deepest idle state through flags. Since the device tree is probed by the cpuidle driver as well, move the parameters required to discover the idle states into an appropriate common place to both the driver and the powernv core code. Another point is that fastsleep idle state may require workarounds in the kernel to function properly. This workaround is introduced in the subsequent patches. However neither the cpuidle driver or the hotplug path need be bothered about this workaround. They will be taken care of by the core powernv code. Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Reviewed-by: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle modePaul Mackerras
Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400. This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction. Cc: stable@vger.kernel.org [ shreyas@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-14i2c: Driver to expose PowerNV platform i2c bussesNeelesh Gupta
The patch exposes the available i2c busses on the PowerNV platform to the kernel and implements the bus driver to support i2c and smbus commands. The driver uses the platform device infrastructure to probe the busses on the platform and registers them with the i2c driver framework. Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - The crypto API is now documented :) - Disallow arbitrary module loading through crypto API. - Allow get request with empty driver name through crypto_user. - Allow speed testing of arbitrary hash functions. - Add caam support for ctr(aes), gcm(aes) and their derivatives. - nx now supports concurrent hashing properly. - Add sahara support for SHA1/256. - Add ARM64 version of CRC32. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits) crypto: tcrypt - Allow speed testing of arbitrary hash functions crypto: af_alg - add user space interface for AEAD crypto: qat - fix problem with coalescing enable logic crypto: sahara - add support for SHA1/256 crypto: sahara - replace tasklets with kthread crypto: sahara - add support for i.MX53 crypto: sahara - fix spinlock initialization crypto: arm - replace memset by memzero_explicit crypto: powerpc - replace memset by memzero_explicit crypto: sha - replace memset by memzero_explicit crypto: sparc - replace memset by memzero_explicit crypto: algif_skcipher - initialize upon init request crypto: algif_skcipher - removed unneeded code crypto: algif_skcipher - Fixed blocking recvmsg crypto: drbg - use memzero_explicit() for clearing sensitive data crypto: drbg - use MODULE_ALIAS_CRYPTO crypto: include crypto- module prefix in template crypto: user - add MODULE_ALIAS crypto: sha-mb - remove a bogus NULL check crytpo: qat - Fix 64 bytes requests ...
2014-12-13Merge branch 'akpm' (second patch-bomb from Andrew)Linus Torvalds
Merge second patchbomb from Andrew Morton: - the rest of MM - misc fs fixes - add execveat() syscall - new ratelimit feature for fault-injection - decompressor updates - ipc/ updates - fallocate feature creep - fsnotify cleanups - a few other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits) cgroups: Documentation: fix trivial typos and wrong paragraph numberings parisc: percpu: update comments referring to __get_cpu_var percpu: update local_ops.txt to reflect this_cpu operations percpu: remove __get_cpu_var and __raw_get_cpu_var macros fsnotify: remove destroy_list from fsnotify_mark fsnotify: unify inode and mount marks handling fallocate: create FAN_MODIFY and IN_MODIFY events mm/cma: make kmemleak ignore CMA regions slub: fix cpuset check in get_any_partial slab: fix cpuset check in fallback_alloc shmdt: use i_size_read() instead of ->i_size ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments ipc/msg: increase MSGMNI, remove scaling ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM ipc/sem.c: change memory barrier in sem_lock() to smp_rmb() lib/decompress.c: consistency of compress formats for kernel image decompress_bunzip2: off by one in get_next_block() usr/Kconfig: make initrd compression algorithm selection not expert fault-inject: add ratelimit option ratelimit: add initialization macro ...
2014-12-13gcov: enable GCOV_PROFILE_ALL from ARCH KconfigsRiku Voipio
Following the suggestions from Andrew Morton and Stephen Rothwell, Dont expand the ARCH list in kernel/gcov/Kconfig. Instead, define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures can enable. set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was previously allowed + ARM64 which I tested. Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>