summaryrefslogtreecommitdiffstats
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2014-12-20Merge tag 'pm-config-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki: "This removes the last few uses of CONFIG_PM_RUNTIME introduced recently and makes that config option finally go away. CONFIG_PM will be available directly from the menu now and also it will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION is set" * tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Eliminate CONFIG_PM_RUNTIME tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
2014-12-19PM: Eliminate CONFIG_PM_RUNTIMERafael J. Wysocki
Having switched over all of the users of CONFIG_PM_RUNTIME to use CONFIG_PM directly, turn the latter into a user-selectable option and drop the former entirely from the tree. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org>
2014-12-19Merge tag 'powerpc-3.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull second batch of powerpc updates from Michael Ellerman: "The highlight is the series that reworks the idle management on powernv, which allows us to use deeper idle states on those machines. There's the fix from Anton for the "BUG at kernel/smpboot.c:134!" problem. An i2c driver for powernv. This is acked by Wolfram Sang, and he asked that we take it through the powerpc tree. A fix for audit from rgb at Red Hat, acked by Paul Moore who is one of the audit maintainers. A patch from Ben to export the symbol map of our OPAL firmware as a sysfs file, so that tools can use it. Also some CXL fixes, a couple of powerpc perf fixes, a fix for smt-enabled, and the patch to add __force to get_user() so we can use bitwise types" * tag 'powerpc-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/powernv: Ignore smt-enabled on Power8 and later powerpc/uaccess: Allow get_user() with bitwise types powerpc/powernv: Expose OPAL firmware symbol map powernv/powerpc: Add winkle support for offline cpus powernv/cpuidle: Redesign idle states management powerpc/powernv: Enable Offline CPUs to enter deep idle states powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode i2c: Driver to expose PowerNV platform i2c busses powerpc: add little endian flag to syscall_get_arch() power/perf/hv-24x7: Use kmem_cache_free() instead of kfree powerpc/perf/hv-24x7: Use per-cpu page buffer cxl: Unmap MMIO regions when detaching a context cxl: Add timeout to process element commands cxl: Change contexts_lock to a mutex to fix sleep while atomic bug powerpc: Secondary CPUs must set cpu_callin_map after setting active and online
2014-12-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
2014-12-18KVM: PPC: E500: Compile fix in this_cpu_writeAlexander Graf
Commit 69111bac42f5 ("powerpc: Replace __get_cpu_var uses") introduced compile breakage to the e500 target by introducing invalid automatically created C syntax. Fix up the breakage and make the code compile again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18powerpc/powernv: Ignore smt-enabled on Power8 and laterGreg Kurz
Starting with POWER8, the subcore logic relies on all threads of a core being booted so that they can participate in split mode switches. So on those machines we ignore the smt_enabled_at_boot setting (smt-enabled on the kernel command line). Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> [mpe: Update comment and change log to be more precise] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-18powerpc/uaccess: Allow get_user() with bitwise typesMichael S. Tsirkin
At the moment, if p and x are both of the same bitwise type (eg. __le32), get_user(x, p) produces a sparse warning. This is because *p is loaded into a long then cast back to typeof(*p). When typeof(*p) is a bitwise type (which is uncommon), such a cast needs __force, otherwise sparse produces a warning. For non-bitwise types __force should have no effect, and should not hide any legitimate errors. Note that we are casting to typeof(*p) not typeof(x). Even with the cast, if x and *p are of different types we should get the warning, so I think we are not loosing the ability to detect any actual errors. virtio would like to use bitwise types with get_user() so fix these spurious warnings by adding __force. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [mpe: Fill in changelog with more details] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-17KVM: PPC: Book3S: Enable in-kernel XICS emulation by defaultAnton Blanchard
The in-kernel XICS emulation is faster than doing it all in QEMU and it has got a lot of testing, so enable it by default. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Improve H_CONFER implementationSam Bobroff
Currently the H_CONFER hcall is implemented in kernel virtual mode, meaning that whenever a guest thread does an H_CONFER, all the threads in that virtual core have to exit the guest. This is bad for performance because it interrupts the other threads even if they are doing useful work. The H_CONFER hcall is called by a guest VCPU when it is spinning on a spinlock and it detects that the spinlock is held by a guest VCPU that is currently not running on a physical CPU. The idea is to give this VCPU's time slice to the holder VCPU so that it can make progress towards releasing the lock. To avoid having the other threads exit the guest unnecessarily, we add a real-mode implementation of H_CONFER that checks whether the other threads are doing anything. If all the other threads are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER), it returns H_TOO_HARD which causes a guest exit and allows the H_CONFER to be handled in virtual mode. Otherwise it spins for a short time (up to 10 microseconds) to give other threads the chance to observe that this thread is trying to confer. The spin loop also terminates when any thread exits the guest or when all other threads are idle or trying to confer. If the timeout is reached, the H_CONFER returns H_SUCCESS. In this case the guest VCPU will recheck the spinlock word and most likely call H_CONFER again. This also improves the implementation of the H_CONFER virtual mode handler. If the VCPU is part of a virtual core (vcore) which is runnable, there will be a 'runner' VCPU which has taken responsibility for running the vcore. In this case we yield to the runner VCPU rather than the target VCPU. We also introduce a check on the target VCPU's yield count: if it differs from the yield count passed to H_CONFER, the target VCPU has run since H_CONFER was called and may have already released the lock. This check is required by PAPR. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR registerPaul Mackerras
There are two ways in which a guest instruction can be obtained from the guest in the guest exit code in book3s_hv_rmhandlers.S. If the exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal instruction), the offending instruction is in the HEIR register (Hypervisor Emulation Instruction Register). If the exit was caused by a load or store to an emulated MMIO device, we load the instruction from the guest by turning data relocation on and loading the instruction with an lwz instruction. Unfortunately, in the case where the guest has opposite endianness to the host, these two methods give results of different endianness, but both get put into vcpu->arch.last_inst. The HEIR value has been loaded using guest endianness, whereas the lwz will load the instruction using host endianness. The rest of the code that uses vcpu->arch.last_inst assumes it was loaded using host endianness. To fix this, we define a new vcpu field to store the HEIR value. Then, in kvmppc_handle_exit_hv(), we transfer the value from this new field to vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness differ. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Remove code for PPC970 processorsPaul Mackerras
This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactionsSuresh E. Warrier
This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17KVM: PPC: Book3S HV: Simplify locking around stolen time calculationsPaul Mackerras
Currently the calculations of stolen time for PPC Book3S HV guests uses fields in both the vcpu struct and the kvmppc_vcore struct. The fields in the kvmppc_vcore struct are protected by the vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for running the virtual core. This works correctly but confuses lockdep, because it sees that the code takes the tbacct_lock for a vcpu in kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in vcore_stolen_time(), and it thinks there is a possibility of deadlock, causing it to print reports like this: ============================================= [ INFO: possible recursive locking detected ] 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted --------------------------------------------- qemu-system-ppc/6188 is trying to acquire lock: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv] but task is already holding lock: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&vcpu->arch.tbacct_lock)->rlock); lock(&(&vcpu->arch.tbacct_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by qemu-system-ppc/6188: #0: (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm] #1: (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv] #2: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv] stack backtrace: CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Call Trace: [c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable) [c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190 [c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0 [c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70 [c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv] [c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv] [c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv] [c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm] [c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm] [c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm] [c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770 [c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0 [c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98 In order to make the locking easier to analyse, we change the code to use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and preempt_tb fields. This lock needs to be an irq-safe lock since it is used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv() functions, which are called with the scheduler rq lock held, which is an irq-safe lock. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s_paired_singles.c: Remove unused functionRickard Strandqvist
Remove the function inst_set_field() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s_pr.c: Remove unused functionRickard Strandqvist
Remove the function get_fpr_index() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s.c: Remove some unused functionsRickard Strandqvist
Removes some functions that are not used anywhere: kvmppc_core_load_guest_debugstate() kvmppc_core_load_host_debugstate() This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17arch: powerpc: kvm: book3s_32_mmu.c: Remove unused functionRickard Strandqvist
Remove the function sr_nx() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Check wait conditions before sleeping in ↵Suresh E. Warrier
kvmppc_vcore_blocked The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: ptes are big endianCédric Le Goater
When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg00008.html Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPISuresh E. Warrier
This fixes some inaccuracies in the state machine for the virtualized ICP when implementing the H_IPI hcall (Set_MFFR and related states): 1. The old code wipes out any pending interrupts when the new MFRR is more favored than the CPPR but less favored than a pending interrupt (by always modifying xisr and the pending_pri). This can cause us to lose a pending external interrupt. The correct code here is to only modify the pending_pri and xisr in the ICP if the MFRR is equal to or more favored than the current pending pri (since in this case, it is guaranteed that that there cannot be a pending external interrupt). The code changes are required in both kvmppc_rm_h_ipi and kvmppc_h_ipi. 2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check for whether MFRR is being made less favored AND further if new MFFR is also less favored than the current CPPR, we check for any resends pending in the ICP. These checks look like they are designed to cover the case where if the MFRR is being made less favored, we opportunistically trigger a resend of any interrupts that had been previously rejected. Although, this is not a state described by PAPR, this is an action we actually need to do especially if the CPPR is already at 0xFF. Because in this case, the resend bit will stay on until another ICP state change which may be a long time coming and the interrupt stays pending until then. The current code which checks for MFRR < CPPR is broken when CPPR is 0xFF since it will not get triggered in that case. Ideally, we would want to do a resend only if prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr where pending interrupt is the one that was rejected. But we don't have the priority of the pending interrupt state saved, so we simply trigger a resend whenever the MFRR is made less favored. 3. In kvmppc_rm_h_ipi, where we save state to pass resends to the virtual mode, we also need to save the ICP whose need_resend we reset since this does not need to be my ICP (vcpu->arch.icp) as is incorrectly assumed by the current code. A new field rm_resend_icp is added to the kvmppc_icp structure for this purpose. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix KSM memory corruptionPaul Mackerras
Testing with KSM active in the host showed occasional corruption of guest memory. Typically a page that should have contained zeroes would contain values that look like the contents of a user process stack (values such as 0x0000_3fff_xxxx_xxx). Code inspection in kvmppc_h_protect revealed that there was a race condition with the possibility of granting write access to a page which is read-only in the host page tables. The code attempts to keep the host mapping read-only if the host userspace PTE is read-only, but if that PTE had been temporarily made invalid for any reason, the read-only check would not trigger and the host HPTE could end up read-write. Examination of the guest HPT in the failure situation revealed that there were indeed shared pages which should have been read-only that were mapped read-write. To close this race, we don't let a page go from being read-only to being read-write, as far as the real HPTE mapping the page is concerned (the guest view can go to read-write, but the actual mapping stays read-only). When the guest tries to write to the page, we take an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a writable HPTE for the page. This eliminates the occasional corruption of shared pages that was previously seen with KSM active. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMIMahesh Salgaonkar
When we get an HMI (hypervisor maintenance interrupt) while in a guest, we see that guest enters into paused state. The reason is, in kvmppc_handle_exit_hv it falls through default path and returns to host instead of resuming guest. This causes guest to enter into paused state. HMI is a hypervisor only interrupt and it is safe to resume the guest since the host has handled it already. This patch adds a switch case to resume the guest. Without this patch we see guest entering into paused state with following console messages: [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329356] Error detail: Timer facility experienced an error [ 3003.329359] HMER: 0840000000000000 [ 3003.329360] TFMR: 4a12000980a84000 [ 3003.329366] vcpu c0000007c35094c0 (40): [ 3003.329368] pc = c0000000000c2ba0 msr = 8000000000009032 trap = e60 [ 3003.329370] r 0 = c00000000021ddc0 r16 = 0000000000000046 [ 3003.329372] r 1 = c00000007a02bbd0 r17 = 00003ffff27d5d98 [ 3003.329375] r 2 = c0000000010980b8 r18 = 00001fffffc9a0b0 [ 3003.329377] r 3 = c00000000142d6b8 r19 = c00000000142d6b8 [ 3003.329379] r 4 = 0000000000000002 r20 = 0000000000000000 [ 3003.329381] r 5 = c00000000524a110 r21 = 0000000000000000 [ 3003.329383] r 6 = 0000000000000001 r22 = 0000000000000000 [ 3003.329386] r 7 = 0000000000000000 r23 = c00000000524a110 [ 3003.329388] r 8 = 0000000000000000 r24 = 0000000000000001 [ 3003.329391] r 9 = 0000000000000001 r25 = c00000007c31da38 [ 3003.329393] r10 = c0000000014280b8 r26 = 0000000000000002 [ 3003.329395] r11 = 746f6f6c2f68656c r27 = c00000000524a110 [ 3003.329397] r12 = 0000000028004484 r28 = c00000007c31da38 [ 3003.329399] r13 = c00000000fe01400 r29 = 0000000000000002 [ 3003.329401] r14 = 0000000000000046 r30 = c000000003011e00 [ 3003.329403] r15 = ffffffffffffffba r31 = 0000000000000002 [ 3003.329404] ctr = c00000000041a670 lr = c000000000272520 [ 3003.329405] srr0 = c00000000007e8d8 srr1 = 9000000000001002 [ 3003.329406] sprg0 = 0000000000000000 sprg1 = c00000000fe01400 [ 3003.329407] sprg2 = c00000000fe01400 sprg3 = 0000000000000005 [ 3003.329408] cr = 48004482 xer = 2000000000000000 dsisr = 42000000 [ 3003.329409] dar = 0000010015020048 [ 3003.329410] fault dar = 0000010015020048 dsisr = 42000000 [ 3003.329411] SLB (8 entries): [ 3003.329412] ESID = c000000008000000 VSID = 40016e7779000510 [ 3003.329413] ESID = d000000008000001 VSID = 400142add1000510 [ 3003.329414] ESID = f000000008000004 VSID = 4000eb1a81000510 [ 3003.329415] ESID = 00001f000800000b VSID = 40004fda0a000d90 [ 3003.329416] ESID = 00003f000800000c VSID = 400039f536000d90 [ 3003.329417] ESID = 000000001800000d VSID = 0001251b35150d90 [ 3003.329417] ESID = 000001000800000e VSID = 4001e46090000d90 [ 3003.329418] ESID = d000080008000019 VSID = 40013d349c000400 [ 3003.329419] lpcr = c048800001847001 sdr1 = 0000001b19000006 last_inst = ffffffff [ 3003.329421] trap=0xe60 | pc=0xc0000000000c2ba0 | msr=0x8000000000009032 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329526] Error detail: Timer facility experienced an error [ 3003.329527] HMER: 0840000000000000 [ 3003.329527] TFMR: 4a12000980a94000 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered] [ 3006.359792] Error detail: Timer facility experienced an error [ 3006.359795] HMER: 0840000000000000 [ 3006.359797] TFMR: 4a12000980a84000 Id Name State ---------------------------------------------------- 2 guest2 running 3 guest3 paused 4 guest4 running Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Fix computation of tlbie operandPaul Mackerras
The B (segment size) field in the RB operand for the tlbie instruction is two bits, which we get from the top two bits of the first doubleword of the HPT entry to be invalidated. These bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM bit numbering). The compute_tlbie_rb() function gets these bits as v >> (62 - 8), which is not correct as it will bring in the top 10 bits, not just the top two. These extra bits could corrupt the AP, AVAL and L fields in the RB value. To fix this we shift right 62 bits and then shift left 8 bits, so we only get the two bits of the B field. The first doubleword of the HPT entry is under the control of the guest kernel. In fact, Linux guests will always put zeroes in bits 54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing this. Signed-off-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: Book3S HV: Add missing HPTE unlockAneesh Kumar K.V
In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15KVM: PPC: BookE: Improve irq inject tracepointAlexander Graf
When injecting an IRQ, we only document which IRQ priority (which translates to IRQ type) gets injected. However, when reading traces you don't necessarily have all the numbers in your head to know which IRQ really is meant. This patch converts the IRQ number field to a symbolic name that is in sync with the respective define. That way it's a lot easier for readers to figure out what interrupt gets injected. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15powerpc/powernv: Expose OPAL firmware symbol mapBenjamin Herrenschmidt
Newer versions of OPAL will provide this, so let's expose it to user space so tools like perf can use it to properly decode samples in firmware space. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-14Merge tag 'driver-core-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg KH: "Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while" * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits) Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries" fs: debugfs: add forward declaration for struct device type firmware class: Deletion of an unnecessary check before the function call "vunmap" firmware loader: fix hung task warning dump devcoredump: provide a one-way disable function device: Add dev_<level>_once variants ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries ath: use seq_file api for ath9k debugfs files debugfs: add helper function to create device related seq_file drivers/base: cacheinfo: remove noisy error boot message Revert "core: platform: add warning if driver has no owner" drivers: base: support cpu cache information interface to userspace via sysfs drivers: base: add cpu_device_create to support per-cpu devices topology: replace custom attribute macros with standard DEVICE_ATTR* cpumask: factor out show_cpumap into separate helper function driver core: Fix unbalanced device reference in drivers_probe driver core: fix race with userland in device_add() sysfs/kernfs: make read requests on pre-alloc files use the buffer. sysfs/kernfs: allow attributes to request write buffer be pre-allocated. fs: sysfs: return EGBIG on write if offset is larger than file size ...
2014-12-15powernv/powerpc: Add winkle support for offline cpusShreyas B. Prabhu
Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state power supply to the entire chiplet i.e core, private L2 and private L3 is turned off. As a result it gives higher powersavings compared to sleep. But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up. Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contests of a few specific registers. This patch uses PORE to restore register state wherever possible and uses stack to save and restore rest of the necessary registers. With hypervisor state restore things fall under three categories- per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distingush first thread in core and subcore. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powernv/cpuidle: Redesign idle states managementShreyas B. Prabhu
Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these state. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores and kvm is involved. This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powerpc/powernv: Enable Offline CPUs to enter deep idle statesShreyas B. Prabhu
The secondary threads should enter deep idle states so as to gain maximum powersavings when the entire core is offline. To do so the offline path must be made aware of the available deepest idle state. Hence probe the device tree for the possible idle states in powernv core code and expose the deepest idle state through flags. Since the device tree is probed by the cpuidle driver as well, move the parameters required to discover the idle states into an appropriate common place to both the driver and the powernv core code. Another point is that fastsleep idle state may require workarounds in the kernel to function properly. This workaround is introduced in the subsequent patches. However neither the cpuidle driver or the hotplug path need be bothered about this workaround. They will be taken care of by the core powernv code. Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Reviewed-by: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle modePaul Mackerras
Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400. This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction. Cc: stable@vger.kernel.org [ shreyas@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-14i2c: Driver to expose PowerNV platform i2c bussesNeelesh Gupta
The patch exposes the available i2c busses on the PowerNV platform to the kernel and implements the bus driver to support i2c and smbus commands. The driver uses the platform device infrastructure to probe the busses on the platform and registers them with the i2c driver framework. Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - The crypto API is now documented :) - Disallow arbitrary module loading through crypto API. - Allow get request with empty driver name through crypto_user. - Allow speed testing of arbitrary hash functions. - Add caam support for ctr(aes), gcm(aes) and their derivatives. - nx now supports concurrent hashing properly. - Add sahara support for SHA1/256. - Add ARM64 version of CRC32. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits) crypto: tcrypt - Allow speed testing of arbitrary hash functions crypto: af_alg - add user space interface for AEAD crypto: qat - fix problem with coalescing enable logic crypto: sahara - add support for SHA1/256 crypto: sahara - replace tasklets with kthread crypto: sahara - add support for i.MX53 crypto: sahara - fix spinlock initialization crypto: arm - replace memset by memzero_explicit crypto: powerpc - replace memset by memzero_explicit crypto: sha - replace memset by memzero_explicit crypto: sparc - replace memset by memzero_explicit crypto: algif_skcipher - initialize upon init request crypto: algif_skcipher - removed unneeded code crypto: algif_skcipher - Fixed blocking recvmsg crypto: drbg - use memzero_explicit() for clearing sensitive data crypto: drbg - use MODULE_ALIAS_CRYPTO crypto: include crypto- module prefix in template crypto: user - add MODULE_ALIAS crypto: sha-mb - remove a bogus NULL check crytpo: qat - Fix 64 bytes requests ...
2014-12-13Merge branch 'akpm' (second patch-bomb from Andrew)Linus Torvalds
Merge second patchbomb from Andrew Morton: - the rest of MM - misc fs fixes - add execveat() syscall - new ratelimit feature for fault-injection - decompressor updates - ipc/ updates - fallocate feature creep - fsnotify cleanups - a few other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits) cgroups: Documentation: fix trivial typos and wrong paragraph numberings parisc: percpu: update comments referring to __get_cpu_var percpu: update local_ops.txt to reflect this_cpu operations percpu: remove __get_cpu_var and __raw_get_cpu_var macros fsnotify: remove destroy_list from fsnotify_mark fsnotify: unify inode and mount marks handling fallocate: create FAN_MODIFY and IN_MODIFY events mm/cma: make kmemleak ignore CMA regions slub: fix cpuset check in get_any_partial slab: fix cpuset check in fallback_alloc shmdt: use i_size_read() instead of ->i_size ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments ipc/msg: increase MSGMNI, remove scaling ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM ipc/sem.c: change memory barrier in sem_lock() to smp_rmb() lib/decompress.c: consistency of compress formats for kernel image decompress_bunzip2: off by one in get_next_block() usr/Kconfig: make initrd compression algorithm selection not expert fault-inject: add ratelimit option ratelimit: add initialization macro ...
2014-12-13gcov: enable GCOV_PROFILE_ALL from ARCH KconfigsRiku Voipio
Following the suggestions from Andrew Morton and Stephen Rothwell, Dont expand the ARCH list in kernel/gcov/Kconfig. Instead, define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures can enable. set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was previously allowed + ARM64 which I tested. Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/debug-pagealloc: make debug-pagealloc boottime configurableJoonsoo Kim
Now, we have prepared to avoid using debug-pagealloc in boottime. So introduce new kernel-parameter to disable debug-pagealloc in boottime, and makes related functions to be disabled in this case. Only non-intuitive part is change of guard page functions. Because guard page is effective only if debug-pagealloc is enabled, turning off according to debug-pagealloc is reasonable thing to do. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull another networking update from David Miller: "Small follow-up to the main merge pull from the other day: 1) Alexander Duyck's DMA memory barrier patch set. 2) cxgb4 driver fixes from Karen Xie. 3) Add missing export of fixed_phy_register() to modules, from Mark Salter. 4) DSA bug fixes from Florian Fainelli" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits) net/macb: add TX multiqueue support for gem linux/interrupt.h: remove the definition of unused tasklet_hi_enable jme: replace calls to redundant function net: ethernet: davicom: Allow to select DM9000 for nios2 net: ethernet: smsc: Allow to select SMC91X for nios2 cxgb4: Add support for QSA modules libcxgbi: fix freeing skb prematurely cxgb4i: use set_wr_txq() to set tx queues cxgb4i: handle non-pdu-aligned rx data cxgb4i: additional types of negative advice cxgb4/cxgb4i: set the max. pdu length in firmware cxgb4i: fix credit check for tx_data_wr cxgb4i: fix tx immediate data credit check net: phy: export fixed_phy_register() fib_trie: Fix trie balancing issue if new node pushes down existing node vlan: Add ability to always enable TSO/UFO r8169:update rtl8168g pcie ephy parameter net: dsa: bcm_sf2: force link for all fixed PHY devices fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads r8169: Use dma_rmb() and dma_wmb() for DescOwn checks ...
2014-12-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree update from Jiri Kosina: "Usual stuff: documentation updates, printk() fixes, etc" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) intel_ips: fix a type in error message cpufreq: cpufreq-dt: Move newline to end of error message ps3rom: fix error return code treewide: fix typo in printk and Kconfig ARM: dts: bcm63138: change "interupts" to "interrupts" Replace mentions of "list_struct" to "list_head" kernel: trace: fix printk message scsi: mpt2sas: fix ioctl in comment zbud, zswap: change module author email clocksource: Fix 'clcoksource' typo in comment arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help gpio: msm-v1: make boolean argument more obvious usb: Fix typo in usb-serial-simple.c PCI: Fix comment typo 'COMFIG_PM_OPS' powerpc: Fix comment typo 'CONIFG_8xx' powerpc: Fix comment typos 'CONFiG_ALTIVEC' clk: st: Spelling s/stucture/structure/ isci: Spelling s/stucture/structure/ usb: gadget: zero: Spelling s/infrastucture/infrastructure/ treewide: Fix company name in module descriptions ...
2014-12-12powerpc: add little endian flag to syscall_get_arch()Richard Guy Briggs
Since both ppc and ppc64 have LE variants which are now reported by uname, add that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE variant. Without this, perf trace and auditctl fail. Mainline kernel reports ppc64le (per a058801) but there is no matching AUDIT_ARCH_PPC64LE. Since 32-bit PPC LE is not supported by audit, don't advertise it in AUDIT_ARCH_PPC* variants. See: https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12power/perf/hv-24x7: Use kmem_cache_free() instead of kfreeSukadev Bhattiprolu
Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12powerpc/perf/hv-24x7: Use per-cpu page buffersukadev@linux.vnet.ibm.com
The 24x7 counters are continuously running and not updated on an interrupt. So we record the event counts when stopping the event or deleting it. But to "read" a single counter in 24x7, we allocate a page and pass it into the hypervisor (The HV returns the page full of counters from which we extract the specific counter for this event). We allocate a page using GFP_USER and when deleting the event, we end up with the following warning because we are blocking in interrupt context. [ 698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000 We could use GFP_ATOMIC but that could result in failures. Pre-allocate a buffer so we don't have to allocate in interrupt context. Further as Michael Ellerman suggested, use Per-CPU buffer so we only need to allocate once per CPU. Cc: stable@vger.kernel.org Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-11arch: Add lightweight memory barriers dma_rmb() and dma_wmb()Alexander Duyck
There are a number of situations where the mandatory barriers rmb() and wmb() are used to order memory/memory operations in the device drivers and those barriers are much heavier than they actually need to be. For example in the case of PowerPC wmb() calls the heavy-weight sync instruction when for coherent memory operations all that is really needed is an lsync or eieio instruction. This commit adds a coherent only version of the mandatory memory barriers rmb() and wmb(). In most cases this should result in the barrier being the same as the SMP barriers for the SMP case, however in some cases we use a barrier that is somewhere in between rmb() and smp_rmb(). For example on ARM the rmb barriers break down as follows: Barrier Call Explanation --------- -------- ---------------------------------- rmb() dsb() Data synchronization barrier - system dma_rmb() dmb(osh) data memory barrier - outer sharable smp_rmb() dmb(ish) data memory barrier - inner sharable These new barriers are not as safe as the standard rmb() and wmb(). Specifically they do not guarantee ordering between coherent and incoherent memories. The primary use case for these would be to enforce ordering of reads and writes when accessing coherent memory that is shared between the CPU and a device. It may also be noted that there is no dma_mb(). Most architectures don't provide a good mechanism for performing a coherent only full barrier without resorting to the same mechanism used in mb(). As such there isn't much to be gained in trying to define such a function. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11arch: Cleanup read_barrier_depends() and commentsAlexander Duyck
This patch is meant to cleanup the handling of read_barrier_depends and smp_read_barrier_depends. In multiple spots in the kernel headers read_barrier_depends is defined as "do {} while (0)", however we then go into the SMP vs non-SMP sections and have the SMP version reference read_barrier_depends, and the non-SMP define it as yet another empty do/while. With this commit I went through and cleaned out the duplicate definitions and reduced the number of definitions down to 2 per header. In addition I moved the 50 line comments for the macro from the x86 and mips headers that defined it as an empty do/while to those that were actually defining the macro, alpha and blackfin. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11Merge tag 'powerpc-3.19-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: "Some nice cleanups like removing bootmem, and removal of __get_cpu_var(). There is one patch to mm/gup.c. This is the generic GUP implementation, but is only used by us and arm(64). We have an ack from Steve Capper, and although we didn't get an ack from Andrew he told us to take the patch through the powerpc tree. There's one cxl patch. This is in drivers/misc, but Greg said he was happy for us to manage fixes for it. There is an infrastructure patch to support an IPMI driver for OPAL. There is also an RTC driver for OPAL. We weren't able to get any response from the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver. The usual batch of Freescale updates from Scott" * tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (101 commits) powerpc/powernv: Return to cpu offline loop when finished in KVM guest powerpc/book3s: Fix partial invalidation of TLBs in MCE code. powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault powerpc/xmon: Cleanup the breakpoint flags powerpc/xmon: Enable HW instruction breakpoint on POWER8 powerpc/mm/thp: Use tlbiel if possible powerpc/mm/thp: Remove code duplication powerpc/mm/hugetlb: Sanity check gigantic hugepage count powerpc/oprofile: Disable pagefaults during user stack read powerpc/mm: Check for matching hpte without taking hpte lock powerpc: Drop useless warning in eeh_init() powerpc/powernv: Cleanup unused MCE definitions/declarations. powerpc/eeh: Dump PHB diag-data early powerpc/eeh: Recover EEH error on ownership change for BCM5719 powerpc/eeh: Set EEH_PE_RESET on PE reset powerpc/eeh: Refactor eeh_reset_pe() powerpc: Remove more traces of bootmem powerpc/pseries: Initialise nvram_pstore_info's buf_lock cxl: Name interrupts in /proc/interrupt cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning ...
2014-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...
2014-12-11Merge tag 'devicetree-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux Pull devicetree changes from Grant Likely: "Lots of activity in the devicetree code for v3.18. Most of it is related to getting all of the overlay support code in place, but there are other important things in there. Highlights: - OF_RECONFIG notifiers for SPI, I2C and Platform devices. Those subsystems can now respond to live changes to the device tree. - CONFIG_OF_OVERLAY method for applying live changes to the device tree - Removal of the of_allnodes list. This used to be used to iterate over all the nodes in the device tree, but it is unnecessary because the same thing can be done by iterating over the list of child pointers. Getting rid of of_allnodes saves some memory and avoids the possibility of of_allnodes being sorted differently from the child lists. - Support for retrieving original DTB blob via sysfs. Needed by kexec. - More unittests - Documentation and minor bug fixes" * tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux: (42 commits) of: Delete unnecessary check before calling "of_node_put()" of: Drop ->next pointer from struct device_node spi: Check for spi_of_notifier when CONFIG_OF_DYNAMIC=y of: support passing console options with stdout-path of: add optional options parameter to of_find_node_by_path() of: Add bindings for chosen node, stdout-path of: Remove unneeded and incorrect MODULE_DEVICE_TABLE ARM: dt: fix up PL011 device tree bindings of: base, fix of_property_read_string_helper kernel-doc of: remove select of non-existant OF_DEVICE config symbol spi/of: Add OF notifier handler spi/of: Create new device registration method and accessors i2c/of: Add OF_RECONFIG notifier handler i2c/of: Factor out Devicetree registration code of/overlay: Add overlay unittests of/overlay: Introduce DT overlay support of/reconfig: Add OF_DYNAMIC notifier for platform_bus_type of/reconfig: Always use the same structure for notifiers of/reconfig: Add debug output for OF_RECONFIG notifiers of/reconfig: Add empty stubs for the of_reconfig methods ...
2014-12-10Merge tag 'trace-3.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There was a lot of clean ups and minor fixes. One of those clean ups was to the trace_seq code. It also removed the return values to the trace_seq_*() functions and use trace_seq_has_overflowed() to see if the buffer filled up or not. This is similar to work being done to the seq_file code as well in another tree. Some of the other goodies include: - Added some "!" (NOT) logic to the tracing filter. - Fixed the frame pointer logic to the x86_64 mcount trampolines - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems. That is, the ftrace trampoline can be dynamically allocated and be called directly by functions that only have a single hook to them" * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits) tracing: Truncated output is better than nothing tracing: Add additional marks to signal very large time deltas Documentation: describe trace_buf_size parameter more accurately tracing: Allow NOT to filter AND and OR clauses tracing: Add NOT to filtering logic ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter ftrace/x86: Get rid of ftrace_caller_setup ftrace/x86: Have save_mcount_regs macro also save stack frames if needed ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs ftrace/x86: Simplify save_mcount_regs on getting RIP ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file ftrace/x86: Have static tracing also use ftrace_caller_setup ftrace/x86: Have static function tracing always test for function graph kprobes: Add IPMODIFY flag to kprobe_ftrace_ops ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict kprobes/ftrace: Recover original IP if pre_handler doesn't change it tracing/trivial: Fix typos and make an int into a bool tracing: Deletion of an unnecessary check before iput() ...
2014-12-10Merge branch 'akpm' (patchbomb from Andrew)Linus Torvalds
Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug upadtes - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk upates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...
2014-12-10ppc/cell: replace get_unused_fd() with get_unused_fd_flags(0)Yann Droneaud
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10mm: fix huge zero page accounting in smaps reportKirill A. Shutemov
As a small zero page, huge zero page should not be accounted in smaps report as normal page. For small pages we rely on vm_normal_page() to filter out zero page, but vm_normal_page() is not designed to handle pmds. We only get here due hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not necessary compatible on each and every architecture. Let's add separate codepath to handle pmds. follow_trans_huge_pmd() will detect huge zero page for us. We would need pmd_dirty() helper to do this properly. The patch adds it to THP-enabled architectures which don't yet have one. [akpm@linux-foundation.org: use do_div to fix 32-bit build] Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengwei Yin <yfw.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>