aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/asm-offsets.c
AgeCommit message (Collapse)Author
2014-01-09KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structuresPaul Mackerras
This uses struct thread_fp_state and struct thread_vr_state to store the floating-point, VMX/Altivec and VSX state, rather than flat arrays. This makes transferring the state to/from the thread_struct simpler and allows us to unify the get/set_one_reg implementations for the VSX registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-18powerpc: book3s: kvm: Don't abuse host r2 in exit pathAneesh Kumar K.V
We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0xffffffffffff8310 Faulting instruction address: 0xc00000000001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0] pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c00000000001d760: .vtime_account_system+0x20/0x60 sp: c0000001dc53b170 msr: 8000000000009032 dar: ffffffffffff8310 dsisr: 40000000 current = 0xc0000001d76c62d0 paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? for help [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60 [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4 [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40 [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770 [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0 [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-05powerpc/book3s: handle machine check in Linux host.Mahesh Salgaonkar
Move machine check entry point into Linux. So far we were dependent on firmware to decode MCE error details and handover the high level info to OS. This patch introduces early machine check routine that saves the MCE information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate stack frame on emergency stack and set the r1 accordingly. This allows us to be prepared to take another exception without loosing context. One thing to note here that, if we get another machine check while ME bit is off then we risk a checkstop. Hence we restrict ourselves to save only MCE information and register saved on PACA_EXMC save are before we turn the ME bit on. We use paca->in_mce flag to differentiate between first entry and nested machine check entry which helps proper use of emergency stack. We increment paca->in_mce every time we enter in early machine check handler and decrement it while leaving. When we enter machine check early handler first time (paca->in_mce == 0), we are sure nobody is using MC emergency stack and allocate a stack frame at the start of the emergency stack. During subsequent entry (paca->in_mce > 0), we know that r1 points inside emergency stack and we allocate separate stack frame accordingly. This prevents us from clobbering MCE information during nested machine checks. The early machine check handler changes are placed under CPU_FTR_HVMODE section. This makes sure that the early machine check handler will get executed only in hypervisor kernel. This is the code flow: Machine Check Interrupt | V 0x200 vector ME=0, IR=0, DR=0 | V +-----------------------------------------------+ |machine_check_pSeries_early: | ME=0, IR=0, DR=0 | Alloc frame on emergency stack | | Save srr1, srr0, dar and dsisr on stack | +-----------------------------------------------+ | (ME=1, IR=0, DR=0, RFID) | V machine_check_handle_early ME=1, IR=0, DR=0 | V +-----------------------------------------------+ | machine_check_early (r3=pt_regs) | ME=1, IR=0, DR=0 | Things to do: (in next patches) | | Flush SLB for SLB errors | | Flush TLB for TLB errors | | Decode and save MCE info | +-----------------------------------------------+ | (Fall through existing exception handler routine.) | V machine_check_pSerie ME=1, IR=0, DR=0 | (ME=1, IR=1, DR=1, RFID) | V machine_check_common ME=1, IR=1, DR=1 . . . Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM changes from Paolo Bonzini: "Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) kvm, vmx: Fix lazy FPU on nested guest arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu arm/arm64: KVM: MMIO support for BE guest kvm, cpuid: Fix sparse warning kvm: Delete prototype for non-existent function kvm_check_iopl kvm: Delete prototype for non-existent function complete_pio hung_task: add method to reset detector pvclock: detect watchdog reset at pvclock read kvm: optimize out smp_mb after srcu_read_unlock srcu: API for barrier after srcu read unlock KVM: remove vm mmap method KVM: IOMMU: hva align mapping page size KVM: x86: trace cpuid emulation when called from emulator KVM: emulator: cleanup decode_register_operand() a bit KVM: emulator: check rex prefix inside decode_register() KVM: x86: fix emulation of "movzbl %bpl, %eax" kvm_host: typo fix KVM: x86: emulate SAHF instruction MAINTAINERS: add tree for kvm.git Documentation/kvm: add a 00-INDEX file ...
2013-11-04Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queueGleb Natapov
Conflicts: arch/powerpc/include/asm/processor.h
2013-10-18powerpc: move debug registers in a structureBharat Bhushan
This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Acked-by: Michael Neuling <mikey@neuling.org> [scottwood@freescale.com: removed obvious debug_reg comment] Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-10-17kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLEAneesh Kumar K.V
This help ups to select the relevant code in the kernel code when we later move HV and PR bits as seperate modules. The patch also makes the config options for PR KVM selectable Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLEAneesh Kumar K.V
With later patches supporting PR kvm as a kernel module, the changes that has to be built into the main kernel binary to enable PR KVM module is now selected via KVM_BOOK3S_PR_POSSIBLE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17powerpc: move debug registers in a structureBharat Bhushan
This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpuPaul Mackerras
Currently PR-style KVM keeps the volatile guest register values (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than the main kvm_vcpu struct. For 64-bit, the shadow_vcpu exists in two places, a kmalloc'd struct and in the PACA, and it gets copied back and forth in kvmppc_core_vcpu_load/put(), because the real-mode code can't rely on being able to access the kmalloc'd struct. This changes the code to copy the volatile values into the shadow_vcpu as one of the last things done before entering the guest. Similarly the values are copied back out of the shadow_vcpu to the kvm_vcpu immediately after exiting the guest. We arrange for interrupts to be still disabled at this point so that we can't get preempted on 64-bit and end up copying values from the wrong PACA. This means that the accessor functions in kvm_book3s.h for these registers are greatly simplified, and are same between PR and HV KVM. In places where accesses to shadow_vcpu fields are now replaced by accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs. Finally, on 64-bit, we don't need the kmalloc'd struct at all any more. With this, the time to read the PVR one million times in a loop went from 567.7ms to 575.5ms (averages of 6 values), an increase of about 1.4% for this worse-case test for guest entries and exits. The standard deviation of the measurements is about 11ms, so the difference is only marginally significant statistically. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7Paul Mackerras
This enables us to use the Processor Compatibility Register (PCR) on POWER7 to put the processor into architecture 2.05 compatibility mode when running a guest. In this mode the new instructions and registers that were introduced on POWER7 are disabled in user mode. This includes all the VSX facilities plus several other instructions such as ldbrx, stdbrx, popcntw, popcntd, etc. To select this mode, we have a new register accessible through the set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT. Setting this to zero gives the full set of capabilities of the processor. Setting it to one of the "logical" PVR values defined in PAPR puts the vcpu into the compatibility mode for the corresponding architecture level. The supported values are: 0x0f000002 Architecture 2.05 (POWER6) 0x0f000003 Architecture 2.06 (POWER7) 0x0f100003 Architecture 2.06+ (POWER7+) Since the PCR is per-core, the architecture compatibility level and the corresponding PCR value are stored in the struct kvmppc_vcore, and are therefore shared between all vcpus in a virtual core. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: squash in fix to add missing break statements and documentation] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17KVM: PPC: Book3S HV: Add support for guest Program Priority RegisterPaul Mackerras
POWER7 and later IBM server processors have a register called the Program Priority Register (PPR), which controls the priority of each hardware CPU SMT thread, and affects how fast it runs compared to other SMT threads. This priority can be controlled by writing to the PPR or by use of a set of instructions of the form or rN,rN,rN which are otherwise no-ops but have been defined to set the priority to particular levels. This adds code to context switch the PPR when entering and exiting guests and to make the PPR value accessible through the SET/GET_ONE_REG interface. When entering the guest, we set the PPR as late as possible, because if we are setting a low thread priority it will make the code run slowly from that point on. Similarly, the first-level interrupt handlers save the PPR value in the PACA very early on, and set the thread priority to the medium level, so that the interrupt handling code runs at a reasonable speed. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17KVM: PPC: Book3S HV: Store LPCR value for each virtual corePaul Mackerras
This adds the ability to have a separate LPCR (Logical Partitioning Control Register) value relating to a guest for each virtual core, rather than only having a single value for the whole VM. This corresponds to what real POWER hardware does, where there is a LPCR per CPU thread but most of the fields are required to have the same value on all active threads in a core. The per-virtual-core LPCR can be read and written using the GET/SET_ONE_REG interface. Userspace can can only modify the following fields of the LPCR value: DPFD Default prefetch depth ILE Interrupt little-endian TC Translation control (secondary HPT hash group search disable) We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which contains bits relating to memory management, i.e. the Virtualized Partition Memory (VPM) bits and the bits relating to guest real mode. When this default value is updated, the update needs to be propagated to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do that. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix whitespace] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17KVM: PPC: Book3S HV: Implement timebase offset for guestsPaul Mackerras
This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration. Therefore this provides a new per-vcpu value accessed via the one_reg interface using the new KVM_REG_PPC_TB_OFFSET identifier. This value defaults to 0 and is not modified by KVM. On entering the guest, this value is added onto the timebase, and on exiting the guest, it is subtracted from the timebase. This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. Writing to the TBU40 register only alters the upper 40 bits of the timebase, leaving the lower 24 bits unchanged. This provides a way to modify the timebase for guest migration without disturbing the synchronization of the timebase registers across CPU cores. The kernel rounds up the value given to a multiple of 2^24. Timebase values stored in KVM structures (struct kvm_vcpu, struct kvmppc_vcore, etc.) are stored as host timebase values. The timebase values in the dispatch trace log need to be guest timebase values, however, since that is read directly by the guest. This moves the setting of vcpu->arch.dec_expires on guest exit to a point after we have restored the host timebase so that vcpu->arch.dec_expires is a host timebase value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registersPaul Mackerras
Currently we are not saving and restoring the SIAR and SDAR registers in the PMU (performance monitor unit) on guest entry and exit. The result is that performance monitoring tools in the guest could get false information about where a program was executing and what data it was accessing at the time of a performance monitor interrupt. This fixes it by saving and restoring these registers along with the other PMU registers on guest entry/exit. This also provides a way for userspace to access these values for a vcpu via the one_reg interface. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-11powerpc: Provide for giveup_fpu/altivec to save state in alternate locationPaul Mackerras
This provides a facility which is intended for use by KVM, where the contents of the FP/VSX and VMX (Altivec) registers can be saved away to somewhere other than the thread_struct when kernel code wants to use floating point or VMX instructions. This is done by providing a pointer in the thread_struct to indicate where the state should be saved to. The giveup_fpu() and giveup_altivec() functions test these pointers and save state to the indicated location if they are non-NULL. Note that the MSR_FP/VEC bits in task->thread.regs->msr are still used to indicate whether the CPU register state is live, even when an alternate save location is being used. This also provides load_fp_state() and load_vr_state() functions, which load up FP/VSX and VMX state from memory into the CPU registers, and corresponding store_fp_state() and store_vr_state() functions, which store FP/VSX and VMX state into memory from the CPU registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11powerpc: Put FP/VSX and VR state into structuresPaul Mackerras
This creates new 'thread_fp_state' and 'thread_vr_state' structures to store FP/VSX state (including FPSCR) and Altivec/VSX state (including VSCR), and uses them in the thread_struct. In the thread_fp_state, the FPRs and VSRs are represented as u64 rather than double, since we rarely perform floating-point computations on the values, and this will enable the structures to be used in KVM code as well. Similarly FPSCR is now a u64 rather than a structure of two 32-bit values. This takes the offsets out of the macros such as SAVE_32FPRS, REST_32FPRS, etc. This enables the same macros to be used for normal and transactional state, enabling us to delete the transactional versions of the macros. This also removes the unused do_load_up_fpu and do_load_up_altivec, which were in fact buggy since they didn't create large enough stack frames to account for the fact that load_up_fpu and load_up_altivec are not designed to be called from C and assume that their caller's stack frame is an interrupt frame. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-25powerpc: Remove ksp_limit on ppc64Benjamin Herrenschmidt
We've been keeping that field in thread_struct for a while, it contains the "limit" of the current stack pointer and is meant to be used for detecting stack overflows. It has a few problems however: - First, it was never actually *used* on 64-bit. Set and updated but not actually exploited - When switching stack to/from irq and softirq stacks, it's update is racy unless we hard disable interrupts, which is costly. This is fine on 32-bit as we don't soft-disable there but not on 64-bit. Thus rather than fixing 2 in order to implement 1 in some hypothetical future, let's remove the code completely from 64-bit. In order to avoid a clutter of ifdef's, we remove the updates from C code completely during interrupt stack switching, and instead maintain it from the asm helper that is used to do the stack switching in the first place. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-04Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Gleb Natapov: "The highlights of the release are nested EPT and pv-ticketlocks support (hypervisor part, guest part, which is most of the code, goes through tip tree). Apart of that there are many fixes for all arches" Fix up semantic conflicts as discussed in the pull request thread.. * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits) ARM: KVM: Add newlines to panic strings ARM: KVM: Work around older compiler bug ARM: KVM: Simplify tracepoint text ARM: KVM: Fix kvm_set_pte assignment ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256 ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs ARM: KVM: vgic: fix GICD_ICFGRn access ARM: KVM: vgic: simplify vgic_get_target_reg KVM: MMU: remove unused parameter KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: x86: update masterclock when kvmclock_offset is calculated (v2) KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation KVM: x86: add comments where MMIO does not return to the emulator KVM: vmx: count exits to userspace during invalid guest emulation KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp kvm: optimize away THP checks in kvm_is_mmio_pfn() ...
2013-08-29Merge remote-tracking branch 'origin/next' into kvm-ppc-nextAlexander Graf
Conflicts: mm/Kconfig CMA DMA split and ZSWAP introduction were conflicting, fix up manually.
2013-08-09powerpc/tm: Fix context switching TAR, PPR and DSCR SPRsMichael Neuling
If a transaction is rolled back, the Target Address Register (TAR), Processor Priority Register (PPR) and Data Stream Control Register (DSCR) should be restored to the checkpointed values before the transaction began. Any changes to these SPRs inside the transaction should not be visible in the abort handler. Currently Linux doesn't save or restore the checkpointed TAR, PPR or DSCR. If we preempt a processes inside a transaction which has modified any of these, on process restore, that same transaction may be aborted we but we won't see the checkpointed versions of these SPRs. This adds checkpointed versions of these SPRs to the thread_struct and adds the save/restore of these three SPRs to the treclaim/trechkpt code. Without this if any of these SPRs are modified during a transaction, users may incorrectly see a speculated SPR value even if the transaction is aborted. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-25KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entryPaul Mackerras
Unlike the other general-purpose SPRs, SPRG3 can be read by usermode code, and is used in recent kernels to store the CPU and NUMA node numbers so that they can be read by VDSO functions. Thus we need to load the guest's SPRG3 value into the real SPRG3 register when entering the guest, and restore the host's value when exiting the guest. We don't need to save the guest SPRG3 value when exiting the guest as usermode code can't modify SPRG3. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01powerpc/perf: Drop MMCRA from thread_structMichael Ellerman
In commit 59affcd "Context switch more PMU related SPRs" I added more PMU SPRs to thread_struct, later modified in commit b11ae95. To add insult to injury it turns out we don't need to switch MMCRA as it's only user readable, and the value is recomputed by the PMU code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Restore dbcr0 on user space exitBharat Bhushan
On BookE (Branch taken + Single Step) is as same as Branch Taken on BookS and in Linux we simulate BookS behavior for BookE as well. When doing so, in Branch taken handling we want to set DBCR0_IC but we update the current->thread->dbcr0 and not DBCR0. Now on 64bit the current->thread.dbcr0 (and other debug registers) is synchronized ONLY on context switch flow. But after handling Branch taken in debug exception if we return back to user space without context switch then single stepping change (DBCR0_ICMP) does not get written in h/w DBCR0 and Instruction Complete exception does not happen. This fixes using ptrace reliably on BookE-PowerPC lmbench latency test (lat_syscall) Results are (they varies a little on each run) 1) ./lat_syscall <action> /dev/shm/uImage action: Open read write stat fstat null Before: 3.8618 0.2017 0.2851 1.6789 0.2256 0.0856 After: 3.8580 0.2017 0.2851 1.6955 0.2255 0.0856 1) ./lat_syscall -P 2 -N 10 <action> /dev/shm/uImage action: Open read write stat fstat null Before: 4.1388 0.2238 0.3066 1.7106 0.2256 0.0856 After: 4.1413 0.2236 0.3062 1.7107 0.2256 0.0856 [ Slightly modified to avoid extra branch in the fast path on Book3S and fix build on all non-BookE 64-bit -- BenH ] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24powerpc: Context switch more PMU related SPRsMichael Ellerman
In commit 9353374 "Context switch the new EBB SPRs" we added support for context switching some new EBB SPRs. However despite four of us signing off on that patch we missed some. To be fair these are not actually new SPRs, but they are now potentially user accessible so need to be context switched. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-05Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
2013-05-02powerpc: Context switch the new EBB SPRsMichael Ellerman
This context switches the new Event Based Branching (EBB) SPRs. The three new SPRs are: - Event Based Branch Handler Register (EBBHR) - Event Based Branch Return Register (EBBRR) - Branch Event Status and Control Register (BESCR) Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVMBenjamin Herrenschmidt
Currently, we wake up a CPU by sending a host IPI with smp_send_reschedule() to thread 0 of that core, which will take all threads out of the guest, and cause them to re-evaluate their interrupt status on the way back in. This adds a mechanism to differentiate real host IPIs from IPIs sent by KVM for guest threads to poke each other, in order to target the guest threads precisely when possible and avoid that global switch of the core to host state. We then use this new facility in the in-kernel XICS code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty mapPaul Mackerras
At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications done by the host to the virtual processor areas (VPAs) and dispatch trace logs (DTLs) registered by the guest. This is because those modifications are done either in real mode or in the host kernel context, and in neither case does the access go through the guest's HPT, and thus no change (C) bit gets set in the guest's HPT. However, the changes done by the host do need to be tracked so that the modified pages get transferred when doing live migration. In order to track these modifications, this adds a dirty flag to the struct representing the VPA/DTL areas, and arranges to set the flag when the VPA/DTL gets modified by the host. Then, when we are collecting the dirty log, we also check the dirty flags for the VPA and DTL for each vcpu and set the relevant bit in the dirty log if necessary. Doing this also means we now need to keep track of the guest physical address of the VPA/DTL areas. So as not to lose track of modifications to a VPA/DTL area when it gets unregistered, or when a new area gets registered in its place, we need to transfer the dirty state to the rmap chain. This adds code to kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify that code, we now require that all VPA, DTL and SLB shadow buffer areas fit within a single host page. Guests already comply with this requirement because pHyp requires that these areas not cross a 4k boundary. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22KVM: PPC: booke: Added debug handlerBharat Bhushan
Installed debug handler will be used for guest debug support and debug facility emulation features (patches for these features will follow this patch). Signed-off-by: Liu Yu <yu.liu@freescale.com> [bharat.bhushan@freescale.com: Substantial changes] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-24Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...
2013-02-15powerpc: Add transactional memory paca scratch register to show_regsMichael Neuling
Add transactional memory paca scratch register to show_regs. This is useful for debugging. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15powerpc: New macros for transactional memory supportMichael Neuling
This adds new macros for saving and restoring checkpointed architected state from and to the thread_struct. It also adds some debugging macros for when your brain explodes trying to debug your transactional memory enabled kernel. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15powerpc/kvm/book3s_hv: Preserve guest CFAR register valuePaul Mackerras
The CFAR (Come-From Address Register) is a useful debugging aid that exists on POWER7 processors. Currently HV KVM doesn't save or restore the CFAR register for guest vcpus, making the CFAR of limited use in guests. This adds the necessary code to capture the CFAR value saved in the early exception entry code (it has to be saved before any branch is executed), save it in the vcpu.arch struct, and restore it on entry to the guest. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-13KVM: PPC: booke: use vcpu reference from thread_structBharat Bhushan
Like other places, use thread_struct to get vcpu reference. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-08powerpc: Add support for context switching the TAR registerIan Munsie
This patch adds support for enabling and context switching the Target Address Register in Power8. The TAR is a new special purpose register that can be used for computed branches with the bctar[l] (branch conditional to TAR) instruction in the same manner as the count and link registers. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10powerpc: Define ppr in thread_structHaren Myneni
[PATCH 4/6] powerpc: Define ppr in thread_struct ppr in thread_struct is used to save PPR and restore it before process exits from kernel. This patch sets the default priority to 3 when tasks are created such that users can use 4 for higher priority tasks. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-12-06KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidationsPaul Mackerras
When we change or remove a HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu. There are a number of problems on POWER7 with this as it stands: - The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads. - With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest. - The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL. - The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu. (None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.) To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set. On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or the previous contents of the HPT. Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-09-07Merge branch 'merge' into nextBenjamin Herrenschmidt
Brings in various bug fixes from 3.6-rcX
2012-09-07powerpc: Restore VDSO information on critical exception om BookEMihai Caraman
Critical exception on 64-bit booke uses user-visible SPRG3 as scratch. Restore VDSO information in SPRG3 on exception prolog. Use a common sprg3 field in PACA for all powerpc64 architectures. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05powerpc: Restore correct DSCR in context switchAnton Blanchard
During a context switch we always restore the per thread DSCR value. If we aren't doing explicit DSCR management (ie thread.dscr_inherit == 0) and the default DSCR changed while the process has been sleeping we end up with the wrong value. Check thread.dscr_inherit and select the default DSCR or per thread DSCR as required. This was found with the following test case, when running with more threads than CPUs (ie forcing context switching): http://ozlabs.org/~anton/junkcode/dscr_default_test.c With the four patches applied I can run a combination of all test cases successfully at the same time: http://ozlabs.org/~anton/junkcode/dscr_default_test.c http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> # 3.0+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11powerpc: Add VDSO version of getcpuAnton Blanchard
We have a request for a fast method of getting CPU and NUMA node IDs from userspace. This patch implements a getcpu VDSO function, similar to x86. Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be modified by a KVM guest, so we save the SPRG3 value in the paca and restore it when transitioning from the guest to the host. I have a glibc patch that implements sched_getcpu on top of this. Testing on a POWER7: baseline: 538 cycles vdso: 30 cycles Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-05-24Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
2012-04-30powerpc: Remove iseries specific fields in lppacaAnton Blanchard
Remove all the iseries specific fields in the lppaca. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-04-08KVM: PPC: Book 3S: Fix compilation for !HV configsPaul Mackerras
Commits 2f5cdd5487 ("KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs") and 1c2066b0f7 ("KVM: PPC: Book3S HV: Make virtual processor area registration more robust") added fields to struct kvm_vcpu_arch inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions, and added lines to arch/powerpc/kernel/asm-offsets.c to generate assembler constants for their offsets. Unfortunately this led to compile errors on Book 3S machines for configs that had KVM enabled but not CONFIG_KVM_BOOK3S_64_HV. This fixes the problem by moving the offending lines inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: PPC: Book3S HV: Make virtual processor area registration more robustPaul Mackerras
The PAPR API allows three sorts of per-virtual-processor areas to be registered (VPA, SLB shadow buffer, and dispatch trace log), and furthermore, these can be registered and unregistered for another virtual CPU. Currently we just update the vcpu fields pointing to these areas at the time of registration or unregistration. If this is done on another vcpu, there is the possibility that the target vcpu is using those fields at the time and could end up using a bogus pointer and corrupting memory. This fixes the race by making the target cpu itself do the update, so we can be sure that the update happens at a time when the fields aren't being used. Each area now has a struct kvmppc_vpa which is used to manage these updates. There is also a spinlock which protects access to all of the kvmppc_vpa structs, other than to the pinned_addr fields. (We could have just taken the spinlock when using the vpa, slb_shadow or dtl fields, but that would mean taking the spinlock on every guest entry and exit.) This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry', which is what the rest of the kernel uses. Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out the need to initialize vcpu->arch.vpa_update_lock. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIsPaul Mackerras
Currently on POWER7, if we are running the guest on a core and we don't need all the hardware threads, we do nothing to ensure that the unused threads aren't executing in the kernel (other than checking that they are offline). We just assume they're napping and we don't do anything to stop them trying to enter the kernel while the guest is running. This means that a stray IPI can wake up the hardware thread and it will then try to enter the kernel, but since the core is in guest context, it will execute code from the guest in hypervisor mode once it turns the MMU on, which tends to lead to crashes or hangs in the host. This fixes the problem by adding two new one-byte flags in the kvmppc_host_state structure in the PACA which are used to interlock between the primary thread and the unused secondary threads when entering the guest. With these flags, the primary thread can ensure that the unused secondaries are not already in kernel mode (i.e. handling a stray IPI) and then indicate that they should not try to enter the kernel if they do get woken for any reason. Instead they will go into KVM code, find that there is no vcpu to run, acknowledge and clear the IPI and go back to nap mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: PPC: booke: category E.HV (GS-mode) supportScott Wood
Chips such as e500mc that implement category E.HV in Power ISA 2.06 provide hardware virtualization features, including a new MSR mode for guest state. The guest OS can perform many operations without trapping into the hypervisor, including transitions to and from guest userspace. Since we can use SRR1[GS] to reliably tell whether an exception came from guest state, instead of messing around with IVPR, we use DO_KVM similarly to book3s. Current issues include: - Machine checks from guest state are not routed to the host handler. - The guest can cause a host oops by executing an emulated instruction in a page that lacks read permission. Existing e500/4xx support has the same problem. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: remove pt_regs usage] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-28Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
2012-03-21powerpc: Remove the remaining CONFIG_PPC_ISERIES piecesStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>