summaryrefslogtreecommitdiffstats
path: root/arch/arm/kvm
AgeCommit message (Collapse)Author
2015-10-20KVM: arm: use GIC support unconditionallyArnd Bergmann
The vgic code on ARM is built for all configurations that enable KVM, but the parent_data field that it references is only present when CONFIG_IRQ_DOMAIN_HIERARCHY is set: virt/kvm/arm/vgic.c: In function 'kvm_vgic_map_phys_irq': virt/kvm/arm/vgic.c:1781:13: error: 'struct irq_data' has no member named 'parent_data' This flag is implied by the GIC driver, and indeed the VGIC code only makes sense if a GIC is present. This changes the CONFIG_KVM symbol to always select GIC, which avoids the issue. Fixes: 662d9715840 ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-20KVM: arm/arm64: Fix memory leak if timer initialization failsPavel Fedin
Jump to correct label and free kvm_host_cpu_state Reviewed-by: Wei Huang <wei@redhat.com> Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-09-17arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'Ming Lei
This patch removes config option of KVM_ARM_MAX_VCPUS, and like other ARCHs, just choose the maximum allowed value from hardware, and follows the reasons: 1) from distribution view, the option has to be defined as the max allowed value because it need to meet all kinds of virtulization applications and need to support most of SoCs; 2) using a bigger value doesn't introduce extra memory consumption, and the help text in Kconfig isn't accurate because kvm_vpu structure isn't allocated until request of creating VCPU is sent from QEMU; 3) the main effect is that the field of vcpus[] in 'struct kvm' becomes a bit bigger(sizeof(void *) per vcpu) and need more cache lines to hold the structure, but 'struct kvm' is one generic struct, and it has worked well on other ARCHs already in this way. Also, the world switch frequecy is often low, for example, it is ~2000 when running kernel building load in VM from APM xgene KVM host, so the effect is very small, and the difference can't be observed in my test at all. Cc: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-17arm: KVM: Disable virtual timer even if the guest is not using itMarc Zyngier
When running a guest with the architected timer disabled (with QEMU and the kernel_irqchip=off option, for example), it is important to make sure the timer gets turned off. Otherwise, the guest may try to enable it anyway, leading to a screaming HW interrupt. The fix is to unconditionally turn off the virtual timer on guest exit. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resourcesPavel Fedin
Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"), kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(), and vgic_v2_map_resources() still has it. But now vm_ops are not initialized until we call kvm_vgic_create(). Therefore kvm_vgic_map_resources() can being called without a VGIC, and we die because vm_ops.map_resources is NULL. Fixing this restores QEMU's kernel-irqchip=off option to a working state, allowing to use GIC emulation in userspace. Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops") Cc: stable@vger.kernel.org Signed-off-by: Pavel Fedin <p.fedin@samsung.com> [maz: reworked commit message] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-16arm: KVM: Fix incorrect device to IPA mappingMarek Majtyka
A critical bug has been found in device memory stage1 translation for VMs with more then 4GB of address space. Once vm_pgoff size is smaller then pa (which is true for LPAE case, u32 and u64 respectively) some more significant bits of pa may be lost as a shift operation is performed on u32 and later cast onto u64. Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12 expected pa(u64): 0x0000002010030000 produced pa(u64): 0x0000000010030000 The fix is to change the order of operations (casting first onto phys_addr_t and then shifting). Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> [maz: fixed changelog and patch formatting] Cc: stable@vger.kernel.org Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-04arm/arm64: KVM: Fix PSCI affinity info return value for non valid coresAlexander Spyridakis
If a guest requests the affinity info for a non-existing vCPU we need to properly return an error, instead of erroneously reporting an off state. Signed-off-by: Alexander Spyridakis <a.spyridakis@virtualopensystems.com> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-19arm: KVM: keep arm vfp/simd exit handling consistent with arm64Mario Smarduch
After enhancing arm64 FP/SIMD exit handling, ARMv7 VFP exit branch is moved to guest trap handling. This allows us to keep exit handling flow between both architectures consistent. Signed-off-by: Mario Smarduch <m.smarduch@samsung.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: timer: Allow the timer to control the active stateMarc Zyngier
In order to remove the crude hack where we sneak the masked bit into the timer's control register, make use of the phys_irq_map API control the active state of the interrupt. This causes some limited changes to allow for potential error propagation. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interruptsMarc Zyngier
In order to be able to feed physical interrupts to a guest, we need to be able to establish the virtual-physical mapping between the two worlds. The mappings are kept in a set of RCU lists, indexed by virtual interrupts. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12arm/arm64: KVM: Move vgic handling to a non-preemptible sectionMarc Zyngier
As we're about to introduce some serious GIC-poking to the vgic code, it is important to make sure that we're going to poke the part of the GIC that belongs to the CPU we're about to run on (otherwise, we'd end up with some unexpected interrupts firing)... Introducing a non-preemptible section in kvm_arch_vcpu_ioctl_run prevents the problem from occuring. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12arm/arm64: KVM: Fix ordering of timer/GIC on guest entryMarc Zyngier
As we now inject the timer interrupt when we're about to enter the guest, it makes a lot more sense to make sure this happens before the vgic code queues the pending interrupts. Otherwise, we get the interrupt on the following exit, which is not great for latency (and leads to all kind of bizarre issues when using with active interrupts at the HW level). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-07-21KVM: arm64: introduce vcpu->arch.debug_ptrAlex Bennée
This introduces a level of indirection for the debug registers. Instead of using the sys_regs[] directly we store registers in a structure in the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the guest context. Because we no longer give the sys_regs offset for the sys_reg_desc->reg field, but instead the index into a debug-specific struct we need to add a number of additional trap functions for each register. Also as the generic generic user-space access code no longer works we have introduced a new pair of function pointers to the sys_reg_desc structure to override the generic code when needed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21KVM: arm: introduce kvm_arm_init/setup/clear_debugAlex Bennée
This is a precursor for later patches which will need to do more to setup debug state before entering the hyp.S switch code. The existing functionality for setting mdcr_el2 has been moved out of hyp.S and now uses the value kept in vcpu->arch.mdcr_el2. As the assembler used to previously mask and preserve MDCR_EL2.HPMN I've had to add a mechanism to save the value of mdcr_el2 as a per-cpu variable during the initialisation code. The kernel never sets this number so we are assuming the bootcode has set up the correct value here. This also moves the conditional setting of the TDA bit from the hyp code into the C code which is currently used for the lazy debug register context switch code. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-21KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctlAlex Bennée
This commit adds a stub function to support the KVM_SET_GUEST_DEBUG ioctl. Any unsupported flag will return -EINVAL. For now, only KVM_GUESTDBG_ENABLE is supported, although it won't have any effects. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-26Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: "Bigger items included in this update are: - A series of updates from Arnd for ARM randconfig build failures - Updates from Dmitry for StrongARM SA-1100 to move IRQ handling to drivers/irqchip/ - Move ARMs SP804 timer to drivers/clocksource/ - Perf updates from Mark Rutland in preparation to move the ARM perf code into drivers/ so it can be shared with ARM64. - MCPM updates from Nicolas - Add support for taking platform serial number from DT - Re-implement Keystone2 physical address space switch to conform to architecture requirements - Clean up ARMv7 LPAE code, which goes in hand with the Keystone2 changes. - L2C cleanups to avoid unlocking caches if we're prevented by the secure support to unlock. - Avoid cleaning a potentially dirty cache containing stale data on CPU initialisation - Add ARM-only entry point for secondary startup (for machines that can only call into a Thumb kernel in ARM mode). Same thing is also done for the resume entry point. - Provide arch_irqs_disabled via asm-generic - Enlarge ARMv7M vector table - Always use BFD linker for VDSO, as gold doesn't accept some of the options we need. - Fix an incorrect BSYM (for Thumb symbols) usage, and convert all BSYM compiler macros to a "badr" (for branch address). - Shut up compiler warnings provoked by our cmpxchg() implementation. - Ensure bad xchg sizes fail to link" * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (75 commits) ARM: Fix build if CLKDEV_LOOKUP is not configured ARM: fix new BSYM() usage introduced via for-arm-soc branch ARM: 8383/1: nommu: avoid deprecated source register on mov ARM: 8391/1: l2c: add options to overwrite prefetching behavior ARM: 8390/1: irqflags: Get arch_irqs_disabled from asm-generic ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap ARM: 8388/1: tcm: Don't crash when TCM banks are protected by TrustZone ARM: 8384/1: VDSO: force use of BFD linker ARM: 8385/1: VDSO: group link options ARM: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations ARM: remove __bad_xchg definition ARM: 8369/1: ARMv7M: define size of vector table for Vybrid ARM: 8382/1: clocksource: make ARM_TIMER_SP804 depend on GENERIC_SCHED_CLOCK ARM: 8366/1: move Dual-Timer SP804 driver to drivers/clocksource ARM: 8365/1: introduce sp804_timer_disable and remove arm_timer.h inclusion ARM: 8364/1: fix BE32 module loading ARM: 8360/1: add secondary_startup_arm prototype in header file ARM: 8359/1: correct secondary_startup_arm mode ARM: proc-v7: sanitise and document registers around errata ARM: proc-v7: clean up MIDR access ...
2015-06-24Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Mostly refactoring/clean-up: - CPU ops and PSCI (Power State Coordination Interface) refactoring following the merging of the arm64 ACPI support, together with handling of Trusted (secure) OS instances - Using fixmap for permanent FDT mapping, removing the initial dtb placement requirements (within 512MB from the start of the kernel image). This required moving the FDT self reservation out of the memreserve processing - Idmap (1:1 mapping used for MMU on/off) handling clean-up - Removing flush_cache_all() - not safe on ARM unless the MMU is off. Last stages of CPU power down/up are handled by firmware already - "Alternatives" (run-time code patching) refactoring and support for immediate branch patching, GICv3 CPU interface access - User faults handling clean-up And some fixes: - Fix for VDSO building with broken ELF toolchains - Fix another case of init_mm.pgd usage for user mappings (during ASID roll-over broadcasting) - Fix for FPSIMD reloading after CPU hotplug - Fix for missing syscall trace exit - Workaround for .inst asm bug - Compat fix for switching the user tls tpidr_el0 register" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: use private ratelimit state along with show_unhandled_signals arm64: show unhandled SP/PC alignment faults arm64: vdso: work-around broken ELF toolchains in Makefile arm64: kernel: rename __cpu_suspend to keep it aligned with arm arm64: compat: print compat_sp instead of sp arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP arm64: entry: fix context tracking for el0_sp_pc arm64: defconfig: enable memtest arm64: mm: remove reference to tlb.S from comment block arm64: Do not attempt to use init_mm in reset_context() arm64: KVM: Switch vgic save/restore to alternative_insn arm64: alternative: Introduce feature for GICv3 CPU interface arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning arm64: fix bug for reloading FPSIMD state after CPU hotplug. arm64: kernel thread don't need to save fpsimd context. arm64: fix missing syscall trace exit arm64: alternative: Work around .inst assembler bugs arm64: alternative: Merge alternative-asm.h into alternative.h arm64: alternative: Allow immediate branch as alternative instruction arm64: Rework alternate sequence for ARM erratum 845719 ...
2015-06-19Merge tag 'kvm-arm-for-4.2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM changes for v4.2: - Proper guest time accounting - FP access fix for 32bit - The usual pile of GIC fixes - PSCI fixes - Random cleanups
2015-06-17arm/arm64: KVM: vgic: Do not save GICH_HCR / ICH_HCR_EL2Marc Zyngier
The GIC Hypervisor Configuration Register is used to enable the delivery of virtual interupts to a guest, as well as to define in which conditions maintenance interrupts are delivered to the host. This register doesn't contain any information that we need to read back (the EOIcount is utterly useless for us). So let's save ourselves some cycles, and not save it before writing zero to it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17ARM: kvm: psci: fix handling of unimplemented functionsLorenzo Pieralisi
According to the PSCI specification and the SMC/HVC calling convention, PSCI function_ids that are not implemented must return NOT_SUPPORTED as return value. Current KVM implementation takes an unhandled PSCI function_id as an error and injects an undefined instruction into the guest if PSCI implementation is called with a function_id that is not handled by the resident PSCI version (ie it is not implemented), which is not the behaviour expected by a guest when calling a PSCI function_id that is not implemented. This patch fixes this issue by returning NOT_SUPPORTED whenever the kvm PSCI call is executed for a function_id that is not implemented by the PSCI kvm layer. Cc: <stable@vger.kernel.org> # 3.18+ Cc: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17KVM: arm/arm64: Enable the KVM-VFIO deviceKim Phillips
The KVM-VFIO device is used by the QEMU VFIO device. It is used to record the list of in-use VFIO groups so that KVM can manipulate them. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17arm/arm64: KVM: Properly account for guest CPU timeChristoffer Dall
Until now we have been calling kvm_guest_exit after re-enabling interrupts when we come back from the guest, but this has the unfortunate effect that CPU time accounting done in the context of timer interrupts occurring while the guest is running doesn't properly notice that the time since the last tick was spent in the guest. Inspired by the comment in the x86 code, move the kvm_guest_exit() call below the local_irq_enable() call and change __kvm_guest_exit() to kvm_guest_exit(), because we are now calling this function with interrupts enabled. We have to now explicitly disable preemption and not enable preemption before we've called kvm_guest_exit(), since otherwise we could be preempted and everything happening before we eventually get scheduled again would be accounted for as guest time. At the same time, move the trace_kvm_exit() call outside of the atomic section, since there is no reason for us to do that with interrupts disabled. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17kvm: remove one useless check extensionTiejun Chen
We already check KVM_CAP_IRQFD in generic once enable CONFIG_HAVE_KVM_IRQFD, kvm_vm_ioctl_check_extension_generic() | + switch (arg) { + ... + #ifdef CONFIG_HAVE_KVM_IRQFD + case KVM_CAP_IRQFD: + #endif + ... + return 1; + ... + } | + kvm_vm_ioctl_check_extension() So its not necessary to check this in arch again, and also fix one typo, s/emlation/emulation. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-06-17arm: KVM: force execution of HCPTR access on VM exitMarc Zyngier
On VM entry, we disable access to the VFP registers in order to perform a lazy save/restore of these registers. On VM exit, we restore access, test if we did enable them before, and save/restore the guest/host registers if necessary. In this sequence, the FPEXC register is always accessed, irrespective of the trapping configuration. If the guest didn't touch the VFP registers, then the HCPTR access has now enabled such access, but we're missing a barrier to ensure architectural execution of the new HCPTR configuration. If the HCPTR access has been delayed/reordered, the subsequent access to FPEXC will cause a trap, which we aren't prepared to handle at all. The same condition exists when trapping to enable VFP for the guest. The fix is to introduce a barrier after enabling VFP access. In the vmexit case, it can be relaxed to only takes place if the guest hasn't accessed its view of the VFP registers, making the access to FPEXC safe. The set_hcptr macro is modified to deal with both vmenter/vmexit and vmtrap operations, and now takes an optional label that is branched to when the guest hasn't touched the VFP registers. Reported-by: Vikram Sethi <vikrams@codeaurora.org> Cc: stable@kernel.org # v3.9+ Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-09ARM: KVM: Remove pointless void pointer castFiro Yang
No need to cast the void pointer returned by kmalloc() in arch/arm/kvm/mmu.c::kvm_alloc_stage2_pgd(). Signed-off-by: Firo Yang <firogm@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-05-28KVM: add "new" argument to kvm_arch_commit_memory_regionPaolo Bonzini
This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-27arm/arm64: kvm: add missing PSCI includeMark Rutland
We make use of the PSCI function IDs, but don't explicitly include the header which defines them. Relying on transitive header includes is fragile and will be broken as headers are refactored. This patch includes the relevant header file directly so as to avoid future breakage. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com>
2015-05-26KVM: add memslots argument to kvm_arch_memslots_updatedPaolo Bonzini
Prepare for the case of multiple address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: const-ify uses of struct kvm_userspace_memory_regionPaolo Bonzini
Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: use kvm_memslots whenever possiblePaolo Bonzini
kvm_memslots provides lockdep checking. Use it consistently instead of explicit dereferencing of kvm->memslots. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08ARM: kvm: fix a bad BSYM() usageRussell King
BSYM() should only be used when refering to local symbols in the same assembly file which are resolved by the assembler, and not for linker-fixed up symbols. The use of BSYM() with panic is incorrect as the linker is involved in fixing up this relocation, and it knows whether panic() is ARM or Thumb. Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-05-07KVM: arm/mips/x86/power use __kvm_guest_{enter|exit}Christian Borntraeger
Use __kvm_guest_{enter|exit} instead of kvm_guest_{enter|exit} where interrupts are disabled. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull second batch of KVM changes from Paolo Bonzini: "This mostly includes the PPC changes for 4.1, which this time cover Book3S HV only (debugging aids, minor performance improvements and some cleanups). But there are also bug fixes and small cleanups for ARM, x86 and s390. The task_migration_notifier revert and real fix is still pending review, but I'll send it as soon as possible after -rc1" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits) KVM: arm/arm64: check IRQ number on userland injection KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi KVM: VMX: Preserve host CR4.MCE value while in guest mode. KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C KVM: PPC: Book3S HV: Streamline guest entry and exit KVM: PPC: Book3S HV: Use bitmap of active threads rather than count KVM: PPC: Book3S HV: Use decrementer to wake napping threads KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu KVM: PPC: Book3S HV: Minor cleanups KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update KVM: PPC: Book3S HV: Accumulate timing information for real-mode code KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT KVM: PPC: Book3S HV: Add ICP real mode counters KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock KVM: PPC: Book3S HV: Add guest->host real mode completion counters KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte ...
2015-04-22KVM: arm/arm64: check IRQ number on userland injectionAndre Przywara
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently only check it against a fixed limit, which historically is set to 127. With the new dynamic IRQ allocation the effective limit may actually be smaller (64). So when now a malicious or buggy userland injects a SPI in that range, we spill over on our VGIC bitmaps and bytemaps memory. I could trigger a host kernel NULL pointer dereference with current mainline by injecting some bogus IRQ number from a hacked kvmtool: ----------------- .... DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1) DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1) DEBUG: IRQ #114 still in the game, writing to bytemap now... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc07652e000 [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027 Hardware name: FVP Base (DT) task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000 PC is at kvm_vgic_inject_irq+0x234/0x310 LR is at kvm_vgic_inject_irq+0x30c/0x310 pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145 ..... So this patch fixes this by checking the SPI number against the actual limit. Also we remove the former legacy hard limit of 127 in the ioctl code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18 [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__, as suggested by Christopher Covington] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-04-16Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Here are the core arm64 updates for 4.1. Highlights include a significant rework to head.S (allowing us to boot on machines with physical memory at a really high address), an AES performance boost on Cortex-A57 and the ability to run a 32-bit userspace with 64k pages (although this requires said userspace to be built with a recent binutils). The head.S rework spilt over into KVM, so there are some changes under arch/arm/ which have been acked by Marc Zyngier (KVM co-maintainer). In particular, the linker script changes caused us some issues in -next, so there are a few merge commits where we had to apply fixes on top of a stable branch. Other changes include: - AES performance boost for Cortex-A57 - AArch32 (compat) userspace with 64k pages - Cortex-A53 erratum workaround for #845719 - defconfig updates (new platforms, PCI, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (39 commits) arm64: fix midr range for Cortex-A57 erratum 832075 arm64: errata: add workaround for cortex-a53 erratum #845719 arm64: Use bool function return values of true/false not 1/0 arm64: defconfig: updates for 4.1 arm64: Extract feature parsing code from cpu_errata.c arm64: alternative: Allow immediate branch as alternative instruction arm64: insn: Add aarch64_insn_decode_immediate ARM: kvm: round HYP section to page size instead of log2 upper bound ARM: kvm: assert on HYP section boundaries not actual code size arm64: head.S: ensure idmap_t0sz is visible arm64: pmu: add support for interrupt-affinity property dt: pmu: extend ARM PMU binding to allow for explicit interrupt affinity arm64: head.S: ensure visibility of page tables arm64: KVM: use ID map with increased VA range if required arm64: mm: increase VA range of identity map ARM: kvm: implement replacement for ld's LOG2CEIL() arm64: proc: remove unused cpu_get_pgd macro arm64: enforce x1|x2|x3 == 0 upon kernel entry as per boot protocol arm64: remove __calc_phys_offset arm64: merge __enable_mmu and __turn_mmu_on ...
2015-04-07Merge tag 'kvm-arm-for-4.1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups
2015-03-30KVM: arm/arm64: enable KVM_CAP_IOEVENTFDNikolay Nikolaev
As the infrastructure for eventfd has now been merged, report the ioeventfd capability as being supported. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> [maz: grouped the case entry with the others, fixed commit log] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-30KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO busAndre Przywara
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort data to be passed on from syndrome decoding all the way down to the VGIC register handlers. Now as we switch the MMIO handling to be routed through the KVM MMIO bus, it does not make sense anymore to use that structure already from the beginning. So we keep the data in local variables until we put them into the kvm_io_bus framework. Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private structure. On that way we replace the data buffer in that structure with a pointer pointing to a single location in a local variable, so we get rid of some copying on the way. With all of the virtual GIC emulation code now being registered with the kvm_io_bus, we can remove all of the old MMIO handling code and its dispatching functionality. I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something), because that touches a lot of code lines without any good reason. This is based on an original patch by Nikolay. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-27Merge branch 'aarch64/kvm-bounce-page' into aarch64/for-next/coreWill Deacon
Just as we thought we'd fixed this, another old linker reared its ugly head trying to build linux-next. Unfortunately, it's the linker binary provided on kernel.org, so give up trying to be clever and align the hyp page to 4k.
2015-03-27ARM: kvm: round HYP section to page size instead of log2 upper boundArd Biesheuvel
Older binutils do not support expressions involving the values of external symbols so just round up the HYP region to the page size. Tested-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: when will this ever end?!] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-03-26KVM: arm/arm64: remove now unneeded include directory from MakefileAndre Przywara
virt/kvm was never really a good include directory for anything else than locally included headers. With the move of iodev.h there is no need anymore to add this directory the compiler's include path, so remove it from the arm and arm64 kvm Makefile. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-23arm64: KVM: use ID map with increased VA range if requiredArd Biesheuvel
This patch modifies the HYP init code so it can deal with system RAM residing at an offset which exceeds the reach of VA_BITS. Like for EL1, this involves configuring an additional level of translation for the ID map. However, in case of EL2, this implies that all translations use the extra level, as we cannot seamlessly switch between translation tables with different numbers of translation levels. So add an extra translation table at the root level. Since the ID map and the runtime HYP map are guaranteed not to overlap, they can share this root level, and we can essentially merge these two tables into one. Tested-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-03-19ARM, arm64: kvm: get rid of the bounce pageArd Biesheuvel
The HYP init bounce page is a runtime construct that ensures that the HYP init code does not cross a page boundary. However, this is something we can do perfectly well at build time, by aligning the code appropriately. For arm64, we just align to 4 KB, and enforce that the code size is less than 4 KB, regardless of the chosen page size. For ARM, the whole code is less than 256 bytes, so we tweak the linker script to align at a power of 2 upper bound of the code size Note that this also fixes a benign off-by-one error in the original bounce page code, where a bounce page would be allocated unnecessarily if the code was exactly 1 page in size. On ARM, it also fixes an issue with very large kernels reported by Arnd Bergmann, where stub sections with linker emitted veneers could erroneously trigger the size/alignment ASSERT() in the linker script. Tested-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-03-14arm/arm64: KVM: Fix migration race in the arch timerChristoffer Dall
When a VCPU is no longer running, we currently check to see if it has a timer scheduled in the future, and if it does, we schedule a host hrtimer to notify is in case the timer expires while the VCPU is still not running. When the hrtimer fires, we mask the guest's timer and inject the timer IRQ (still relying on the guest unmasking the time when it receives the IRQ). This is all good and fine, but when migration a VM (checkpoint/restore) this introduces a race. It is unlikely, but possible, for the following sequence of events to happen: 1. Userspace stops the VM 2. Hrtimer for VCPU is scheduled 3. Userspace checkpoints the VGIC state (no pending timer interrupts) 4. The hrtimer fires, schedules work in a workqueue 5. Workqueue function runs, masks the timer and injects timer interrupt 6. Userspace checkpoints the timer state (timer masked) At restore time, you end up with a masked timer without any timer interrupts and your guest halts never receiving timer interrupts. Fix this by only kicking the VCPU in the workqueue function, and sample the expired state of the timer when entering the guest again and inject the interrupt and mask the timer only then. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-14arm/arm64: KVM: export VCPU power state via MP_STATE ioctlAlex Bennée
To cleanly restore an SMP VM we need to ensure that the current pause state of each vcpu is correctly recorded. Things could get confused if the CPU starts running after migration restore completes when it was paused before it state was captured. We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64 interface is a lot simpler as the only valid states are KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12arm/arm64: KVM: Optimize handling of Access Flag faultsMarc Zyngier
Now that we have page aging in Stage-2, it becomes obvious that we're doing way too much work handling the fault. The page is not going anywhere (it is still mapped), the page tables are already allocated, and all we want is to flip a bit in the PMD or PTE. Also, we can avoid any form of TLB invalidation, since a page with the AF bit off is not allowed to be cached. An obvious solution is to have a separate handler for FSC_ACCESS, where we pride ourselves to only do the very minimum amount of work. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12arm/arm64: KVM: Implement Stage-2 page agingMarc Zyngier
Until now, KVM/arm didn't care much for page aging (who was swapping anyway?), and simply provided empty hooks to the core KVM code. With server-type systems now being available, things are quite different. This patch implements very simple support for page aging, by clearing the Access flag in the Stage-2 page tables. On access fault, the current fault handling will write the PTE or PMD again, putting the Access flag back on. It should be possible to implement a much faster handling for Access faults, but that's left for a later patch. With this in place, performance in VMs is degraded much more gracefully. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12arm/arm64: KVM: Allow handle_hva_to_gpa to return a valueMarc Zyngier
So far, handle_hva_to_gpa was never required to return a value. As we prepare to age pages at Stage-2, we need to be able to return a value from the iterator (kvm_test_age_hva). Adapt the code to handle this situation. No semantic change. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: add irqfd supportEric Auger
This patch enables irqfd on arm/arm64. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Irqfd injection is restricted to SPI. The rationale behind not supporting PPI irqfd injection is that any device using a PPI would be a private-to-the-CPU device (timer for instance), so its state would have to be context-switched along with the VCPU and would require in-kernel wiring anyhow. It is not a relevant use case for irqfds. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: implement kvm_arch_intc_initializedEric Auger
On arm/arm64 the VGIC is dynamically instantiated and it is useful to expose its state, especially for irqfd setup. This patch defines __KVM_HAVE_ARCH_INTC_INITIALIZED and implements kvm_arch_intc_initialized. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>