aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/virt/kvm/devices
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/virt/kvm/devices')
-rw-r--r--Documentation/virt/kvm/devices/arm-vgic-its.rst7
-rw-r--r--Documentation/virt/kvm/devices/arm-vgic-v3.rst9
-rw-r--r--Documentation/virt/kvm/devices/vcpu.rst165
-rw-r--r--Documentation/virt/kvm/devices/vfio.rst48
-rw-r--r--Documentation/virt/kvm/devices/vm.rst88
-rw-r--r--Documentation/virt/kvm/devices/xics.rst2
-rw-r--r--Documentation/virt/kvm/devices/xive.rst4
7 files changed, 296 insertions, 27 deletions
diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst
index 6c304fd2b1b4..e053124f77c4 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-its.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst
@@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
KVM_DEV_ARM_ITS_SAVE_TABLES
save the ITS table data into guest RAM, at the location provisioned
- by the guest in corresponding registers/table entries.
+ by the guest in corresponding registers/table entries. Should userspace
+ require a form of dirty tracking to identify which pages are modified
+ by the saving process, it should use a bitmap even if using another
+ mechanism to track the memory dirtied by the vCPUs.
The layout of the tables in guest memory defines an ABI. The entries
are laid out in little endian format as described in the last paragraph.
@@ -80,7 +83,7 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
-EFAULT Invalid guest ram access
-EBUSY One or more VCPUS are running
-EACCES The virtual ITS is backed by a physical GICv4 ITS, and the
- state is not available
+ state is not available without GICv4.1
======= ==========================================================
KVM_DEV_ARM_VGIC_GRP_ITS_REGS
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
index 5dd3bff51978..5817edb4e046 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
@@ -59,6 +59,13 @@ Groups:
It is invalid to mix calls with KVM_VGIC_V3_ADDR_TYPE_REDIST and
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION attributes.
+ Note that to obtain reproducible results (the same VCPU being associated
+ with the same redistributor across a save/restore operation), VCPU creation
+ order, redistributor region creation order as well as the respective
+ interleaves of VCPU and region creation MUST be preserved. Any change in
+ either ordering may result in a different vcpu_id/redistributor association,
+ resulting in a VM that will fail to run at restore time.
+
Errors:
======= =============================================================
@@ -228,7 +235,7 @@ Groups:
KVM_DEV_ARM_VGIC_CTRL_INIT
request the initialization of the VGIC, no additional parameter in
- kvm_device_attr.addr.
+ kvm_device_attr.addr. Must be called after all VCPUs have been created.
KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES
save all LPI pending bits into guest RAM pending tables.
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 9963e680770a..31f14ec4a65b 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -25,8 +25,10 @@ Returns:
======= ========================================================
-EBUSY The PMU overflow interrupt is already set
- -ENXIO The overflow interrupt not set when attempting to get it
- -ENODEV PMUv3 not supported
+ -EFAULT Error reading interrupt number
+ -ENXIO PMUv3 not supported or the overflow interrupt not set
+ when attempting to get it
+ -ENODEV KVM_ARM_VCPU_PMU_V3 feature missing from VCPU
-EINVAL Invalid PMU overflow interrupt number supplied or
trying to set the IRQ number without using an in-kernel
irqchip.
@@ -45,9 +47,10 @@ all vcpus, while as an SPI it must be a separate number per vcpu.
Returns:
======= ======================================================
+ -EEXIST Interrupt number already used
-ENODEV PMUv3 not supported or GIC not initialized
- -ENXIO PMUv3 not properly configured or in-kernel irqchip not
- configured as required prior to calling this attribute
+ -ENXIO PMUv3 not supported, missing VCPU feature or interrupt
+ number not set
-EBUSY PMUv3 already initialized
======= ======================================================
@@ -55,11 +58,89 @@ Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.
+1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
+-----------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
+ pointer to a struct kvm_pmu_event_filter
+
+:Returns:
+
+ ======= ======================================================
+ -ENODEV PMUv3 not supported or GIC not initialized
+ -ENXIO PMUv3 not properly configured or in-kernel irqchip not
+ configured as required prior to calling this attribute
+ -EBUSY PMUv3 already initialized or a VCPU has already run
+ -EINVAL Invalid filter range
+ ======= ======================================================
+
+Request the installation of a PMU event filter described as follows::
+
+ struct kvm_pmu_event_filter {
+ __u16 base_event;
+ __u16 nevents;
+
+ #define KVM_PMU_EVENT_ALLOW 0
+ #define KVM_PMU_EVENT_DENY 1
+
+ __u8 action;
+ __u8 pad[3];
+ };
+
+A filter range is defined as the range [@base_event, @base_event + @nevents),
+together with an @action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
+first registered range defines the global policy (global ALLOW if the first
+@action is DENY, global DENY if the first @action is ALLOW). Multiple ranges
+can be programmed, and must fit within the event space defined by the PMU
+architecture (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards).
+
+Note: "Cancelling" a filter by registering the opposite action for the same
+range doesn't change the default action. For example, installing an ALLOW
+filter for event range [0:10) as the first filter and then applying a DENY
+action for the same range will leave the whole range as disabled.
+
+Restrictions: Event 0 (SW_INCR) is never filtered, as it doesn't count a
+hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
+isn't strictly speaking an event. Filtering the cycle counter is possible
+using event 0x11 (CPU_CYCLES).
+
+1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
+------------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
+ identifier.
+
+:Returns:
+
+ ======= ====================================================
+ -EBUSY PMUv3 already initialized, a VCPU has already run or
+ an event filter has already been set
+ -EFAULT Error accessing the PMU identifier
+ -ENXIO PMU not found
+ -ENODEV PMUv3 not supported or GIC not initialized
+ -ENOMEM Could not allocate memory
+ ======= ====================================================
+
+Request that the VCPU uses the specified hardware PMU when creating guest events
+for the purpose of PMU emulation. The PMU identifier can be read from the "type"
+file for the desired PMU instance under /sys/devices (or, equivalent,
+/sys/bus/even_source). This attribute is particularly useful on heterogeneous
+systems where there are at least two CPU PMUs on the system. The PMU that is set
+for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU
+if a PMU event filter is already present.
+
+Note that KVM will not make any attempts to run the VCPU on the physical CPUs
+associated with the PMU specified by this attribute. This is entirely left to
+userspace. However, attempting to run the VCPU on a physical CPU not supported
+by the PMU will fail and KVM_RUN will return with
+exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
+hardare_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and
+the cpu field to the processor id.
2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================
-:Architectures: ARM, ARM64
+:Architectures: ARM64
2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER
-----------------------------------------------------------------------------
@@ -90,6 +171,8 @@ configured values on other VCPUs. Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.
+.. _kvm_arm_vcpu_pvtime_ctrl:
+
3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================
@@ -110,5 +193,75 @@ Returns:
Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
-region. See Documentation/virt/kvm/arm/pvtime.txt for more information
+region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.
+
+4. GROUP: KVM_VCPU_TSC_CTRL
+===========================
+
+:Architectures: x86
+
+4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
+
+:Parameters: 64-bit unsigned TSC offset
+
+Returns:
+
+ ======= ======================================
+ -EFAULT Error reading/writing the provided
+ parameter address.
+ -ENXIO Attribute not supported
+ ======= ======================================
+
+Specifies the guest's TSC offset relative to the host's TSC. The guest's
+TSC is then derived by the following equation:
+
+ guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
+
+This attribute is useful to adjust the guest's TSC on live migration,
+so that the TSC counts the time during which the VM was paused. The
+following describes a possible algorithm to use for this purpose.
+
+From the source VMM process:
+
+1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
+ kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
+ (host_src).
+
+2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
+ guest TSC offset (ofs_src[i]).
+
+3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
+ guest's TSC (freq).
+
+From the destination VMM process:
+
+4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
+ kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
+ fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+ structure.
+
+ KVM will advance the VM's kvmclock to account for elapsed time since
+ recording the clock values. Note that this will cause problems in
+ the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
+ between the source and destination, and a reasonably short time passes
+ between the source pausing the VMs and the destination executing
+ steps 4-7.
+
+5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
+ kvmclock nanoseconds (guest_dest).
+
+6. Adjust the guest TSC offsets for every vCPU to account for (1) time
+ elapsed since recording state and (2) difference in TSCs between the
+ source and destination machine:
+
+ ofs_dst[i] = ofs_src[i] -
+ (guest_src - guest_dest) * freq +
+ (tsc_src - tsc_dest)
+
+ ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
+ a time of 0 in kvmclock. The above formula ensures that it is the
+ same on the destination as it was on the source).
+
+7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
+ respective value derived in the previous step.
diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 2d20dc561069..c549143bb891 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,22 +9,34 @@ Device types supported:
- KVM_DEV_TYPE_VFIO
Only one VFIO instance may be created per VM. The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM. As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence. When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM. As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence. When registered with
+KVM, a reference to the VFIO file is held by KVM.
Groups:
- KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
- KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
- kvm_device_attr.addr points to an int32_t file descriptor
- for the VFIO group.
- KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
- kvm_device_attr.addr points to an int32_t file descriptor
- for the VFIO group.
+ KVM_DEV_VFIO_FILE
+ alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+ KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+ tracking
+
+ kvm_device_attr.addr points to an int32_t file descriptor for the
+ VFIO file.
+
+ KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+ device tracking
+
+ kvm_device_attr.addr points to an int32_t file descriptor for the
+ VFIO file.
+
+KVM_DEV_VFIO_GROUP (legacy kvm device group restricted to the handling of VFIO group fd):
+ KVM_DEV_VFIO_GROUP_ADD: same as KVM_DEV_VFIO_FILE_ADD for group fd only
+
+ KVM_DEV_VFIO_GROUP_DEL: same as KVM_DEV_VFIO_FILE_DEL for group fd only
+
KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
allocated by sPAPR KVM.
kvm_device_attr.addr points to a struct::
@@ -39,3 +51,11 @@ KVM_DEV_VFIO_GROUP attributes:
- @groupfd is a file descriptor for a VFIO group;
- @tablefd is a file descriptor for a TCE table allocated via
KVM_CREATE_SPAPR_TCE.
+
+The FILE/GROUP_ADD operation above should be invoked prior to accessing the
+device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
+drivers which require a kvm pointer to be set in their .open_device()
+callback. It is the same for device file descriptor via character device
+open which gets device access via VFIO_DEVICE_BIND_IOMMUFD. For such file
+descriptors, FILE_ADD should be invoked before VFIO_DEVICE_BIND_IOMMUFD
+to support the drivers mentioned in prior sentence as well.
diff --git a/Documentation/virt/kvm/devices/vm.rst b/Documentation/virt/kvm/devices/vm.rst
index 0aa5b1cfd700..a4d39fa1b083 100644
--- a/Documentation/virt/kvm/devices/vm.rst
+++ b/Documentation/virt/kvm/devices/vm.rst
@@ -92,7 +92,7 @@ Allows user space to retrieve or request to change cpu related information for a
KVM does not enforce or limit the cpu model data in any form. Take the information
retrieved by means of KVM_S390_VM_CPU_MACHINE as hint for reasonable configuration
setups. Instruction interceptions triggered by additionally set facility bits that
-are not handled by KVM need to by imlemented in the VM driver code.
+are not handled by KVM need to by implemented in the VM driver code.
:Parameters: address of buffer to store/set the processor related cpu
data of type struct kvm_s390_vm_cpu_processor*.
@@ -215,6 +215,7 @@ KVM_S390_VM_TOD_EXT).
:Parameters: address of a buffer in user space to store the data (u8) to
:Returns: -EFAULT if the given address is not accessible from kernel space;
-EINVAL if setting the TOD clock extension to != 0 is not supported
+ -EOPNOTSUPP for a PV guest (TOD managed by the ultravisor)
3.2. ATTRIBUTE: KVM_S390_VM_TOD_LOW
-----------------------------------
@@ -224,6 +225,7 @@ the POP (u64).
:Parameters: address of a buffer in user space to store the data (u64) to
:Returns: -EFAULT if the given address is not accessible from kernel space
+ -EOPNOTSUPP for a PV guest (TOD managed by the ultravisor)
3.3. ATTRIBUTE: KVM_S390_VM_TOD_EXT
-----------------------------------
@@ -237,6 +239,7 @@ it, it is stored as 0 and not allowed to be set to a value != 0.
(kvm_s390_vm_tod_clock) to
:Returns: -EFAULT if the given address is not accessible from kernel space;
-EINVAL if setting the TOD clock extension to != 0 is not supported
+ -EOPNOTSUPP for a PV guest (TOD managed by the ultravisor)
4. GROUP: KVM_S390_VM_CRYPTO
============================
@@ -299,6 +302,10 @@ Allows userspace to start migration mode, needed for PGSTE migration.
Setting this attribute when migration mode is already active will have
no effects.
+Dirty tracking must be enabled on all memslots, else -EINVAL is returned. When
+dirty tracking is disabled on any memslot, migration mode is automatically
+stopped.
+
:Parameters: none
:Returns: -ENOMEM if there is not enough free memory to start migration mode;
-EINVAL if the state of the VM is invalid (e.g. no memory defined);
@@ -314,3 +321,82 @@ Allows userspace to query the status of migration mode.
if it is enabled
:Returns: -EFAULT if the given address is not accessible from kernel space;
0 in case of success.
+
+6. GROUP: KVM_ARM_VM_SMCCC_CTRL
+===============================
+
+:Architectures: arm64
+
+6.1. ATTRIBUTE: KVM_ARM_VM_SMCCC_FILTER (w/o)
+---------------------------------------------
+
+:Parameters: Pointer to a ``struct kvm_smccc_filter``
+
+:Returns:
+
+ ====== ===========================================
+ EEXIST Range intersects with a previously inserted
+ or reserved range
+ EBUSY A vCPU in the VM has already run
+ EINVAL Invalid filter configuration
+ ENOMEM Failed to allocate memory for the in-kernel
+ representation of the SMCCC filter
+ ====== ===========================================
+
+Requests the installation of an SMCCC call filter described as follows::
+
+ enum kvm_smccc_filter_action {
+ KVM_SMCCC_FILTER_HANDLE = 0,
+ KVM_SMCCC_FILTER_DENY,
+ KVM_SMCCC_FILTER_FWD_TO_USER,
+ };
+
+ struct kvm_smccc_filter {
+ __u32 base;
+ __u32 nr_functions;
+ __u8 action;
+ __u8 pad[15];
+ };
+
+The filter is defined as a set of non-overlapping ranges. Each
+range defines an action to be applied to SMCCC calls within the range.
+Userspace can insert multiple ranges into the filter by using
+successive calls to this attribute.
+
+The default configuration of KVM is such that all implemented SMCCC
+calls are allowed. Thus, the SMCCC filter can be defined sparsely
+by userspace, only describing ranges that modify the default behavior.
+
+The range expressed by ``struct kvm_smccc_filter`` is
+[``base``, ``base + nr_functions``). The range is not allowed to wrap,
+i.e. userspace cannot rely on ``base + nr_functions`` overflowing.
+
+The SMCCC filter applies to both SMC and HVC calls initiated by the
+guest. The SMCCC filter gates the in-kernel emulation of SMCCC calls
+and as such takes effect before other interfaces that interact with
+SMCCC calls (e.g. hypercall bitmap registers).
+
+Actions:
+
+ - ``KVM_SMCCC_FILTER_HANDLE``: Allows the guest SMCCC call to be
+ handled in-kernel. It is strongly recommended that userspace *not*
+ explicitly describe the allowed SMCCC call ranges.
+
+ - ``KVM_SMCCC_FILTER_DENY``: Rejects the guest SMCCC call in-kernel
+ and returns to the guest.
+
+ - ``KVM_SMCCC_FILTER_FWD_TO_USER``: The guest SMCCC call is forwarded
+ to userspace with an exit reason of ``KVM_EXIT_HYPERCALL``.
+
+The ``pad`` field is reserved for future use and must be zero. KVM may
+return ``-EINVAL`` if the field is nonzero.
+
+KVM reserves the 'Arm Architecture Calls' range of function IDs and
+will reject attempts to define a filter for any portion of these ranges:
+
+ =========== ===============
+ Start End (inclusive)
+ =========== ===============
+ 0x8000_0000 0x8000_FFFF
+ 0xC000_0000 0xC000_FFFF
+ =========== ===============
diff --git a/Documentation/virt/kvm/devices/xics.rst b/Documentation/virt/kvm/devices/xics.rst
index 2d6927e0b776..bf32c77174ab 100644
--- a/Documentation/virt/kvm/devices/xics.rst
+++ b/Documentation/virt/kvm/devices/xics.rst
@@ -22,7 +22,7 @@ Groups:
Errors:
======= ==========================================
- -EINVAL Value greater than KVM_MAX_VCPU_ID.
+ -EINVAL Value greater than KVM_MAX_VCPU_IDS.
-EFAULT Invalid user pointer for attr->addr.
-EBUSY A vcpu is already connected to the device.
======= ==========================================
diff --git a/Documentation/virt/kvm/devices/xive.rst b/Documentation/virt/kvm/devices/xive.rst
index 8bdf3dc38f01..a07e16d34006 100644
--- a/Documentation/virt/kvm/devices/xive.rst
+++ b/Documentation/virt/kvm/devices/xive.rst
@@ -50,7 +50,7 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
When a device is passed-through into the guest, the source
interrupts are from a different HW controller (PHB4) and the ESB
- pages exposed to the guest should accommadate this change.
+ pages exposed to the guest should accommodate this change.
The passthru_irq helpers, kvmppc_xive_set_mapped() and
kvmppc_xive_clr_mapped() are called when the device HW irqs are
@@ -91,7 +91,7 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
Errors:
======= ==========================================
- -EINVAL Value greater than KVM_MAX_VCPU_ID.
+ -EINVAL Value greater than KVM_MAX_VCPU_IDS.
-EFAULT Invalid user pointer for attr->addr.
-EBUSY A vCPU is already connected to the device.
======= ==========================================