Diffstat (limited to 'Documentation')
11 files changed, 707 insertions, 32 deletions
diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt
new file mode 100644
index 000000000000..0d6a7d87d49e
--- /dev/null
+++ b/Documentation/arm64/perf.txt
@@ -0,0 +1,85 @@
+Perf Event Attributes
+=====================
+
+Author: Andrew Murray <andrew.murray@arm.com>
+Date: 2019-03-06
+
+exclude_user
+------------
+
+This attribute excludes userspace.
+
+Userspace always runs at EL0 and thus this attribute will exclude EL0.
+
+
+exclude_kernel
+--------------
+
+This attribute excludes the kernel.
+
+The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
+at EL1.
+
+For the host this attribute will exclude EL1 and additionally EL2 on a VHE
+system.
+
+For the guest this attribute will exclude EL1. Please note that EL2 is
+never counted within a guest.
+
+
+exclude_hv
+----------
+
+This attribute excludes the hypervisor.
+
+For a VHE host this attribute is ignored as we consider the host kernel to
+be the hypervisor.
+
+For a non-VHE host this attribute will exclude EL2 as we consider the
+hypervisor to be any code that runs at EL2 which is predominantly used for
+guest/host transitions.
+
+For the guest this attribute has no effect. Please note that EL2 is
+never counted within a guest.
+
+
+exclude_host / exclude_guest
+----------------------------
+
+These attributes exclude the KVM host and guest, respectively.
+
+The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
+kernel or non-VHE hypervisor).
+
+The KVM guest may run at EL0 (userspace) and EL1 (kernel).
+
+Due to the overlapping exception levels between host and guests we cannot
+exclusively rely on the PMU's hardware exception filtering - therefore we
+must enable/disable counting on the entry and exit to the guest. This is
+performed differently on VHE and non-VHE systems.
+
+For non-VHE systems we exclude EL2 for exclude_host - upon entering and
+exiting the guest we disable/enable the event as appropriate based on the
+exclude_host and exclude_guest attributes.
+
+For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2
+for exclude_host. Upon entering and exiting the guest we modify the event
+to include/exclude EL0 as appropriate based on the exclude_host and
+exclude_guest attributes.
+
+The statements above also apply when these attributes are used within a
+non-VHE guest; however, please note that EL2 is never counted within a guest.
+
+
+Accuracy
+--------
+
+On non-VHE hosts we enable/disable counters on the entry/exit of host/guest
+transition at EL2 - however there is a period of time between
+enabling/disabling the counters and entering/exiting the guest. We are
+able to eliminate counters counting host events on the boundaries of guest
+entry/exit when counting guest events by filtering out EL2 for
+exclude_host. However when using !exclude_hv there is a small blackout
+window at the guest entry/exit where host events are not captured.
+
+On VHE systems there are no blackout windows.
diff --git a/Documentation/arm64/pointer-authentication.txt b/Documentation/arm64/pointer-authentication.txt
index 5baca42ba146..fc71b33de87e 100644
--- a/Documentation/arm64/pointer-authentication.txt
+++ b/Documentation/arm64/pointer-authentication.txt
@@ -87,7 +87,21 @@ used to get and set the keys for a thread.
 Virtualization
 --------------
 
-Pointer authentication is not currently supported in KVM guests. KVM
-will mask the feature bits from ID_AA64ISAR1_EL1, and attempted use of
-the feature will result in an UNDEFINED exception being injected into
-the guest.
+Pointer authentication is enabled in KVM guests when each virtual cpu is
+initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and
+requesting these two separate cpu features to be enabled. The current KVM
+guest implementation works by enabling both features together, so both
+these userspace flags are checked before enabling pointer authentication.
+The separate userspace flags will avoid any userspace ABI changes
+if support is added in the future to allow these two features to be
+enabled independently of one another.
+
+As the Arm Architecture specifies that the Pointer Authentication feature
+is implemented along with the VHE feature, KVM arm64 ptrauth code relies
+on VHE mode being present.
+
+Additionally, when these vcpu feature flags are not set then KVM will
+filter out the Pointer Authentication system key registers from
+KVM_GET/SET_REG_* ioctls and mask those features from the cpufeature ID
+register. Any attempt to use the Pointer Authentication instructions will
+result in an UNDEFINED exception being injected into the guest.
diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt
index 4bf1b4da7659..99dee23c74a4 100644
--- a/Documentation/devicetree/bindings/arm/atmel-at91.txt
+++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt
@@ -25,6 +25,7 @@ compatible: must be one of:
        o "atmel,at91sam9n12"
        o "atmel,at91sam9rl"
        o "atmel,at91sam9xe"
+       o "microchip,sam9x60"
  * "atmel,sama5" for SoCs using a Cortex-A5, shall be extended with the specific
    SoC family:
     o "atmel,sama5d2" shall be extended with the specific SoC compatible:
diff --git a/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt b/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
index b56a02c10ae6..6f0cd31c1520 100644
--- a/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
+++ b/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
@@ -24,7 +24,8 @@ relationship between the TI-SCI parent node to the child node.
 Required properties:
 -------------------
-- compatible: should be "ti,k2g-sci"
+- compatible: should be "ti,k2g-sci" for TI 66AK2G SoC
+              should be "ti,am654-sci" for TI AM654 SoC
 - mbox-names:
        "rx" - Mailbox corresponding to receive path
        "tx" - Mailbox corresponding to transmit path
diff --git a/Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt b/Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt
new file mode 100644
index 000000000000..7841cb099e13
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt
@@ -0,0 +1,66 @@
+Texas Instruments K3 Interrupt Aggregator
+=========================================
+
+The Interrupt Aggregator (INTA) provides a centralized machine
+which handles the termination of system events so that they can
+be coherently processed by the host(s) in the system. A maximum
+of 64 events can be mapped to a single interrupt.
+
+
+                               Interrupt Aggregator
+                    +-----------------------------------------+
+                    |      Intmap            VINT             |
+                    | +--------------+  +------------+        |
+           m ------>| | vint  | bit  |  | 0 |.....|63| vint0  |
+               .    | +--------------+  +------------+        |       +------+
+               .    |         .               .               |       | HOST |
+Globalevents ------>|         .               .               |------>| IRQ  |
+               .    |         .               .               |       | CTRL |
+               .    |         .               .               |       +------+
+           n ------>| +--------------+  +------------+        |
+                    | | vint  | bit  |  | 0 |.....|63| vintx  |
+                    | +--------------+  +------------+        |
+                    |                                         |
+                    +-----------------------------------------+
+
+Configuration of these Intmap registers that map global events to vint is done
+by a system controller (like the Device Memory and Security Controller on K3
+AM654 SoC). The driver should request the system controller to get the range
+of global events and vints assigned to the requesting host. The driver should
+manage these requested resources and request the system controller to map a
+specific global event to a vint, bit pair.
+
+Communication between the host processor running an OS and the system
+controller happens through a protocol called TI System Control Interface
+(TISCI protocol). For more details refer:
+Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
+
+TISCI Interrupt Aggregator Node:
+-------------------------------
+- compatible:           Must be "ti,sci-inta".
+- reg:                  Should contain registers location and length.
+- interrupt-controller: Identifies the node as an interrupt controller
+- msi-controller:       Identifies the node as an MSI controller.
+- interrupt-parent:     phandle of irq parent.
+- ti,sci:               Phandle to TI-SCI compatible System controller node.
+- ti,sci-dev-id:        TISCI device ID of the Interrupt Aggregator.
+- ti,sci-rm-range-vint: Array of TISCI subtype ids representing vints (INTA
+                        outputs) range within this INTA, assigned to the
+                        requesting host context.
+- ti,sci-rm-range-global-event: Array of TISCI subtype ids representing the
+                        global events range reaching this INTA that are
+                        assigned to the requesting host context.
+
+Example:
+--------
+main_udmass_inta: interrupt-controller@33d00000 {
+        compatible = "ti,sci-inta";
+        reg = <0x0 0x33d00000 0x0 0x100000>;
+        interrupt-controller;
+        msi-controller;
+        interrupt-parent = <&main_navss_intr>;
+        ti,sci = <&dmsc>;
+        ti,sci-dev-id = <179>;
+        ti,sci-rm-range-vint = <0x0>;
+        ti,sci-rm-range-global-event = <0x1>;
+};
diff --git a/Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt b/Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt
new file mode 100644
index 000000000000..1a8718f8855d
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt
@@ -0,0 +1,82 @@
+Texas Instruments K3 Interrupt Router
+=====================================
+
+The Interrupt Router (INTR) module provides a mechanism to mux M
+interrupt inputs to N interrupt outputs, where all M inputs are selectable
+to be driven per N output. An Interrupt Router handles either edge triggered
+or level triggered interrupts; which one is fixed in hardware.
+
+                         Interrupt Router
+                     +----------------------+
+                     |  Inputs     Outputs  |
++-------+            | +------+    +-----+  |
+| GPIO  |----------->| | irq0 |    |  0  |  |      Host IRQ
++-------+            | +------+    +-----+  |      controller
+                     |    .           .     |      +-------+
++-------+            |    .           .     |----->|  IRQ  |
+| INTA  |----------->|    .           .     |      +-------+
++-------+            |    .        +-----+  |
+                     | +------+    |  N  |  |
+                     | | irqM |    +-----+  |
+                     | +------+             |
+                     |                      |
+                     +----------------------+
+
+There is one register per output (MUXCNTL_N) that controls the selection.
+Configuration of these MUXCNTL_N registers is done by a system controller
+(like the Device Memory and Security Controller on K3 AM654 SoC). The system
+controller will keep track of the used and unused registers within the Router.
+The driver should request the system controller to get the range of GIC IRQs
+assigned to the requesting hosts. It is the driver's responsibility to keep
+track of Host IRQs.
+
+Communication between the host processor running an OS and the system
+controller happens through a protocol called TI System Control Interface
+(TISCI protocol). For more details refer:
+Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
+
+TISCI Interrupt Router Node:
+----------------------------
+Required Properties:
+- compatible:           Must be "ti,sci-intr".
+- ti,intr-trigger-type: Should be one of the following:
+                        1: If intr supports edge triggered interrupts.
+                        4: If intr supports level triggered interrupts.
+- interrupt-controller: Identifies the node as an interrupt controller
+- #interrupt-cells:     Specifies the number of cells needed to encode an
+                        interrupt source. The value should be 2.
+                        First cell should contain the TISCI device ID of source
+                        Second cell should contain the interrupt source offset
+                        within the device.
+- ti,sci:               Phandle to TI-SCI compatible System controller node.
+- ti,sci-dst-id:        TISCI device ID of the destination IRQ controller.
+- ti,sci-rm-range-girq: Array of TISCI subtype ids representing the host irqs
+                        assigned to this interrupt router. Each subtype id
+                        corresponds to a range of host irqs.
+
+For more details on TISCI IRQ resource management refer:
+http://downloads.ti.com/tisci/esd/latest/2_tisci_msgs/rm/rm_irq.html
+
+Example:
+--------
+The following example demonstrates both the interrupt router node and the
+consumer node (main gpio) on the AM654 SoC:
+
+main_intr: interrupt-controller0 {
+        compatible = "ti,sci-intr";
+        ti,intr-trigger-type = <1>;
+        interrupt-controller;
+        interrupt-parent = <&gic500>;
+        #interrupt-cells = <2>;
+        ti,sci = <&dmsc>;
+        ti,sci-dst-id = <56>;
+        ti,sci-rm-range-girq = <0x1>;
+};
+
+main_gpio0: gpio@600000 {
+        ...
+        interrupt-parent = <&main_intr>;
+        interrupts = <57 256>, <57 257>, <57 258>,
+                     <57 259>, <57 260>, <57 261>;
+        ...
+};
diff --git a/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
new file mode 100644
index 000000000000..73d8f19c3bd9
--- /dev/null
+++ b/Documentation/devicetree/bindings/riscv/sifive-l2-cache.txt
@@ -0,0 +1,51 @@
+SiFive L2 Cache Controller
+--------------------------
+The SiFive Level 2 Cache Controller is used to provide access to fast copies
+of memory for masters in a Core Complex. The Level 2 Cache Controller also
+acts as a directory-based coherency manager.
+All the properties in the ePAPR/DeviceTree specification apply to this platform.
+
+Required Properties:
+--------------------
+- compatible: Should be "sifive,fu540-c000-ccache" and "cache"
+
+- cache-block-size: Specifies the block size in bytes of the cache.
+  Should be 64
+
+- cache-level: Should be set to 2 for a level 2 cache
+
+- cache-sets: Specifies the number of associativity sets of the cache.
+  Should be 1024
+
+- cache-size: Specifies the size in bytes of the cache.
+  Should be 2097152
+
+- cache-unified: Specifies that the cache is a unified cache
+
+- interrupts: Must contain 3 entries (DirError, DataError and DataFail signals)
+
+- reg: Physical base address and size of L2 cache controller registers map
+
+Optional Properties:
+--------------------
+- next-level-cache: phandle to the next level cache if present.
+
+- memory-region: reference to the reserved-memory for the L2 Loosely Integrated
+  Memory region. The reserved memory node should be defined as per the bindings
+  in reserved-memory.txt
+
+
+Example:
+
+        cache-controller@2010000 {
+                compatible = "sifive,fu540-c000-ccache", "cache";
+                cache-block-size = <64>;
+                cache-level = <2>;
+                cache-sets = <1024>;
+                cache-size = <2097152>;
+                cache-unified;
+                interrupt-parent = <&plic0>;
+                interrupts = <1 2 3>;
+                reg = <0x0 0x2010000 0x0 0x1000>;
+                next-level-cache = <&L25 &L40 &L36>;
+                memory-region = <&l2_lim>;
+        };
diff --git a/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt b/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
index 5c2e23574ca0..3da9d515c03a 100644
--- a/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
+++ b/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
@@ -2,7 +2,9 @@ Allwinner A1X SoCs Timer Controller
 
 Required properties:
 
-- compatible : should be "allwinner,sun4i-a10-timer"
+- compatible : should be one of the following:
+              "allwinner,sun4i-a10-timer"
+              "allwinner,suniv-f1c100s-timer"
 - reg : Specifies base physical address and size of the registers.
 - interrupts : The interrupt of the first timer
 - clocks: phandle to the source clock (usually a 24 MHz fixed clock)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 64b38dfcc243..ba6c42c576dd 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -69,23 +69,6 @@ by and on behalf of the VM's process may not be freed/unaccounted when
 the VM is shut down.
-It is important to note that althought VM ioctls may only be issued from
-the process that created the VM, a VM's lifecycle is associated with its
-file descriptor, not its creator (process). In other words, the VM and
-its resources, *including the associated address space*, are not freed
-until the last reference to the VM's file descriptor has been released.
-For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
-not be freed until both the parent (original) process and its child have
-put their references to the VM's file descriptor.
-
-Because a VM's resources are not freed until the last reference to its
-file descriptor is released, creating additional references to a VM via
-via fork(), dup(), etc... without careful consideration is strongly
-discouraged and may have unwanted side effects, e.g. memory allocated
-by and on behalf of the VM's process may not be freed/unaccounted when
-the VM is shut down.
-
-
 3. Extensions
 -------------
 
@@ -347,7 +330,7 @@ They must be less than the value that KVM_CHECK_EXTENSION returns for the
 KVM_CAP_MULTI_ADDRESS_SPACE capability.
 
 The bits in the dirty bitmap are cleared before the ioctl returns, unless
-KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information,
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information,
 see the description of the capability.
 
 4.9 KVM_SET_MEMORY_ALIAS
@@ -1117,9 +1100,8 @@ struct kvm_userspace_memory_region {
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot. Bits 0-15 of "slot" specify the slot id and this value
 should be less than the maximum number of user memory slots supported per
-VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
-if this capability is supported by the architecture. Slots may not
-overlap in guest physical address space.
+VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS.
+Slots may not overlap in guest physical address space.
 If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
 specifies the address space which is being modified. They must be
@@ -1901,6 +1883,12 @@ Architectures: all
 Type: vcpu ioctl
 Parameters: struct kvm_one_reg (in)
 Returns: 0 on success, negative value on failure
+Errors:
+  ENOENT:   no such register
+  EINVAL:   invalid register ID, or no such register
+  EPERM:    (arm64) register access not allowed before vcpu finalization
+(These error codes are indicative only: do not rely on a specific error
+code being returned in a specific situation.)
 
 struct kvm_one_reg {
        __u64 id;
@@ -1985,6 +1973,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TLB3PS       | 32
   PPC   | KVM_REG_PPC_EPTCFG       | 32
   PPC   | KVM_REG_PPC_ICP_STATE    | 64
+  PPC   | KVM_REG_PPC_VP_STATE     | 128
   PPC   | KVM_REG_PPC_TB_OFFSET    | 64
   PPC   | KVM_REG_PPC_SPMC1        | 32
   PPC   | KVM_REG_PPC_SPMC2        | 32
@@ -2137,6 +2126,37 @@ contains elements ranging from 32 to 128 bits. The index is a 32bit
 value in the kvm_regs structure seen as a 32bit array.
   0x60x0 0000 0010 <index into the kvm_regs struct:16>
 
+Specifically:
+    Encoding            Register  Bits  kvm_regs member
+----------------------------------------------------------------
+  0x6030 0000 0010 0000 X0          64  regs.regs[0]
+  0x6030 0000 0010 0002 X1          64  regs.regs[1]
+    ...
+  0x6030 0000 0010 003c X30         64  regs.regs[30]
+  0x6030 0000 0010 003e SP          64  regs.sp
+  0x6030 0000 0010 0040 PC          64  regs.pc
+  0x6030 0000 0010 0042 PSTATE      64  regs.pstate
+  0x6030 0000 0010 0044 SP_EL1      64  sp_el1
+  0x6030 0000 0010 0046 ELR_EL1     64  elr_el1
+  0x6030 0000 0010 0048 SPSR_EL1    64  spsr[KVM_SPSR_EL1] (alias SPSR_SVC)
+  0x6030 0000 0010 004a SPSR_ABT    64  spsr[KVM_SPSR_ABT]
+  0x6030 0000 0010 004c SPSR_UND    64  spsr[KVM_SPSR_UND]
+  0x6030 0000 0010 004e SPSR_IRQ    64  spsr[KVM_SPSR_IRQ]
+  0x6030 0000 0010 0050 SPSR_FIQ    64  spsr[KVM_SPSR_FIQ]
+  0x6040 0000 0010 0054 V0         128  fp_regs.vregs[0]    (*)
+  0x6040 0000 0010 0058 V1         128  fp_regs.vregs[1]    (*)
+    ...
+  0x6040 0000 0010 00d0 V31        128  fp_regs.vregs[31]   (*)
+  0x6020 0000 0010 00d4 FPSR        32  fp_regs.fpsr
+  0x6020 0000 0010 00d5 FPCR        32  fp_regs.fpcr
+
+(*) These encodings are not accepted for SVE-enabled vcpus. See
+    KVM_ARM_VCPU_INIT.
+
+    The equivalent register content can be accessed via bits [127:0] of
+    the corresponding SVE Zn registers instead for vcpus that have SVE
+    enabled (see below).
+
 arm64 CCSIDR registers are demultiplexed by CSSELR value:
   0x6020 0000 0011 00 <csselr:8>
 
@@ -2146,6 +2166,64 @@ arm64 system registers have the following id bit patterns:
 arm64 firmware pseudo-registers have the following bit pattern:
   0x6030 0000 0014 <regno:16>
 
+arm64 SVE registers have the following bit patterns:
+  0x6080 0000 0015 00 <n:5> <slice:5>   Zn bits[2048*slice + 2047 : 2048*slice]
+  0x6050 0000 0015 04 <n:4> <slice:5>   Pn bits[256*slice + 255 : 256*slice]
+  0x6050 0000 0015 060 <slice:5>        FFR bits[256*slice + 255 : 256*slice]
+  0x6060 0000 0015 ffff                 KVM_REG_ARM64_SVE_VLS pseudo-register
+
+Access to register IDs where 2048 * slice >= 128 * max_vq will fail with
+ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit
+quadwords: see (**) below.
+
+These registers are only accessible on vcpus for which SVE is enabled.
+See KVM_ARM_VCPU_INIT for details.
+
+In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not
+accessible until the vcpu's SVE configuration has been finalized
+using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT
+and KVM_ARM_VCPU_FINALIZE for more information about this procedure.
+
+KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
+lengths supported by the vcpu to be discovered and configured by
+userspace. When transferred to or from user memory via KVM_GET_ONE_REG
+or KVM_SET_ONE_REG, the value of this register is of type
+__u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as
+follows:
+
+__u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS];
+
+if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
+    ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
+                ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
+        /* Vector length vq * 16 bytes supported */
+else
+        /* Vector length vq * 16 bytes not supported */
+
+(**) The maximum value vq for which the above condition is true is
+max_vq. This is the maximum vector length available to the guest on
+this vcpu, and determines which register slices are visible through
+this ioctl interface.
+
+(See Documentation/arm64/sve.txt for an explanation of the "vq"
+nomenclature.)
+
+KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT.
+KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that
+the host supports.
+
+Userspace may subsequently modify it if desired until the vcpu's SVE
+configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
+
+Apart from simply removing all vector lengths from the host set that
+exceed some value, support for arbitrarily chosen sets of vector lengths
+is hardware-dependent and may not be available. Attempting to configure
+an invalid set of vector lengths via KVM_SET_ONE_REG will fail with
+EINVAL.
+
+After the vcpu's SVE configuration is finalized, further attempts to
+write this register will fail with EPERM.
+
 MIPS registers are mapped using the lower 32 bits. The upper 16 of that is the register group type:
 
@@ -2198,6 +2276,12 @@ Architectures: all
 Type: vcpu ioctl
 Parameters: struct kvm_one_reg (in and out)
 Returns: 0 on success, negative value on failure
+Errors include:
+  ENOENT:   no such register
+  EINVAL:   invalid register ID, or no such register
+  EPERM:    (arm64) register access not allowed before vcpu finalization
+(These error codes are indicative only: do not rely on a specific error
+code being returned in a specific situation.)
 
 This ioctl allows to receive the value of a single register implemented
 in a vcpu. The register to read is indicated by the "id" field of the
@@ -2690,6 +2774,49 @@ Possible features:
        - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
          Depends on KVM_CAP_ARM_PMU_V3.
 
+       - KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication
+         for arm64 only.
+         Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS.
+         If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are
+         both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and
+         KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be
+         requested.
+
+       - KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication
+         for arm64 only.
+         Depends on KVM_CAP_ARM_PTRAUTH_GENERIC.
+         If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are
+         both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and
+         KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be
+         requested.
+
+       - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only).
+         Depends on KVM_CAP_ARM_SVE.
+         Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
+
+          * After KVM_ARM_VCPU_INIT:
+
+             - KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the
+               initial value of this pseudo-register indicates the best set of
+               vector lengths possible for a vcpu on this host.
+
+          * Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
+
+             - KVM_RUN and KVM_GET_REG_LIST are not available;
+
+             - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
+               the scalable architectural SVE registers
+               KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
+               KVM_REG_ARM64_SVE_FFR;
+
+             - KVM_REG_ARM64_SVE_VLS may optionally be written using
+               KVM_SET_ONE_REG, to modify the set of vector lengths available
+               for the vcpu.
+
+          * After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
+
+             - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
+               no longer be written using KVM_SET_ONE_REG.
 
 4.83 KVM_ARM_PREFERRED_TARGET
 
@@ -3809,7 +3936,7 @@ to I/O ports.
 
 4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
 
-Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
+Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
 Architectures: x86, arm, arm64, mips
 Type: vm ioctl
 Parameters: struct kvm_dirty_log (in)
@@ -3842,10 +3969,10 @@ the address space for which you want to return the dirty bitmap.
 They must be less than the value that KVM_CHECK_EXTENSION returns for the
 KVM_CAP_MULTI_ADDRESS_SPACE capability.
 
-This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
+This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
 is enabled; for more information, see the description of the capability.
 However, it can always be used as long as KVM_CHECK_EXTENSION confirms
-that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
+that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present.
 
 4.118 KVM_GET_SUPPORTED_HV_CPUID
 
@@ -3904,6 +4031,40 @@ number of valid entries in the 'entries' array, which is then filled.
 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved,
 userspace should not expect to get any particular value there.
+4.119 KVM_ARM_VCPU_FINALIZE
+
+Architectures: arm, arm64
+Type: vcpu ioctl
+Parameters: int feature (in)
+Returns: 0 on success, -1 on error
+Errors:
+  EPERM:     feature not enabled, needs configuration, or already finalized
+  EINVAL:    feature unknown or not present
+
+Recognised values for feature:
+  arm64      KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
+
+Finalizes the configuration of the specified vcpu feature.
+
+The vcpu must already have been initialised, enabling the affected feature, by
+means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in
+features[].
+
+For affected vcpu features, this is a mandatory step that must be performed
+before the vcpu is fully usable.
+
+Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
+configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
+that should be performed and how to do it are feature-dependent.
+
+Other calls that depend on a particular feature being finalized, such as
+KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
+-EPERM unless the feature has already been finalized by means of a
+KVM_ARM_VCPU_FINALIZE call.
+
+See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization
+using this ioctl.
+
 5. The kvm_run structure
 ------------------------
 
@@ -4505,6 +4666,15 @@ struct kvm_sync_regs {
         struct kvm_vcpu_events events;
 };
 
+6.75 KVM_CAP_PPC_IRQ_XIVE
+
+Architectures: ppc
+Target: vcpu
+Parameters: args[0] is the XIVE device fd
+            args[1] is the XIVE CPU number (server ID) for this vcpu
+
+This capability connects the vcpu to an in-kernel XIVE device.
+
 7. Capabilities that can be enabled on VMs
 ------------------------------------------
 
@@ -4798,7 +4968,7 @@ and injected exceptions.
 * For the new DR6 bits, note that bit 16 is set iff the #DB exception
   will clear DR6.RTM.
-7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT +7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 Architectures: x86, arm, arm64, mips Parameters: args[0] whether feature should be enabled or not @@ -4821,6 +4991,11 @@ while userspace can see false reports of dirty pages. Manual reprotection helps reducing this time, improving guest performance and reducing the number of dirty log false positives. +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make +it hard or impossible to use it correctly. The availability of +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed. +Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT. 8. Other capabilities. ---------------------- diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt index 95ca68d663a4..4ffb82b02468 100644 --- a/Documentation/virtual/kvm/devices/vm.txt +++ b/Documentation/virtual/kvm/devices/vm.txt @@ -141,7 +141,8 @@ struct kvm_s390_vm_cpu_subfunc { u8 pcc[16]; # valid with Message-Security-Assist-Extension 4 u8 ppno[16]; # valid with Message-Security-Assist-Extension 5 u8 kma[16]; # valid with Message-Security-Assist-Extension 8 - u8 reserved[1808]; # reserved for future instructions + u8 kdsa[16]; # valid with Message-Security-Assist-Extension 9 + u8 reserved[1792]; # reserved for future instructions }; Parameters: address of a buffer to load the subfunction blocks from. diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt new file mode 100644 index 000000000000..9a24a4525253 --- /dev/null +++ b/Documentation/virtual/kvm/devices/xive.txt @@ -0,0 +1,197 @@ +POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1) +========================================================== + +Device types supported: + KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1 + +This device acts as a VM interrupt controller. 
It provides the KVM +interface to configure the interrupt sources of a VM in the underlying +POWER9 XIVE interrupt controller. + +Only one XIVE instance may be instantiated. A guest XIVE device +requires a POWER9 host and the guest OS should have support for the +XIVE native exploitation interrupt mode. If not, it should run using +the legacy interrupt mode, referred as XICS (POWER7/8). + +* Device Mappings + + The KVM device exposes different MMIO ranges of the XIVE HW which + are required for interrupt management. These are exposed to the + guest in VMAs populated with a custom VM fault handler. + + 1. Thread Interrupt Management Area (TIMA) + + Each thread has an associated Thread Interrupt Management context + composed of a set of registers. These registers let the thread + handle priority management and interrupt acknowledgment. The most + important are : + + - Interrupt Pending Buffer (IPB) + - Current Processor Priority (CPPR) + - Notification Source Register (NSR) + + They are exposed to software in four different pages each proposing + a view with a different privilege. The first page is for the + physical thread context and the second for the hypervisor. Only the + third (operating system) and the fourth (user level) are exposed the + guest. + + 2. Event State Buffer (ESB) + + Each source is associated with an Event State Buffer (ESB) with + either a pair of even/odd pair of pages which provides commands to + manage the source: to trigger, to EOI, to turn off the source for + instance. + + 3. Device pass-through + + When a device is passed-through into the guest, the source + interrupts are from a different HW controller (PHB4) and the ESB + pages exposed to the guest should accommadate this change. + + The passthru_irq helpers, kvmppc_xive_set_mapped() and + kvmppc_xive_clr_mapped() are called when the device HW irqs are + mapped into or unmapped from the guest IRQ number space. 
The KVM + device extends these helpers to clear the ESB pages of the guest IRQ + number being mapped and then lets the VM fault handler repopulate. + The handler will insert the ESB page corresponding to the HW + interrupt of the device being passed-through or the initial IPI ESB + page if the device has being removed. + + The ESB remapping is fully transparent to the guest and the OS + device driver. All handling is done within VFIO and the above + helpers in KVM-PPC. + +* Groups: + + 1. KVM_DEV_XIVE_GRP_CTRL + Provides global controls on the device + Attributes: + 1.1 KVM_DEV_XIVE_RESET (write only) + Resets the interrupt controller configuration for sources and event + queues. To be used by kexec and kdump. + Errors: none + + 1.2 KVM_DEV_XIVE_EQ_SYNC (write only) + Sync all the sources and queues and mark the EQ pages dirty. This + to make sure that a consistent memory state is captured when + migrating the VM. + Errors: none + + 2. KVM_DEV_XIVE_GRP_SOURCE (write only) + Initializes a new source in the XIVE device and mask it. + Attributes: + Interrupt source number (64-bit) + The kvm_device_attr.addr points to a __u64 value: + bits: | 63 .... 2 | 1 | 0 + values: | unused | level | type + - type: 0:MSI 1:LSI + - level: assertion level in case of an LSI. + Errors: + -E2BIG: Interrupt source number is out of range + -ENOMEM: Could not create a new source block + -EFAULT: Invalid user pointer for attr->addr. + -ENXIO: Could not allocate underlying HW interrupt + + 3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only) + Configures source targeting + Attributes: + Interrupt source number (64-bit) + The kvm_device_attr.addr points to a __u64 value: + bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 
0 + values: | eisn | mask | server | priority + - priority: 0-7 interrupt priority level + - server: CPU number chosen to handle the interrupt + - mask: mask flag (unused) + - eisn: Effective Interrupt Source Number + Errors: + -ENOENT: Unknown source number + -EINVAL: Not initialized source number + -EINVAL: Invalid priority + -EINVAL: Invalid CPU number. + -EFAULT: Invalid user pointer for attr->addr. + -ENXIO: CPU event queues not configured or configuration of the + underlying HW interrupt failed + -EBUSY: No CPU available to serve interrupt + + 4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write) + Configures an event queue of a CPU + Attributes: + EQ descriptor identifier (64-bit) + The EQ descriptor identifier is a tuple (server, priority) : + bits: | 63 .... 32 | 31 .. 3 | 2 .. 0 + values: | unused | server | priority + The kvm_device_attr.addr points to : + struct kvm_ppc_xive_eq { + __u32 flags; + __u32 qshift; + __u64 qaddr; + __u32 qtoggle; + __u32 qindex; + __u8 pad[40]; + }; + - flags: queue flags + KVM_XIVE_EQ_ALWAYS_NOTIFY (required) + forces notification without using the coalescing mechanism + provided by the XIVE END ESBs. + - qshift: queue size (power of 2) + - qaddr: real address of queue + - qtoggle: current queue toggle bit + - qindex: current queue index + - pad: reserved for future use + Errors: + -ENOENT: Invalid CPU number + -EINVAL: Invalid priority + -EINVAL: Invalid flags + -EINVAL: Invalid queue size + -EINVAL: Invalid queue address + -EFAULT: Invalid user pointer for attr->addr. + -EIO: Configuration of the underlying HW failed + + 5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only) + Synchronize the source to flush event notifications + Attributes: + Interrupt source number (64-bit) + Errors: + -ENOENT: Unknown source number + -EINVAL: Not initialized source number + +* VCPU state + + The XIVE IC maintains VP interrupt state in an internal structure + called the NVT. 
+  When a VP is not dispatched on a HW processor thread, this
+  structure can be updated by HW if the VP is the target of an event
+  notification.
+
+  It is important for migration to capture the cached IPB from the NVT
+  as it synthesizes the priorities of the pending interrupts. We
+  capture a bit more to report debug information.
+
+  KVM_REG_PPC_VP_STATE (2 * 64 bits)
+  bits:     |  63  ....  32  |  31  ....  0  |
+  values:   |   TIMA word0   |   TIMA word1  |
+  bits:     | 127       ..........       64  |
+  values:   |            unused              |
+
+* Migration:
+
+  Saving the state of a VM using the XIVE native exploitation mode
+  should follow a specific sequence. When the VM is stopped:
+
+  1. Mask all sources (PQ=01) to stop the flow of events.
+
+  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
+  flush any in-flight event notification and to stabilize the EQs. At
+  this stage, the EQ pages are marked dirty to make sure they are
+  transferred in the migration sequence.
+
+  3. Capture the state of the source targeting, the EQ configuration
+  and the state of the thread interrupt context registers.
+
+  Restore is similar:
+
+  1. Restore the EQ configuration, as targeting depends on it.
+  2. Restore targeting
+  3. Restore the thread interrupt contexts
+  4. Restore the source states
+  5. Let the vCPU run
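The Groups section describes each attribute value as a packed 64-bit field. Below is a minimal sketch that builds and decodes those values with plain shifts and masks. It assumes a userspace VMM written in C. Every XIVE_* macro, every helper function and struct xive_eq_config are hypothetical names local to this example, not the kernel's uapi. The powerpc uapi headers remain the authoritative source for the real constants and for struct kvm_ppc_xive_eq.

```c
/*
 * Sketch only: pack/unpack the 64-bit values used by the
 * KVM_DEV_XIVE_GRP_SOURCE, _SOURCE_CONFIG and _EQ_CONFIG attributes,
 * following the documented bit layouts. All names are hypothetical.
 */
#include <assert.h>
#include <stdint.h>

/* KVM_DEV_XIVE_GRP_SOURCE: bit 0 = type (0: MSI, 1: LSI), bit 1 = level */
#define XIVE_SRC_TYPE_LSI    (1ULL << 0)
#define XIVE_SRC_LEVEL       (1ULL << 1)

/* Low 32 bits shared by the source config and the EQ identifier */
#define XIVE_PRIORITY_MASK   0x7ULL          /* bits 2..0   */
#define XIVE_SERVER_SHIFT    3
#define XIVE_SERVER_MASK     0xfffffff8ULL   /* bits 31..3  */
#define XIVE_CFG_MASKED      (1ULL << 32)    /* bit 32      */
#define XIVE_CFG_EISN_SHIFT  33              /* bits 63..33 */

static inline uint64_t xive_source_value(int is_lsi, int asserted)
{
    uint64_t v = 0;

    if (is_lsi)
        v |= XIVE_SRC_TYPE_LSI;
    if (asserted)
        v |= XIVE_SRC_LEVEL;   /* only meaningful for an LSI */
    return v;
}

/* (server, priority) tuple; bits 63..32 are left as zero (unused). */
static inline uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
    /* KVM rejects an invalid priority with -EINVAL; assert instead. */
    assert(priority <= 7);
    /* Server numbers wider than 29 bits are truncated by the mask. */
    return (((uint64_t)server << XIVE_SERVER_SHIFT) & XIVE_SERVER_MASK) |
           (priority & XIVE_PRIORITY_MASK);
}

static inline uint64_t xive_source_config(uint32_t server, uint8_t priority,
                                          int masked, uint32_t eisn)
{
    uint64_t v = xive_eq_id(server, priority);  /* same low 32 bits */

    if (masked)
        v |= XIVE_CFG_MASKED;
    /* EISN fills bits 63..33; bit 31 of eisn is shifted out. */
    return v | ((uint64_t)eisn << XIVE_CFG_EISN_SHIFT);
}

static inline uint32_t xive_server(uint64_t v)
{
    return (uint32_t)((v & XIVE_SERVER_MASK) >> XIVE_SERVER_SHIFT);
}

static inline uint8_t xive_priority(uint64_t v)
{
    return (uint8_t)(v & XIVE_PRIORITY_MASK);
}

static inline uint32_t xive_eisn(uint64_t v)
{
    return (uint32_t)(v >> XIVE_CFG_EISN_SHIFT);
}

/* Mirror of the documented struct kvm_ppc_xive_eq with <stdint.h> types */
struct xive_eq_config {
    uint32_t flags;     /* KVM_XIVE_EQ_ALWAYS_NOTIFY is required */
    uint32_t qshift;    /* queue size, as a power of 2 */
    uint64_t qaddr;     /* real address of the queue */
    uint32_t qtoggle;   /* current queue toggle bit */
    uint32_t qindex;    /* current queue index */
    uint8_t  pad[40];   /* reserved */
};
_Static_assert(sizeof(struct xive_eq_config) == 64,
               "EQ configuration is expected to be 64 bytes");
```

A VMM would store such a value in a __u64, point kvm_device_attr.addr at it and issue KVM_SET_DEVICE_ATTR on the XIVE device fd. That ioctl step is left out so the sketch runs without KVM. Both encoders share xive_eq_id() because the source configuration and the EQ identifier use the same (server, priority) layout in their low 32 bits.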