diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 93 |
1 files changed, 93 insertions, 0 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 81d54fe76a2d..e264ebc35e27 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -262,6 +262,18 @@ The KVM_RUN ioctl (cf.) communicates with userspace via a shared memory region. This ioctl returns the size of that region. See the KVM_RUN documentation for details. +Besides the size of the KVM_RUN communication region, other areas of +the VCPU file descriptor can be mmap-ed, including: + +- if KVM_CAP_COALESCED_MMIO is available, a page at + KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons, + this page is included in the result of KVM_GET_VCPU_MMAP_SIZE. + KVM_CAP_COALESCED_MMIO is not documented yet. + +- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at + KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on + KVM_CAP_DIRTY_LOG_RING, see section 8.3. + 4.6 KVM_SET_MEMORY_REGION ------------------------- @@ -6396,3 +6408,84 @@ When enabled, KVM will disable paravirtual features provided to the guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf (0x40000001). Otherwise, a guest may use the paravirtual features regardless of what has actually been exposed through the CPUID leaf. + + +8.29 KVM_CAP_DIRTY_LOG_RING +--------------------------- + +:Architectures: x86 +:Parameters: args[0] - size of the dirty log ring + +KVM is capable of tracking dirty memory using ring buffers that are +mmaped into userspace; there is one dirty ring per vcpu. + +The dirty ring is available to userspace as an array of +``struct kvm_dirty_gfn``. Each dirty entry it's defined as:: + + struct kvm_dirty_gfn { + __u32 flags; + __u32 slot; /* as_id | slot_id */ + __u64 offset; + }; + +The following values are defined for the flags field to define the +current state of the entry:: + + #define KVM_DIRTY_GFN_F_DIRTY BIT(0) + #define KVM_DIRTY_GFN_F_RESET BIT(1) + #define KVM_DIRTY_GFN_F_MASK 0x3 + +Userspace should call KVM_ENABLE_CAP ioctl right after KVM_CREATE_VM +ioctl to enable this capability for the new guest and set the size of +the rings. Enabling the capability is only allowed before creating any +vCPU, and the size of the ring must be a power of two. The larger the +ring buffer, the less likely the ring is full and the VM is forced to +exit to userspace. The optimal size depends on the workload, but it is +recommended that it be at least 64 KiB (4096 entries). + +Just like for dirty page bitmaps, the buffer tracks writes to +all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was +set in KVM_SET_USER_MEMORY_REGION. Once a memory region is registered +with the flag set, userspace can start harvesting dirty pages from the +ring buffer. + +An entry in the ring buffer can be unused (flag bits ``00``), +dirty (flag bits ``01``) or harvested (flag bits ``1X``). The +state machine for the entry is as follows:: + + dirtied harvested reset + 00 -----------> 01 -------------> 1X -------+ + ^ | + | | + +------------------------------------------+ + +To harvest the dirty pages, userspace accesses the mmaped ring buffer +to read the dirty GFNs. If the flags has the DIRTY bit set (at this stage +the RESET bit must be cleared), then it means this GFN is a dirty GFN. +The userspace should harvest this GFN and mark the flags from state +``01b`` to ``1Xb`` (bit 0 will be ignored by KVM, but bit 1 must be set +to show that this GFN is harvested and waiting for a reset), and move +on to the next GFN. The userspace should continue to do this until the +flags of a GFN have the DIRTY bit cleared, meaning that it has harvested +all the dirty GFNs that were available. + +It's not necessary for userspace to harvest the all dirty GFNs at once. +However it must collect the dirty GFNs in sequence, i.e., the userspace +program cannot skip one dirty GFN to collect the one next to it. + +After processing one or more entries in the ring buffer, userspace +calls the VM ioctl KVM_RESET_DIRTY_RINGS to notify the kernel about +it, so that the kernel will reprotect those collected GFNs. +Therefore, the ioctl must be called *before* reading the content of +the dirty pages. + +The dirty ring can get full. When it happens, the KVM_RUN of the +vcpu will return with exit reason KVM_EXIT_DIRTY_LOG_FULL. + +The dirty ring interface has a major difference comparing to the +KVM_GET_DIRTY_LOG interface in that, when reading the dirty ring from +userspace, it's still possible that the kernel has not yet flushed the +processor's dirty page buffers into the kernel buffer (with dirty bitmaps, the +flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one +needs to kick the vcpu out of KVM_RUN using a signal. The resulting +vmexit ensures that all dirty GFNs are flushed to the dirty rings. |