diff options
Diffstat (limited to 'Documentation/virtual/kvm/vcpu-requests.rst')
-rw-r--r-- | Documentation/virtual/kvm/vcpu-requests.rst | 307 |
1 files changed, 0 insertions, 307 deletions
diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst deleted file mode 100644 index 5feb3706a7ae..000000000000 --- a/Documentation/virtual/kvm/vcpu-requests.rst +++ /dev/null @@ -1,307 +0,0 @@ -================= -KVM VCPU Requests -================= - -Overview -======== - -KVM supports an internal API enabling threads to request a VCPU thread to -perform some activity. For example, a thread may request a VCPU to flush -its TLB with a VCPU request. The API consists of the following functions:: - - /* Check if any requests are pending for VCPU @vcpu. */ - bool kvm_request_pending(struct kvm_vcpu *vcpu); - - /* Check if VCPU @vcpu has request @req pending. */ - bool kvm_test_request(int req, struct kvm_vcpu *vcpu); - - /* Clear request @req for VCPU @vcpu. */ - void kvm_clear_request(int req, struct kvm_vcpu *vcpu); - - /* - * Check if VCPU @vcpu has request @req pending. When the request is - * pending it will be cleared and a memory barrier, which pairs with - * another in kvm_make_request(), will be issued. - */ - bool kvm_check_request(int req, struct kvm_vcpu *vcpu); - - /* - * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs - * with another in kvm_check_request(), prior to setting the request. - */ - void kvm_make_request(int req, struct kvm_vcpu *vcpu); - - /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */ - bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req); - -Typically a requester wants the VCPU to perform the activity as soon -as possible after making the request. This means most requests -(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(), -and kvm_make_all_cpus_request() has the kicking of all VCPUs built -into it. - -VCPU Kicks ----------- - -The goal of a VCPU kick is to bring a VCPU thread out of guest mode in -order to perform some KVM maintenance. To do so, an IPI is sent, forcing -a guest mode exit. However, a VCPU thread may not be in guest mode at the -time of the kick. Therefore, depending on the mode and state of the VCPU -thread, there are two other actions a kick may take. All three actions -are listed below: - -1) Send an IPI. This forces a guest mode exit. -2) Waking a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest - mode that wait on waitqueues. Waking them removes the threads from - the waitqueues, allowing the threads to run again. This behavior - may be suppressed, see KVM_REQUEST_NO_WAKEUP below. -3) Nothing. When the VCPU is not in guest mode and the VCPU thread is not - sleeping, then there is nothing to do. - -VCPU Mode ---------- - -VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the -guest is running in guest mode or not, as well as some specific -outside guest mode states. The architecture may use ``vcpu->mode`` to -ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"), -as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and -even to ensure IPI acknowledgements are waited upon (see "Waiting for -Acknowledgements"). The following modes are defined: - -OUTSIDE_GUEST_MODE - - The VCPU thread is outside guest mode. - -IN_GUEST_MODE - - The VCPU thread is in guest mode. - -EXITING_GUEST_MODE - - The VCPU thread is transitioning from IN_GUEST_MODE to - OUTSIDE_GUEST_MODE. - -READING_SHADOW_PAGE_TABLES - - The VCPU thread is outside guest mode, but it wants the sender of - certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU - thread is done reading the page tables. - -VCPU Request Internals -====================== - -VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap. -This means general bitops, like those documented in [atomic-ops]_ could -also be used, e.g. :: - - clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests); - -However, VCPU request users should refrain from doing so, as it would -break the abstraction. The first 8 bits are reserved for architecture -independent requests, all additional bits are available for architecture -dependent requests. - -Architecture Independent Requests ---------------------------------- - -KVM_REQ_TLB_FLUSH - - KVM's common MMU notifier may need to flush all of a guest's TLB - entries, calling kvm_flush_remote_tlbs() to do so. Architectures that - choose to use the common kvm_flush_remote_tlbs() implementation will - need to handle this VCPU request. - -KVM_REQ_MMU_RELOAD - - When shadow page tables are used and memory slots are removed it's - necessary to inform each VCPU to completely refresh the tables. This - request is used for that. - -KVM_REQ_PENDING_TIMER - - This request may be made from a timer handler run on the host on behalf - of a VCPU. It informs the VCPU thread to inject a timer interrupt. - -KVM_REQ_UNHALT - - This request may be made from the KVM common function kvm_vcpu_block(), - which is used to emulate an instruction that causes a CPU to halt until - one of an architectural specific set of events and/or interrupts is - received (determined by checking kvm_arch_vcpu_runnable()). When that - event or interrupt arrives kvm_vcpu_block() makes the request. This is - in contrast to when kvm_vcpu_block() returns due to any other reason, - such as a pending signal, which does not indicate the VCPU's halt - emulation should stop, and therefore does not make the request. - -KVM_REQUEST_MASK ----------------- - -VCPU requests should be masked by KVM_REQUEST_MASK before using them with -bitops. This is because only the lower 8 bits are used to represent the -request's number. The upper bits are used as flags. Currently only two -flags are defined. - -VCPU Request Flags ------------------- - -KVM_REQUEST_NO_WAKEUP - - This flag is applied to requests that only need immediate attention - from VCPUs running in guest mode. That is, sleeping VCPUs do not need - to be awaken for these requests. Sleeping VCPUs will handle the - requests when they are awaken later for some other reason. - -KVM_REQUEST_WAIT - - When requests with this flag are made with kvm_make_all_cpus_request(), - then the caller will wait for each VCPU to acknowledge its IPI before - proceeding. This flag only applies to VCPUs that would receive IPIs. - If, for example, the VCPU is sleeping, so no IPI is necessary, then - the requesting thread does not wait. This means that this flag may be - safely combined with KVM_REQUEST_NO_WAKEUP. See "Waiting for - Acknowledgements" for more information about requests with - KVM_REQUEST_WAIT. - -VCPU Requests with Associated State -=================================== - -Requesters that want the receiving VCPU to handle new state need to ensure -the newly written state is observable to the receiving VCPU thread's CPU -by the time it observes the request. This means a write memory barrier -must be inserted after writing the new state and before setting the VCPU -request bit. Additionally, on the receiving VCPU thread's side, a -corresponding read barrier must be inserted after reading the request bit -and before proceeding to read the new state associated with it. See -scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation -[memory-barriers]_. - -The pair of functions, kvm_check_request() and kvm_make_request(), provide -the memory barriers, allowing this requirement to be handled internally by -the API. - -Ensuring Requests Are Seen -========================== - -When making requests to VCPUs, we want to avoid the receiving VCPU -executing in guest mode for an arbitrary long time without handling the -request. We can be sure this won't happen as long as we ensure the VCPU -thread checks kvm_request_pending() before entering guest mode and that a -kick will send an IPI to force an exit from guest mode when necessary. -Extra care must be taken to cover the period after the VCPU thread's last -kvm_request_pending() check and before it has entered guest mode, as kick -IPIs will only trigger guest mode exits for VCPU threads that are in guest -mode or at least have already disabled interrupts in order to prepare to -enter guest mode. This means that an optimized implementation (see "IPI -Reduction") must be certain when it's safe to not send the IPI. One -solution, which all architectures except s390 apply, is to: - -- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and - the last kvm_request_pending() check; -- enable interrupts atomically when entering the guest. - -This solution also requires memory barriers to be placed carefully in both -the requesting thread and the receiving VCPU. With the memory barriers we -can exclude the possibility of a VCPU thread observing -!kvm_request_pending() on its last check and then not receiving an IPI for -the next request made of it, even if the request is made immediately after -the check. This is done by way of the Dekker memory barrier pattern -(scenario 10 of [lwn-mb]_). As the Dekker pattern requires two variables, -this solution pairs ``vcpu->mode`` with ``vcpu->requests``. Substituting -them into the pattern gives:: - - CPU1 CPU2 - ================= ================= - local_irq_disable(); - WRITE_ONCE(vcpu->mode, IN_GUEST_MODE); kvm_make_request(REQ, vcpu); - smp_mb(); smp_mb(); - if (kvm_request_pending(vcpu)) { if (READ_ONCE(vcpu->mode) == - IN_GUEST_MODE) { - ...abort guest entry... ...send IPI... - } } - -As stated above, the IPI is only useful for VCPU threads in guest mode or -that have already disabled interrupts. This is why this specific case of -the Dekker pattern has been extended to disable interrupts before setting -``vcpu->mode`` to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to -pedantically implement the memory barrier pattern, guaranteeing the -compiler doesn't interfere with ``vcpu->mode``'s carefully planned -accesses. - -IPI Reduction -------------- - -As only one IPI is needed to get a VCPU to check for any/all requests, -then they may be coalesced. This is easily done by having the first IPI -sending kick also change the VCPU mode to something !IN_GUEST_MODE. The -transitional state, EXITING_GUEST_MODE, is used for this purpose. - -Waiting for Acknowledgements ----------------------------- - -Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to -be sent, and the acknowledgements to be waited upon, even when the target -VCPU threads are in modes other than IN_GUEST_MODE. For example, one case -is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which -is set after disabling interrupts. To support these cases, the -KVM_REQUEST_WAIT flag changes the condition for sending an IPI from -checking that the VCPU is IN_GUEST_MODE to checking that it is not -OUTSIDE_GUEST_MODE. - -Request-less VCPU Kicks ------------------------ - -As the determination of whether or not to send an IPI depends on the -two-variable Dekker memory barrier pattern, then it's clear that -request-less VCPU kicks are almost never correct. Without the assurance -that a non-IPI generating kick will still result in an action by the -receiving VCPU, as the final kvm_request_pending() check does for -request-accompanying kicks, then the kick may not do anything useful at -all. If, for instance, a request-less kick was made to a VCPU that was -just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then -the VCPU thread may continue its entry without actually having done -whatever it was the kick was meant to initiate. - -One exception is x86's posted interrupt mechanism. In this case, however, -even the request-less VCPU kick is coupled with the same -local_irq_disable() + smp_mb() pattern described above; the ON bit -(Outstanding Notification) in the posted interrupt descriptor takes the -role of ``vcpu->requests``. When sending a posted interrupt, PIR.ON is -set before reading ``vcpu->mode``; dually, in the VCPU thread, -vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to -IN_GUEST_MODE. - -Additional Considerations -========================= - -Sleeping VCPUs --------------- - -VCPU threads may need to consider requests before and/or after calling -functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they -do or not, and, if they do, which requests need consideration, is -architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable() -to check if it should awaken. One reason to do so is to provide -architectures a function where requests may be checked if necessary. - -Clearing Requests ------------------ - -Generally it only makes sense for the receiving VCPU thread to clear a -request. However, in some circumstances, such as when the requesting -thread and the receiving VCPU thread are executed serially, such as when -they are the same thread, or when they are using some form of concurrency -control to temporarily execute synchronously, then it's possible to know -that the request may be cleared immediately, rather than waiting for the -receiving VCPU thread to handle the request in VCPU RUN. The only current -examples of this are kvm_vcpu_block() calls made by VCPUs to block -themselves. A possible side-effect of that call is to make the -KVM_REQ_UNHALT request, which may then be cleared immediately when the -VCPU returns from the call. - -References -========== - -.. [atomic-ops] Documentation/core-api/atomic_ops.rst -.. [memory-barriers] Documentation/memory-barriers.txt -.. [lwn-mb] https://lwn.net/Articles/573436/ |