<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/arch/x86, branch v5.11-rc3</title>
<subtitle>Linux Kernel (branches are rebased on master from time to time)</subtitle>
<id>https://sre.ring0.de/linux/atom?h=v5.11-rc3</id>
<link rel='self' href='https://sre.ring0.de/linux/atom?h=v5.11-rc3'/>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/'/>
<updated>2021-01-10T19:31:17Z</updated>
<entry>
<title>Merge tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-01-10T19:31:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-01-10T19:31:17Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=a440e4d7618cbe232e4f96dea805bcb89f79b18c'/>
<id>urn:sha1:a440e4d7618cbe232e4f96dea805bcb89f79b18c</id>
<content type='text'>
Pull x86 fixes from Borislav Petkov:
 "As expected, fixes started trickling in after the holidays so here is
  the accumulated pile of x86 fixes for 5.11:

   - A fix for fanotify_mark() missing the conversion of x86_32 native
     syscalls which take 64-bit arguments to the compat handlers due to
     former having a general compat handler. (Brian Gerst)

   - Add a forgotten pmd page destructor call to pud_free_pmd_page()
     where a pmd page is freed. (Dan Williams)

   - Make IN/OUT insns with an u8 immediate port operand handling for
     SEV-ES guests more precise by using only the single port byte and
     not the whole s32 value of the insn decoder. (Peter Gonda)

   - Correct a straddling end range check before returning the proper
     MTRR type, when the end address is the same as top of memory.
     (Ying-Tsun Huang)

   - Change PQR_ASSOC MSR update scheme when moving a task to a resctrl
     resource group to avoid significant performance overhead with some
     resctrl workloads. (Fenghua Yu)

   - Avoid the actual task move overhead when the task is already in the
     resource group. (Fenghua Yu)"

* tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Don't move a task to the same resource group
  x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
  x86/mtrr: Correct the range check before performing MTRR type lookups
  x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
  x86/mm: Fix leak of pmd ptlock
  fanotify: Fix sys_fanotify_mark() on native x86-32
</content>
</entry>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2021-01-08T23:06:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-01-08T23:06:02Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=2a190b22aa1149cda804527aa603db45f75439c3'/>
<id>urn:sha1:2a190b22aa1149cda804527aa603db45f75439c3</id>
<content type='text'>
Pull kvm fixes from Paolo Bonzini:
 "x86:
   - Fixes for the new scalable MMU
   - Fixes for migration of nested hypervisors on AMD
   - Fix for clang integrated assembler
   - Fix for left shift by 64 (UBSAN)
   - Small cleanups
   - Straggler SEV-ES patch

  ARM:
   - VM init cleanups
   - PSCI relay cleanups
   - Kill CONFIG_KVM_ARM_PMU
   - Fixup __init annotations
   - Fixup reg_to_encoding()
   - Fix spurious PMCR_EL0 access

  Misc:
   - selftests cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (38 commits)
  KVM: x86: __kvm_vcpu_halt can be static
  KVM: SVM: Add support for booting APs in an SEV-ES guest
  KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
  KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode
  KVM: nSVM: correctly restore nested_run_pending on migration
  KVM: x86/mmu: Clarify TDP MMU page list invariants
  KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
  kvm: check tlbs_dirty directly
  KVM: x86: change in pv_eoi_get_pending() to make code more readable
  MAINTAINERS: Really update email address for Sean Christopherson
  KVM: x86: fix shift out of bounds reported by UBSAN
  KVM: selftests: Implement perf_test_util more conventionally
  KVM: selftests: Use vm_create_with_vcpus in create_vm
  KVM: selftests: Factor out guest mode code
  KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
  KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
  KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()
  KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array
  KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
  KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
  ...
</content>
</entry>
<entry>
<title>KVM: x86: __kvm_vcpu_halt can be static</title>
<updated>2021-01-08T10:54:44Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2021-01-08T10:54:44Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=872f36eb0b0f4f0e3a81ea1e51a6bdf58ccfdc6e'/>
<id>urn:sha1:872f36eb0b0f4f0e3a81ea1e51a6bdf58ccfdc6e</id>
<content type='text'>
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>x86/resctrl: Don't move a task to the same resource group</title>
<updated>2021-01-08T08:08:03Z</updated>
<author>
<name>Fenghua Yu</name>
<email>fenghua.yu@intel.com</email>
</author>
<published>2020-12-17T22:31:19Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=a0195f314a25582b38993bf30db11c300f4f4611'/>
<id>urn:sha1:a0195f314a25582b38993bf30db11c300f4f4611</id>
<content type='text'>
Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.

Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Signed-off-by: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
</content>
</entry>
<entry>
<title>x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR</title>
<updated>2021-01-08T08:03:36Z</updated>
<author>
<name>Fenghua Yu</name>
<email>fenghua.yu@intel.com</email>
</author>
<published>2020-12-17T22:31:18Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=ae28d1aae48a1258bd09a6f707ebb4231d79a761'/>
<id>urn:sha1:ae28d1aae48a1258bd09a6f707ebb4231d79a761</id>
<content type='text'>
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.

Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.

This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.

This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.

As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.

To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com

 [ bp: Massage commit message and drop the two update_task_closid_rmid()
   variants. ]

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Reported-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Signed-off-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: James Morse &lt;james.morse@arm.com&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
</content>
</entry>
<entry>
<title>KVM: SVM: Add support for booting APs in an SEV-ES guest</title>
<updated>2021-01-07T23:11:37Z</updated>
<author>
<name>Tom Lendacky</name>
<email>thomas.lendacky@amd.com</email>
</author>
<published>2021-01-04T20:20:01Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=647daca25d24fb6eadc7b6cd680ad3e6eed0f3d5'/>
<id>urn:sha1:647daca25d24fb6eadc7b6cd680ad3e6eed0f3d5</id>
<content type='text'>
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

  To differentiate between an actual HLT and an AP Reset Hold, a new MP
  state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
  placed in upon receiving the AP Reset Hold exit event. Additionally, to
  communicate the AP Reset Hold exit event up to userspace (if needed), a
  new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Message-Id: &lt;e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit</title>
<updated>2021-01-07T23:11:35Z</updated>
<author>
<name>Maxim Levitsky</name>
<email>mlevitsk@redhat.com</email>
</author>
<published>2021-01-07T09:38:51Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=f2c7ef3ba9556d62a7e2bb23b563c6510007d55c'/>
<id>urn:sha1:f2c7ef3ba9556d62a7e2bb23b563c6510007d55c</id>
<content type='text'>
It is possible to exit the nested guest mode, entered by
svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
if the nested run was not pending during the migration.

In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run")

Signed-off-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Message-Id: &lt;20210107093854.882483-2-mlevitsk@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode</title>
<updated>2021-01-07T23:11:34Z</updated>
<author>
<name>Maxim Levitsky</name>
<email>mlevitsk@redhat.com</email>
</author>
<published>2021-01-07T09:38:54Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=56fe28de8c4f0167275c411c0daa5709e9a47bd7'/>
<id>urn:sha1:56fe28de8c4f0167275c411c0daa5709e9a47bd7</id>
<content type='text'>
We overwrite most of vmcb fields while doing so, so we must
mark it as dirty.

Signed-off-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Message-Id: &lt;20210107093854.882483-5-mlevitsk@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: nSVM: correctly restore nested_run_pending on migration</title>
<updated>2021-01-07T23:11:33Z</updated>
<author>
<name>Maxim Levitsky</name>
<email>mlevitsk@redhat.com</email>
</author>
<published>2021-01-07T09:38:52Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=81f76adad560dfc39cb9625cf1e00a7e2b7b88df'/>
<id>urn:sha1:81f76adad560dfc39cb9625cf1e00a7e2b7b88df</id>
<content type='text'>
The code to store it on the migration exists, but no code was restoring it.

One of the side effects of fixing this is that L1-&gt;L2 injected events
are no longer lost when migration happens with nested run pending.

Signed-off-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Message-Id: &lt;20210107093854.882483-3-mlevitsk@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86/mmu: Clarify TDP MMU page list invariants</title>
<updated>2021-01-07T23:11:32Z</updated>
<author>
<name>Ben Gardon</name>
<email>bgardon@google.com</email>
</author>
<published>2021-01-07T00:19:35Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=c0dba6e46825716db15c4b3a8f05c85b4a59edda'/>
<id>urn:sha1:c0dba6e46825716db15c4b3a8f05c85b4a59edda</id>
<content type='text'>
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain
pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any
pages with a non-zero root_count and tdp_mmu_roots should only contain
pages with a positive root_count, unless a thread holds the MMU lock and
is in the process of modifying the list. Various functions expect these
invariants to be maintained, but they are not explictily documented. Add
to the comments on both fields to document the above invariants.

Signed-off-by: Ben Gardon &lt;bgardon@google.com&gt;
Message-Id: &lt;20210107001935.3732070-2-bgardon@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
