summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2015-05-26KVM: const-ify uses of struct kvm_userspace_memory_regionPaolo Bonzini12-37/+37
Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: use kvm_memslots whenever possiblePaolo Bonzini8-14/+31
kvm_memslots provides lockdep checking. Use it consistently instead of explicit dereferencing of kvm->memslots. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: introduce kvm_alloc/free_memslotsPaolo Bonzini1-49/+55
kvm_alloc_memslots is extracted out of previously scattered code that was in kvm_init_memslots_id and kvm_create_vm. kvm_free_memslot and kvm_free_memslots are new names of kvm_free_physmem and kvm_free_physmem_slot, but they also take an explicit pointer to struct kvm_memslots. This will simplify the transition to multiple address spaces, each represented by one pointer to struct kvm_memslots. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20Merge branch 'kvm-master' into kvm-nextPaolo Bonzini7-4/+33
Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU deactivation patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20kvm/fpu: Enable eager restore kvm FPU for MPXLiang Li4-2/+27
The MPX feature requires eager KVM FPU restore support. We have verified that MPX cannot work correctly with the current lazy KVM FPU restore mechanism. Eager KVM FPU restore should be enabled if the MPX feature is exposed to VM. Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Liang Li <liang.z.li@intel.com> [Also activate the FPU on AMD processors. - Paolo] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20Revert "KVM: x86: drop fpu_activate hook"Paolo Bonzini3-0/+3
This reverts commit 4473b570a7ebb502f63f292ccfba7df622e5fdd3. We'll use the hook again. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20kvm: fix crash in kvm_vcpu_reload_apic_access_pageAndrea Arcangeli1-0/+2
memslot->userfault_addr is set by the kernel with a mmap executed from the kernel but the userland can still munmap it and lead to the below oops after memslot->userfault_addr points to a host virtual address that has no vma or mapping. [ 327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe [ 327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50 [ 327.538474] PGD 1a01067 PUD 1a03067 PMD 0 [ 327.538529] Oops: 0000 [#1] SMP [ 327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me [ 327.539488] mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod [ 327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1 [ 327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014 [ 327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000 [ 327.540184] RIP: 0010:[<ffffffff811a7b55>] [<ffffffff811a7b55>] put_page+0x5/0x50 [ 327.540261] RSP: 0018:ffff880317c5bcf8 EFLAGS: 00010246 [ 327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000 [ 327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe [ 327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000 [ 327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe [ 327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50 [ 327.540643] FS: 00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000 [ 327.540717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0 [ 327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 327.540974] Stack: [ 327.541008] ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0 [ 327.541093] ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d [ 327.541177] 00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000 [ 327.541261] Call Trace: [ 327.541321] [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm] [ 327.543615] [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm] [ 327.545918] [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm] [ 327.548211] [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm] [ 327.550500] [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm] [ 327.552768] [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50 [ 327.555069] [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30 [ 327.557373] [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80 [ 327.559663] [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530 [ 327.561917] [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0 [ 327.564185] [<ffffffff816de829>] system_call_fastpath+0x16/0x1b [ 327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0 Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20kvm: x86: Make functions that have no external callers staticNicholas Krause1-2/+2
This makes the functions kvm_guest_cpu_init and kvm_init_debugfs static now due to having no external callers outside their declarations in the file, kvm.c. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_asyncPaolo Bonzini3-24/+15
gfn_to_pfn_async is used in just one place, and because of x86-specific treatment that place will need to look at the memory slot. Hence inline it into try_async_pf and export __gfn_to_pfn_memslot. The patch also switches the subsequent call to gfn_to_pfn_prot to use __gfn_to_pfn_memslot. This is a small optimization. Finally, remove the now-unused async argument of __gfn_to_pfn. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: mips: use id_to_memslot correctlyPaolo Bonzini1-1/+1
The argument to KVM_GET_DIRTY_LOG is a memslot id; it may not match the position in the memslots array, which is sorted by gfn. Cc: stable@vger.kernel.org Reviewed-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changedXiao Guangrong1-2/+1
CR0.CD and CR0.NW are not used by shadow page table so that need not adjust mmu if these two bit are changed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: fix MTRR updateXiao Guangrong3-1/+83
Currently, whenever guest MTRR registers are changed kvm_mmu_reset_context is called to switch to the new root shadow page table, however, it's useless since: 1) the cache type is not cached into shadow page's attribute so that the original root shadow page will be reused 2) the cache type is set on the last spte, that means we should sync the last sptes when MTRR is changed This patch fixs this issue by drop all the spte in the gfn range which is being updated by MTRR Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: fix decoding cache type from MTRRXiao Guangrong1-8/+6
There are some bugs in current get_mtrr_type(); 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement that means other bits are ignored if it is cleared 2: the fixed MTRR ranges are controlled by bit 0 of mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE) 3: if MTRR is disabled, UC is applied to all of physical memory rather than mtrr_state->def_type Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: introduce kvm_zap_rmappXiao Guangrong1-8/+12
Split kvm_unmap_rmapp and introduce kvm_zap_rmapp which will be used in the later patch Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: use slot_handle_level and its helper to clean up the codeXiao Guangrong1-112/+16
slot_handle_level and its helper functions are ready now, use them to clean up the code Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: introduce slot_handle_level_range() and its helpersXiao Guangrong1-0/+69
There are several places walking all rmaps for the memslot so that introduce common functions to cleanup the code Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: introduce for_each_slot_rmap_rangeXiao Guangrong1-22/+75
It's used to abstract the code from kvm_handle_hva_range and it will be used by later patch Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVELXiao Guangrong2-15/+10
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: introduce for_each_rmap_spte()Xiao Guangrong2-38/+19
It's used to walk all the sptes on the rmap to clean up the code Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19Revert "kvmclock: set scheduler clock stable"Paolo Bonzini1-3/+0
This reverts commit ff7bbb9c6ab6e6620429daeff39424bbde1a94b4. Sasha Levin is seeing odd jump in time values during boot of a KVM guest: [...] [ 0.000000] tsc: Detected 2260.998 MHz processor [3376355.247558] Calibrating delay loop (skipped) preset value.. [...] and bisected them to this commit. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: MMU: fix SMAP virtualizationXiao Guangrong5-15/+30
KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: x86: Fix zero iterations REP-stringNadav Amit1-0/+25
When a REP-string is executed in 64-bit mode with an address-size prefix, ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits of the pointers in MOVS/STOS. This behavior is specific to Intel according to few experiments. As one may guess, this is an undocumented behavior. Yet, it is observable in the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that VMware appears to get it right. The behavior can be observed using the following code: #include <stdio.h> #define LOW_MASK (0xffffffff00000000ull) #define ALL_MASK (0xffffffffffffffffull) #define TEST(opcode) \ do { \ asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \ : "=S"(s), "=c"(c), "=D"(d) \ : "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK)); \ printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n", \ opcode, c, s, d); \ } while(0) void main() { unsigned long long s, d, c; iopl(3); TEST("0x6c"); TEST("0x6d"); TEST("0x6e"); TEST("0x6f"); TEST("0xa4"); TEST("0xa5"); TEST("0xa6"); TEST("0xa7"); TEST("0xaa"); TEST("0xab"); TEST("0xae"); TEST("0xaf"); } Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: x86: Fix update RCX/RDI/RSI on REP-stringNadav Amit1-6/+2
When REP-string instruction is preceded with an address-size prefix, ECX/EDI/ESI are used as the operation counter and pointers. When they are updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they are updated on every 32-bit register operation. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: x86: Fix DR7 mask on task-switch while debuggingNadav Amit2-4/+5
If the host sets hardware breakpoints to debug the guest, and a task-switch occurs in the guest, the architectural DR7 will not be updated. The effective DR7 would be updated instead. This fix puts the DR7 update during task-switch emulation, so it now uses the standard DR setting mechanism instead of the one that was previously used. As a bonus, the update of DR7 will now be effective for AMD as well. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-11KVM: MMU: fix SMAP virtualizationXiao Guangrong5-15/+30
KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-11KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pagesPaolo Bonzini1-1/+1
smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages should not be reused for different values of it. Thus, it has to be added to the mask in kvm_mmu_pte_write. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-11KVM: MMU: fix smap permission checkXiao Guangrong2-0/+9
Current permission check assumes that RSVD bit in PFEC is always zero, however, it is not true since MMIO #PF will use it to quickly identify MMIO access Fix it by clearing the bit if walking guest page table is needed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-11Merge tag 'kvm-s390-next-20150508' of ↵Paolo Bonzini7-77/+116
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes and features for 4.2 (kvm/next) Mostly a bunch of fixes, reworks and optimizations for s390. There is one new feature (EDAT-2 inside the guest), which boils down to 2GB pages.
2015-05-10KVM: PPC: Book3S HV: Fix list traversal in error casePaul Mackerras1-2/+3
This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which leads to a user-triggerable oops. In the case where we try to run a vcore on a physical core that is not in single-threaded mode, or the vcore has too many threads for the physical core, we iterate the list of runnable vcpus to make each one return an EBUSY error to userspace. Since this involves taking each vcpu off the runnable_threads list for the vcore, we need to use list_for_each_entry_safe rather than list_for_each_entry to traverse the list. Otherwise the kernel will crash with an oops message like this: Unable to handle kernel paging request for data at address 0x000fff88 Faulting instruction address: 0xd00000001e635dc8 Oops: Kernel access of bad area, sig: 11 [#2] SMP NR_CPUS=1024 NUMA PowerNV ... CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G D 3.18.0 #1 task: c00000274e507500 ti: c0000027d1924000 task.ti: c0000027d1924000 NIP: d00000001e635dc8 LR: d00000001e635df8 CTR: c00000000011ba50 REGS: c0000027d19275b0 TRAP: 0300 Tainted: G D (3.18.0) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22002824 XER: 00000000 CFAR: c000000000008468 DAR: 00000000000fff88 DSISR: 40000000 SOFTE: 1 GPR00: d00000001e635df8 c0000027d1927830 d00000001e64c850 0000000000000001 GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 GPR08: 0000000000200200 0000000000000000 0000000000000000 d00000001e63e588 GPR12: 0000000000002200 c000000007dbc800 c000000fc7800000 000000000000000a GPR16: fffffffffffffffc c000000fd5439690 c000000fc7801c98 0000000000000001 GPR20: 0000000000000003 c0000027d1927aa8 c000000fd543b348 c000000fd543b350 GPR24: 0000000000000000 c000000fa57f0000 0000000000000030 0000000000000000 GPR28: fffffffffffffff0 c000000fd543b328 00000000000fe468 c000000fd543b300 NIP [d00000001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv] LR [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] Call Trace: [c0000027d1927830] [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable) [c0000027d1927a30] [d00000001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv] [c0000027d1927b70] [d00000001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm] [c0000027d1927ba0] [d00000001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm] [c0000027d1927be0] [d00000001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm] [c0000027d1927d40] [c0000000002d6720] do_vfs_ioctl+0x490/0x780 [c0000027d1927de0] [c0000000002d6ae4] SyS_ioctl+0xd4/0xf0 [c0000027d1927e30] [c000000000009358] syscall_exit+0x0/0x98 Instruction dump: 60000000 60420000 387e1b30 38800003 38a00001 38c00000 480087d9 e8410018 ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc ---[ end trace 8cdf50251cca6680 ]--- Fixes: 25fedfca94cfbf2461314c6c34ef58e74a31b025 Signed-off-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: s390: drop handling of interception code 12David Hildenbrand1-16/+0
Our implementation will never trigger interception code 12 as the responsible setting is never enabled - and never will be. The handler is dead code. Let's get rid of it. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: s390: factor out and optimize floating irq VCPU kickDavid Hildenbrand1-28/+46
This patch factors out the search for a floating irq destination VCPU as well as the kicking of the found VCPU. The search is optimized in the following ways: 1. stopped VCPUs can't take any floating interrupts, so try to find an operating one. We have to take care of the special case where all VCPUs are stopped and we don't have any valid destination. 2. use online_vcpus, not KVM_MAX_VCPU. This speeds up the search especially if KVM_MAX_VCPU is increased one day. As these VCPU objects are initialized prior to increasing online_vcpus, we can be sure that they exist. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: s390: optimize interrupt handling round trip timeJens Freimann1-7/+4
We can avoid checking guest control registers and guest PSW as well as all the masking and calculations on the interrupt masks when no interrupts are pending. Also, the check for IRQ_PEND_COUNT can be removed, because we won't enter the while loop if no interrupts are pending and invalid interrupt types can't be injected. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: s390: provide functions for blocking all CPUsChristian Borntraeger2-7/+26
Some updates to the control blocks need to be done in a way that ensures that no CPU is within SIE. Provide wrappers around the s390_vcpu_block functions and adopt the TOD migration code to update in a guaranteed fashion. Also rename these functions to have the kvm_s390_ prefix as everything else. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2015-05-08KVM: s390: make exit_sie_sync more robustChristian Borntraeger4-13/+22
exit_sie_sync is used to kick CPUs out of SIE and prevent reentering at any point in time. This is used to reload the prefix pages and to set the IBS stuff in a way that guarantees that after this function returns we are no longer in SIE. All current users trigger KVM requests. The request must be set before we block the CPUs to avoid races. Let's make this implicit by adding the request into a new function kvm_s390_sync_requests that replaces exit_sie_sync and split out s390_vcpu_block and s390_vcpu_unblock, that can be used to keep CPUs out of SIE independent of requests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-05-08KVM: s390: Enable guest EDAT2 supportGuenther Hutzl3-4/+11
1. Enable EDAT2 in the list of KVM facilities 2. Handle 2G frames in pfmf instruction If we support EDAT2, we may enable handling of 2G frames if not in 24 bit mode. 3. Enable EDAT2 in sie_block If the EDAT2 facility is available we enable GED2 mode control in the sie_block. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: s390: make EDAT1 depend on host supportGuenther Hutzl1-2/+5
We should only enable EDAT1 for the guest if the host actually supports it and the cpu model for the guest has EDAT-1 enabled. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: s390: optimize round trip time in request handlingChristian Borntraeger1-0/+2
The fast path for a sie exit is that no kvm reqest is pending. Make an early check to skip all single bit checks. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: s390: fix external call injection without sigp interpretationDavid Hildenbrand1-1/+1
Commit ea5f49692575 ("KVM: s390: only one external call may be pending at a time") introduced a bug on machines that don't have SIGP interpretation facility installed. The injection of an external call will now always fail with -EBUSY (if none is already pending). This leads to the following symptoms: - An external call will be injected but with the wrong "src cpu id", as this id will not be remembered. - The target vcpu will not be woken up, therefore the guest will hang if it cannot deal with unexpected failures of the SIGP EXTERNAL CALL instruction. - If an external call is already pending, -EBUSY will not be reported. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v4.0 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-08KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pagesPaolo Bonzini1-1/+1
smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages should not be reused for different values of it. Thus, it has to be added to the mask in kvm_mmu_pte_write. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: MMU: fix smap permission checkXiao Guangrong2-0/+9
Current permission check assumes that RSVD bit in PFEC is always zero, however, it is not true since MMIO #PF will use it to quickly identify MMIO access Fix it by clearing the bit if walking guest page table is needed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: remove pointless cpu hotplug messagesHeiko Carstens1-6/+0
On cpu hotplug only KVM emits an unconditional message that its notifier has been called. It certainly can be assumed that calling cpu hotplug notifiers work, therefore there is no added value if KVM prints a message. If an error happens on cpu online KVM will still emit a warning. So let's remove this superfluous message. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: nVMX: Fix host crash when loading MSRs with userspace irqchipJan Kiszka1-3/+2
vcpu->arch.apic is NULL when a userspace irqchip is active. But instead of letting the test incorrectly depend on in-kernel irqchip mode, open-code it to catch also userspace x2APICs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: x86: Call-far should not be emulated as stack opNadav Amit1-1/+1
Far call in 64-bit has a 32-bit operand size. Remove the marking of this operation as Stack so it can be emulated correctly in 64-bit. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: reuse memslot in kvm_write_guest_pageRadim Krčmář1-2/+4
Caching memslot value and using mark_page_dirty_in_slot() avoids another O(log N) search when dirtying the page. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428695247-27603-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07KVM: x86: dump VMCS on invalid entryPaolo Bonzini1-0/+153
Code and format roughly based on Xen's vmcs_dump_vcpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07x86: kvmclock: drop rdtsc_barrier()Marcelo Tosatti1-1/+0
Drop unnecessary rdtsc_barrier(), as has been determined empirically, see 057e6a8c660e95c3f4e7162e00e2fee1fc90c50d for details. Noticed by Andy Lutomirski. Improves clock_gettime() by approximately 15% on Intel i7-3520M @ 2.90GHz. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07KVM: x86: drop unneeded null testJulia Lawall1-5/+3
If the null test is needed, the call to cancel_delayed_work_sync would have already crashed. Normally, the destroy function should only be called if the init function has succeeded, in which case ioapic is not null. Problem found using Coccinelle. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07KVM: x86: fix initial PAT valueRadim Krčmář4-11/+7
PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT. VMX used a wrong value (host's PAT) and while SVM used the right one, it never got to arch.pat. This is not an issue with QEMU as it will force the correct value. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07kvm,x86: load guest FPU context more eagerlyRik van Riel2-2/+14
Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to re-load them every time the guest accesses the FPU after a switch back into the guest from the host. This patch copies the x86 task switch semantics for FPU loading, with the FPU loaded eagerly after first use if the system uses eager fpu mode, or if the guest uses the FPU frequently. In the latter case, after loading the FPU for 255 times, the fpu_counter will roll over, and we will revert to loading the FPU on demand, until it has been established that the guest is still actively using the FPU. This mirrors the x86 task switch policy, which seems to work. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07kvm: x86: Deliver MSI IRQ to only lowest prio cpu if msi_redir_hint is trueJames Sullivan3-8/+11
An MSI interrupt should only be delivered to the lowest priority CPU when it has RH=1, regardless of the delivery mode. Modified kvm_is_dm_lowest_prio() to check for either irq->delivery_mode == APIC_DM_LOWPRI or irq->msi_redir_hint. Moved kvm_is_dm_lowest_prio() into lapic.h and renamed to kvm_lowest_prio_delivery(). Changed a check in kvm_irq_delivery_to_apic_fast() from irq->delivery_mode == APIC_DM_LOWPRI to kvm_is_dm_lowest_prio(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>