summaryrefslogtreecommitdiffstats
path: root/virt
AgeCommit message (Collapse)AuthorFilesLines
2015-09-14KVM: fix polling for guest halt continued even if disable itWanpeng Li1-1/+2
If there is already some polling ongoing, it's impossible to disable the polling, since as soon as somebody sets halt_poll_ns to 0, polling will never stop, as grow and shrink are only handled if halt_poll_ns is != 0. This patch fix it by reset vcpu->halt_poll_ns in order to stop polling when polling is disabled. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+31
Merge third patch-bomb from Andrew Morton: - even more of the rest of MM - lib/ updates - checkpatch updates - small changes to a few scruffy filesystems - kmod fixes/cleanups - kexec updates - a dma-mapping cleanup series from hch * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) dma-mapping: consolidate dma_set_mask dma-mapping: consolidate dma_supported dma-mapping: cosolidate dma_mapping_error dma-mapping: consolidate dma_{alloc,free}_noncoherent dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() mm: make sure all file VMAs have ->vm_ops set mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() mm: mark most vm_operations_struct const namei: fix warning while make xmldocs caused by namei.c ipc: convert invalid scenarios to use WARN_ON zlib_deflate/deftree: remove bi_reverse() lib/decompress_unlzma: Do a NULL check for pointer lib/decompressors: use real out buf size for gunzip with kernel fs/affs: make root lookup from blkdev logical size sysctl: fix int -> unsigned long assignments in INT_MIN case kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo kexec: align crash_notes allocation to make it be inside one physical page kexec: remove unnecessary test in kimage_alloc_crash_control_pages() kexec: split kexec_load syscall from kexec core code ...
2015-09-10mmu-notifier: add clear_young callbackVladimir Davydov1-0/+31
In the scope of the idle memory tracking feature, which is introduced by the following patch, we need to clear the referenced/accessed bit not only in primary, but also in secondary ptes. The latter is required in order to estimate wss of KVM VMs. At the same time we want to avoid flushing tlb, because it is quite expensive and it won't really affect the final result. Currently, there is no function for clearing pte young bit that would meet our requirements, so this patch introduces one. To achieve that we have to add a new mmu-notifier callback, clear_young, since there is no method for testing-and-clearing a secondary pte w/o flushing tlb. The new method is not mandatory and currently only implemented by KVM. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08kvm: irqchip: fix memory leakSudip Mukherjee1-2/+6
We were taking the exit path after checking ue->flags and return value of setup_routing_entry(), but 'e' was not freed incase of a failure. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06KVM: trace kvm_halt_poll_ns grow/shrinkWanpeng Li1-2/+6
Tracepoint for dynamic halt_pool_ns, fired on every potential change. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06KVM: dynamic halt-pollingWanpeng Li1-2/+51
There is a downside of always-poll since poll is still happened for idle vCPUs which can waste cpu usage. This patchset add the ability to adjust halt_poll_ns dynamically, to grow halt_poll_ns when shot halt is detected, and to shrink halt_poll_ns when long halt is detected. There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. no-poll always-poll dynamic-poll ----------------------------------------------------------------------- Idle (nohz) vCPU %c0 0.15% 0.3% 0.2% Idle (250HZ) vCPU %c0 1.1% 4.6%~14% 1.2% TCP_RR latency 34us 27us 26.7us "Idle (X) vCPU %c0" is the percent of time the physical cpu spent in c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the guest was tickless. (250HZ) means the guest was ticking at 250HZ. The big win is with ticking operating systems. Running the linux guest with nohz=off (and HZ=250), we save 3.4%~12.8% CPUs/second and get close to no-polling overhead levels by using the dynamic-poll. The savings should be even higher for higher frequency ticks. Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Simplify the patch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-06KVM: make halt_poll_ns per-vCPUWanpeng Li1-2/+3
Change halt_poll_ns into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-22Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into ↵Paolo Bonzini2-14/+10
kvm-queue Patch queue for ppc - 2015-08-22 Highlights for KVM PPC this time around: - Book3S: A few bug fixes - Book3S: Allow micro-threading on POWER8
2015-08-12KVM: arm/arm64: timer: Allow the timer to control the active stateMarc Zyngier1-7/+22
In order to remove the crude hack where we sneak the masked bit into the timer's control register, make use of the phys_irq_map API control the active state of the interrupt. This causes some limited changes to allow for potential error propagation. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Prevent userspace injection of a mapped interruptMarc Zyngier1-33/+70
Virtual interrupts mapped to a HW interrupt should only be triggered from inside the kernel. Otherwise, you could end up confusing the kernel (and the GIC's) state machine. Rearrange the injection path so that kvm_vgic_inject_irq is used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is used for mapped interrupts. The latter should only be called from inside the kernel (timer, irqfd). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_activeMarc Zyngier1-0/+24
In order to control the active state of an interrupt, introduce a pair of accessors allowing the state to be set/queried. This only affects the logical state, and the HW state will only be applied at world-switch time. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guestMarc Zyngier1-3/+86
To allow a HW interrupt to be injected into a guest, we lookup the guest virtual interrupt in the irq_phys_map list, and if we have a match, encode both interrupts in the LR. We also mark the interrupt as "active" at the host distributor level. On guest EOI on the virtual interrupt, the host interrupt will be deactivated. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interruptsMarc Zyngier1-1/+208
In order to be able to feed physical interrupts to a guest, we need to be able to establish the virtual-physical mapping between the two worlds. The mappings are kept in a set of RCU lists, indexed by virtual interrupts. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQsMarc Zyngier1-1/+1
We only set the irq_queued flag for level interrupts, meaning that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate for all interrupts. This will allow us to inject edge HW interrupts, for which the state ACTIVE+PENDING is not allowed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-08-12KVM: arm/arm64: vgic: Allow HW irq to be encoded in LRMarc Zyngier2-4/+33
Now that struct vgic_lr supports the LR_HW bit and carries a hwirq field, we can encode that information into the list registers. This patch provides implementations for both GICv2 and GICv3. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-07-30KVM: document memory barriers for kvm->vcpus/kvm->online_vcpusPaolo Bonzini1-0/+5
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-29KVM: move code related to KVM_SET_BOOT_CPU_ID to x86Paolo Bonzini1-14/+0
This is another remnant of ia64 support. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-10KVM: count number of assigned devicesPaolo Bonzini1-0/+5
If there are no assigned devices, the guest PAT are not providing any useful information and can be overridden to writeback; VMX always does this because it has the "IPAT" bit in its extended page table entries, but SVM does not have anything similar. Hook into VFIO and legacy device assignment so that they provide this information to KVM. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-03sched, preempt_notifier: separate notifier registration from static_key inc/decPeter Zijlstra1-0/+3
Commit 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers") had two problems. First, the preempt-notifier API needs to sleep with the addition of the static_key, we do however need to hold off preemption while modifying the preempt notifier list, otherwise a preemption could observe an inconsistent list state. KVM correctly registers and unregisters preempt notifiers with preemption disabled, so the sleep caused dmesg splats. Second, KVM registers and unregisters preemption notifiers very often (in vcpu_load/vcpu_put). With a single uniprocessor guest the static key would move between 0 and 1 continuously, hitting the slow path on every userspace exit. To fix this, wrap the static_key inc/dec in a new API, and call it from KVM. Fixes: 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers") Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com> Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-24Merge tag 'arm64-upstream' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Mostly refactoring/clean-up: - CPU ops and PSCI (Power State Coordination Interface) refactoring following the merging of the arm64 ACPI support, together with handling of Trusted (secure) OS instances - Using fixmap for permanent FDT mapping, removing the initial dtb placement requirements (within 512MB from the start of the kernel image). This required moving the FDT self reservation out of the memreserve processing - Idmap (1:1 mapping used for MMU on/off) handling clean-up - Removing flush_cache_all() - not safe on ARM unless the MMU is off. Last stages of CPU power down/up are handled by firmware already - "Alternatives" (run-time code patching) refactoring and support for immediate branch patching, GICv3 CPU interface access - User faults handling clean-up And some fixes: - Fix for VDSO building with broken ELF toolchains - Fix another case of init_mm.pgd usage for user mappings (during ASID roll-over broadcasting) - Fix for FPSIMD reloading after CPU hotplug - Fix for missing syscall trace exit - Workaround for .inst asm bug - Compat fix for switching the user tls tpidr_el0 register" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: use private ratelimit state along with show_unhandled_signals arm64: show unhandled SP/PC alignment faults arm64: vdso: work-around broken ELF toolchains in Makefile arm64: kernel: rename __cpu_suspend to keep it aligned with arm arm64: compat: print compat_sp instead of sp arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP arm64: entry: fix context tracking for el0_sp_pc arm64: defconfig: enable memtest arm64: mm: remove reference to tlb.S from comment block arm64: Do not attempt to use init_mm in reset_context() arm64: KVM: Switch vgic save/restore to alternative_insn arm64: alternative: Introduce feature for GICv3 CPU interface arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning arm64: fix bug for reloading FPSIMD state after CPU hotplug. arm64: kernel thread don't need to save fpsimd context. arm64: fix missing syscall trace exit arm64: alternative: Work around .inst assembler bugs arm64: alternative: Merge alternative-asm.h into alternative.h arm64: alternative: Allow immediate branch as alternative instruction arm64: Rework alternate sequence for ARM erratum 845719 ...
2015-06-19KVM: fix checkpatch.pl errors in kvm/coalesced_mmio.hKevin Mulvey1-2/+2
Tabs rather than spaces Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19KVM: fix checkpatch.pl errors in kvm/async_pf.hKevin Mulvey1-2/+2
fix brace spacing Signed-off-by: Kevin Mulvey <kmulvey@linux.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19kvm: irqchip: Break up high order allocations of kvm_irq_routing_tableJoerg Roedel1-8/+33
The allocation size of the kvm_irq_routing_table depends on the number of irq routing entries because they are all allocated with one kzalloc call. When the irq routing table gets bigger this requires high order allocations which fail from time to time: qemu-kvm: page allocation failure: order:4, mode:0xd0 This patch fixes this issue by breaking up the allocation of the table and its entries into individual kzalloc calls. These could all be satisfied with order-0 allocations, which are less likely to fail. The downside of this change is the lower performance, because of more calls to kzalloc. But given how often kvm_set_irq_routing is called in the lifetime of a guest, it doesn't really matter much. Signed-off-by: Joerg Roedel <jroedel@suse.de> [Avoid sparse warning through rcu_access_pointer. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19Merge tag 'kvm-arm-for-4.2' of ↵Paolo Bonzini2-12/+51
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM changes for v4.2: - Proper guest time accounting - FP access fix for 32bit - The usual pile of GIC fixes - PSCI fixes - Random cleanups
2015-06-18KVM: arm/arm64: vgic: Remove useless arm-gic.h #includeMarc Zyngier1-2/+0
Back in the days, vgic.c used to have an intimate knowledge of the actual GICv2. These days, this has been abstracted away into hardware-specific backends. Remove the now useless arm-gic.h #include directive, making it clear that GICv2 specific code doesn't belong here. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17KVM: arm/arm64: vgic: Avoid injecting reserved IRQ numbersMarc Zyngier1-4/+1
Commit fd1d0ddf2ae9 (KVM: arm/arm64: check IRQ number on userland injection) rightly limited the range of interrupts userspace can inject in a guest, but failed to consider the (unlikely) case where a guest is configured with 1024 interrupts. In this case, interrupts ranging from 1020 to 1023 are unuseable, as they have a special meaning for the GIC CPU interface. Make sure that these number cannot be used as an IRQ. Also delete a redundant (and similarily buggy) check in kvm_set_irq. Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Andre Przywara <andre.przywara@arm.com> Cc: <stable@vger.kernel.org> # 4.1, 4.0, 3.19, 3.18 Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-17KVM: arm: vgic: Drop useless Group0 warningMarc Zyngier1-2/+0
If a GICv3-enabled guest tries to configure Group0, we print a warning on the console (because we don't support Group0 interrupts). This is fairly pointless, and would allow a guest to spam the console. Let's just drop the warning. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-12arm64: KVM: Switch vgic save/restore to alternative_insnMarc Zyngier1-3/+0
So far, we configured the world-switch by having a small array of pointers to the save and restore functions, depending on the GIC used on the platform. Loading these values each time is a bit silly (they never change), and it makes sense to rely on the instruction patching instead. This leads to a nice cleanup of the code. Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-09KVM: arm64: add active register handling to GICv3 emulation as wellAndre Przywara1-4/+50
Commit 47a98b15ba7c ("arm/arm64: KVM: support for un-queuing active IRQs") introduced handling of the GICD_I[SC]ACTIVER registers, but only for the GICv2 emulation. For the sake of completeness and as this is a pre-requisite for save/restore of the GICv3 distributor state, we should also emulate their handling in the distributor and redistributor frames of an emulated GICv3. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-06-05KVM: implement multiple address spacesPaolo Bonzini1-27/+43
Only two ioctls have to be modified; the address space id is placed in the higher 16 bits of their slot id argument. As of this patch, no architecture defines more than one address space; x86 will be the first. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05KVM: add vcpu-specific functions to read/write/translate GFNsPaolo Bonzini1-13/+154
We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to use different address spaces, depending on whether the VCPU is in system management mode. We need to introduce a new family of functions for this purpose. For now, the VCPU-based functions have the same behavior as the existing per-VM ones, they just accept a different type for the first argument. Later however they will be changed to use one of many "struct kvm_memslots" stored in struct kvm, through an architecture hook. VM-based functions will unconditionally use the first memslots pointer. Whenever possible, this patch introduces slot-based functions with an __ prefix, with two wrappers for generic and vcpu-based actions. The exceptions are kvm_read_guest and kvm_write_guest, which are copied into the new functions kvm_vcpu_read_guest and kvm_vcpu_write_guest. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28KVM: remove unused argument from mark_page_dirty_in_slotPaolo Bonzini1-7/+5
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28KVM: remove __gfn_to_pfnPaolo Bonzini1-24/+15
Most of the function that wrap it can be rewritten without it, except for gfn_to_pfn_prot. Just inline it into gfn_to_pfn_prot, and rewrite the other function on top of gfn_to_pfn_memslot*. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28KVM: pass kvm_memory_slot to gfn_to_page_many_atomicPaolo Bonzini1-3/+3
The memory slot is already available from gfn_to_memslot_dirty_bitmap. Isn't it a shame to look it up again? Plus, it makes gfn_to_page_many_atomic agnostic of multiple VCPU address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-28KVM: add "new" argument to kvm_arch_commit_memory_regionPaolo Bonzini1-1/+1
This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: add memslots argument to kvm_arch_memslots_updatedPaolo Bonzini1-1/+1
Prepare for the case of multiple address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: const-ify uses of struct kvm_userspace_memory_regionPaolo Bonzini1-11/+11
Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: use kvm_memslots whenever possiblePaolo Bonzini1-6/+10
kvm_memslots provides lockdep checking. Use it consistently instead of explicit dereferencing of kvm->memslots. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-26KVM: introduce kvm_alloc/free_memslotsPaolo Bonzini1-49/+55
kvm_alloc_memslots is extracted out of previously scattered code that was in kvm_init_memslots_id and kvm_create_vm. kvm_free_memslot and kvm_free_memslots are new names of kvm_free_physmem and kvm_free_physmem_slot, but they also take an explicit pointer to struct kvm_memslots. This will simplify the transition to multiple address spaces, each represented by one pointer to struct kvm_memslots. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_asyncPaolo Bonzini1-18/+8
gfn_to_pfn_async is used in just one place, and because of x86-specific treatment that place will need to look at the memory slot. Hence inline it into try_async_pf and export __gfn_to_pfn_memslot. The patch also switches the subsequent call to gfn_to_pfn_prot to use __gfn_to_pfn_memslot. This is a small optimization. Finally, remove the now-unused async argument of __gfn_to_pfn. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: remove pointless cpu hotplug messagesHeiko Carstens1-6/+0
On cpu hotplug only KVM emits an unconditional message that its notifier has been called. It certainly can be assumed that calling cpu hotplug notifiers work, therefore there is no added value if KVM prints a message. If an error happens on cpu online KVM will still emit a warning. So let's remove this superfluous message. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08KVM: reuse memslot in kvm_write_guest_pageRadim Krčmář1-2/+4
Caching memslot value and using mark_page_dirty_in_slot() avoids another O(log N) search when dirtying the page. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428695247-27603-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-22Merge tag 'kvm-arm-for-4.1-take2' of ↵Paolo Bonzini1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM changes for v4.1, take #2: Rather small this time: - a fix for a nasty bug with virtual IRQ injection - a fix for irqfd
2015-04-22KVM: arm/arm64: check IRQ number on userland injectionAndre Przywara1-0/+3
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently only check it against a fixed limit, which historically is set to 127. With the new dynamic IRQ allocation the effective limit may actually be smaller (64). So when now a malicious or buggy userland injects a SPI in that range, we spill over on our VGIC bitmaps and bytemaps memory. I could trigger a host kernel NULL pointer dereference with current mainline by injecting some bogus IRQ number from a hacked kvmtool: ----------------- .... DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1) DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1) DEBUG: IRQ #114 still in the game, writing to bytemap now... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc07652e000 [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027 Hardware name: FVP Base (DT) task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000 PC is at kvm_vgic_inject_irq+0x234/0x310 LR is at kvm_vgic_inject_irq+0x30c/0x310 pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145 ..... So this patch fixes this by checking the SPI number against the actual limit. Also we remove the former legacy hard limit of 127 in the ioctl code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18 [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__, as suggested by Christopher Covington] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-04-22KVM: arm: irqfd: fix value returned by kvm_irq_map_gsiEric Auger1-1/+1
irqfd/arm curently does not support routing. kvm_irq_map_gsi is supposed to return all the routing entries associated with the provided gsi and return the number of those entries. We should return 0 at this point. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-04-21KVM: PPC: Book3S HV: Create debugfs file for each guest's HPTPaul Mackerras1-0/+1
This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds10-453/+661
Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...
2015-04-10KVM: use slowpath for cross page cached accessesRadim Krčmář1-2/+2
kvm_write_guest_cached() does not mark all written pages as dirty and code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot with cross page accesses. Fix all the easy way. The check is '<= 1' to have the same result for 'len = 0' cache anywhere in the page. (nr_pages_needed is 0 on page boundary.) Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: remove kvm_read_hva and kvm_read_hva_atomicPaolo Bonzini1-12/+2
The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit 86ab8cffb498 (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>
2015-04-07Merge tag 'kvm-s390-next-20150331' of ↵Paolo Bonzini1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Features and fixes for 4.1 (kvm/next) 1. Assorted changes 1.1 allow more feature bits for the guest 1.2 Store breaking event address on program interrupts 2. Interrupt handling rework 2.1 Fix copy_to_user while holding a spinlock (cc stable) 2.2 Rework floating interrupts to follow the priorities 2.3 Allow to inject all local interrupts via new ioctl 2.4 allow to get/set the full local irq state, e.g. for migration and introspection