summaryrefslogtreecommitdiffstats
path: root/virt
AgeCommit message (Collapse)AuthorFilesLines
2013-10-15KVM: Drop FOLL_GET in GUP when doing async page faultchai wen1-12/+5
Page pinning is not mandatory in kvm async page fault processing since after async page fault event is delivered to a guest it accesses page once again and does its own GUP. Drop the FOLL_GET flag in GUP in async_pf code, and do some simplifying in check/clear processing. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03virt/kvm/iommu.c: Add leading zeros to device's BDF notation in debug messagesAndre Richter1-10/+2
When KVM (de)assigns PCI(e) devices to VMs, a debug message is printed including the BDF notation of the respective device. Currently, the BDF notation does not have the commonly used leading zeros. This produces messages like "assign device 0:1:8.0", which look strange at first sight. The patch fixes this by exchanging the printk(KERN_DEBUG ...) with dev_info() and also inserts "kvm" into the debug message, so that it is obvious where the message comes from. Also reduces LoC. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Andre Richter <andre.o.richter@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03Fix NULL dereference in gfn_to_hva_prot()Gleb Natapov1-2/+4
gfn_to_memslot() can return NULL or invalid slot. We need to check slot validity before accessing it. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-09-30KVM: Convert kvm_lock back to non-raw spinlockPaolo Bonzini1-9/+9
In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"), the kvm_lock was made a raw lock. However, the kvm mmu_shrink() function tries to grab the (non-raw) mmu_lock within the scope of the raw locked kvm_lock being held. This leads to the following: BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0 Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm] Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt Call Trace: [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm] [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0 [<ffffffff811185bf>] kswapd+0x18f/0x490 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730 [<ffffffff81060d2b>] kthread+0xdb/0xe0 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10 [<ffffffff81060c50>] ? __init_kthread_worker+0x After the previous patch, kvm_lock need not be a raw spinlock anymore, so change it back. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-30KVM: protect kvm_usage_count with its own spinlockPaolo Bonzini1-9/+10
The VM list need not be protected by a raw spinlock. Separate the two so that kvm_lock can be made non-raw. Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-30KVM: cleanup (physical) CPU hotplugPaolo Bonzini1-9/+8
Remove the useless argument, and do not do anything if there are no VMs running at the time of the hotplug. Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24kvm: remove .done from struct kvm_async_pfRadim Krčmář1-4/+1
'.done' is used to mark the completion of 'async_pf_execute()', but 'cancel_work_sync()' returns true when the work was canceled, so we use it instead. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17kvm: free resources after canceling async_pfRadim Krčmář1-1/+4
When we cancel 'async_pf_execute()', we should behave as if the work was never scheduled in 'kvm_setup_async_pf()'. Fixes a bug when we can't unload module because the vm wasn't destroyed. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17KVM: mmu: allow page tables to be in read-only slotsPaolo Bonzini1-5/+9
Page tables in a read-only memory slot will currently cause a triple fault because the page walker uses gfn_to_hva and it fails on such a slot. OVMF uses such a page table; however, real hardware seems to be fine with that as long as the accessed/dirty bits are set. Save whether the slot is readonly, and later check it when updating the accessed and dirty bits. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-05Merge branch 'for-linus' of ↵Linus Torvalds1-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "Unfortunately, this merge window it'll have a be a lot of small piles - my fault, actually, for not keeping #for-next in anything that would resemble a sane shape ;-/ This pile: assorted fixes (the first 3 are -stable fodder, IMO) and cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last components) + several long-standing patches from various folks. There definitely will be a lot more (starting with Miklos' check_submount_and_drop() series)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) direct-io: Handle O_(D)SYNC AIO direct-io: Implement generic deferred AIO completions add formats for dentry/file pathnames kvm eventfd: switch to fdget powerpc kvm: use fdget switch fchmod() to fdget switch epoll_ctl() to fdget switch copy_module_from_fd() to fdget git simplify nilfs check for busy subtree ibmasmfs: don't bother passing superblock when not needed don't pass superblock to hypfs_{mkdir,create*} don't pass superblock to hypfs_diag_create_files don't pass superblock to hypfs_vm_create_files() oprofile: get rid of pointless forward declarations of struct super_block oprofilefs_create_...() do not need superblock argument oprofilefs_mkdir() doesn't need superblock argument don't bother with passing superblock to oprofile_create_stats_files() oprofile: don't bother with passing superblock to ->create_files() don't bother passing sb to oprofile_create_files() coh901318: don't open-code simple_read_from_buffer() ...
2013-09-03kvm eventfd: switch to fdgetAl Viro1-10/+10
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-30ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regsChristoffer Dall1-1/+1
For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in one word and since there are 32 private (per cpu) irqs, we have 8 private u32 fields on the vgic_bytemap struct. We shift the offset from the base of the register group right by 2, giving us the word index instead of the field index. But then there are 8 private words, not 4, which is also why we subtract 8 words from the offset of the shared words. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30ARM: KVM: vgic: fix GICD_ICFGRn accessMarc Zyngier1-2/+6
All the code in handle_mmio_cfg_reg() assumes the offset has been shifted right to accomodate for the 2:1 bit compression, but this is only done when getting the register address. Shift the offset early so the code works mostly unchanged. Reported-by: Zhaobo (Bob, ERC) <zhaobo@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30ARM: KVM: vgic: simplify vgic_get_target_regMarc Zyngier1-9/+3
vgic_get_target_reg is quite complicated, for no good reason. Actually, it is fairly easy to write it in a much more efficient way by using the target CPU array instead of the bitmap. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-28KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmpPaolo Bonzini1-8/+8
This is the type-safe comparison function, so the double-underscore is not related. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-27kvm: optimize away THP checks in kvm_is_mmio_pfn()Andrea Arcangeli1-22/+2
The checks on PG_reserved in the page structure on head and tail pages aren't necessary because split_huge_page wouldn't transfer the PG_reserved bit from head to tail anyway. This was a forward-thinking check done in the case PageReserved was set by a driver-owned page mapped in userland with something like remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not possible right now). It was meant to be very safe, but it's overkill as it's unlikely split_huge_page could ever run without the driver noticing and tearing down the hugepage itself. And if a driver in the future will really want to map a reserved hugepage in userland using an huge pmd it should simply take care of marking all subpages reserved too to keep KVM safe. This of course would require such a hypothetical driver to tear down the huge pmd itself and splitting the hugepage itself, instead of relaying on split_huge_page, but that sounds very reasonable, especially considering split_huge_page wouldn't currently transfer the reserved bit anyway. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26kvm: use anon_inode_getfd() with O_CLOEXEC flagYann Droneaud1-3/+3
KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-29KVM: introduce __kvm_io_bus_sort_cmpPaolo Bonzini1-9/+12
kvm_io_bus_sort_cmp is used also directly, not just as a callback for sort and bsearch. In these cases, it is handy to have a type-safe variant. This patch introduces such a variant, __kvm_io_bus_sort_cmp, and uses it throughout kvm_main.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18KVM: Introduce kvm_arch_memslots_updated()Takuya Yoshikawa1-1/+4
This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: Alexander Graf <agraf@suse.de> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18KVM: kvm-io: support cookiesCornelia Huck1-16/+92
Add new functions kvm_io_bus_{read,write}_cookie() that allows users of the kvm io infrastructure to use a cookie value to speed up lookup of a device on an io bus. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-1/+1788
Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
2013-06-27Merge git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-3.11 ↵Gleb Natapov1-9/+20
into queue KVM/ARM pull request for 3.11 merge window * tag 'kvm-arm-3.11' of git://git.linaro.org/people/cdall/linux-kvm-arm.git: ARM: kvm: don't include drivers/virtio/Kconfig Update MAINTAINERS: KVM/ARM work now funded by Linaro arm/kvm: Cleanup KVM_ARM_MAX_VCPUS logic ARM: KVM: clear exclusive monitor on all exception returns ARM: KVM: add missing dsb before invalidating Stage-2 TLBs ARM: KVM: perform save/restore of PAR ARM: KVM: get rid of S2_PGD_SIZE ARM: KVM: don't special case PC when doing an MMIO ARM: KVM: use phys_addr_t instead of unsigned long long for HYP PGDs ARM: KVM: remove dead prototype for __kvm_tlb_flush_vmid ARM: KVM: Don't handle PSCI calls via SMC ARM: KVM: Allow host virt timer irq to be different from guest timer virt irq
2013-06-27KVM: Fix RTC interrupt coalescing trackingGleb Natapov1-6/+3
This reverts most of the f1ed0450a5fac7067590317cbf027f566b6ccbca. After the commit kvm_apic_set_irq() no longer returns accurate information about interrupt injection status if injection is done into disabled APIC. RTC interrupt coalescing tracking relies on the information to be accurate and cannot recover if it is not. Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-26ARM: KVM: Allow host virt timer irq to be different from guest timer virt irqAnup Patel1-9/+20
The arch_timer irq numbers (or PPI numbers) are implementation dependent, so the host virtual timer irq number can be different from guest virtual timer irq number. This patch ensures that host virtual timer irq number is read from DTB and guest virtual timer irq is determined based on vcpu target type. Signed-off-by: Anup Patel <anup.patel@linaro.org> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-06-04kvm: exclude ioeventfd from counting kvm_io_range limitAmos Kong2-1/+4
We can easily reach the 1000 limit by start VM with a couple hundred I/O devices (multifunction=on). The hardcode limit already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000). In userspace, we already have maximum file descriptor to limit ioeventfd count. But kvm_io_bus devices also are used for pit, pic, ioapic, coalesced_mmio. They couldn't be limited by maximum file descriptor. Currently only ioeventfds take too much kvm_io_bus devices, so just exclude it from counting kvm_io_range limit. Also fixed one indent issue in kvm_host.h Signed-off-by: Amos Kong <akong@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-19ARM: KVM: move GIC/timer code to a common locationMarc Zyngier2-0/+1771
As KVM/arm64 is looming on the horizon, it makes sense to move some of the common code to a single location in order to reduce duplication. The code could live anywhere. Actually, most of KVM is already built with a bunch of ugly ../../.. hacks in the various Makefiles, so we're not exactly talking about style here. But maybe it is time to start moving into a less ugly direction. The include files must be in a "public" location, as they are accessed from non-KVM files (arch/arm/kernel/asm-offsets.c). For this purpose, introduce two new locations: - virt/kvm/arm/ : x86 and ia64 already share the ioapic code in virt/kvm, so this could be seen as a (very ugly) precedent. - include/kvm/ : there is already an include/xen, and while the intent is slightly different, this seems as good a location as any Eventually, we should probably have independant Makefiles at every levels (just like everywhere else in the kernel), but this is just the first step. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-14KVM: x86: Remove support for reporting coalesced APIC IRQsJan Kiszka1-3/+6
Since the arrival of posted interrupt support we can no longer guarantee that coalesced IRQs are always reported to the IRQ source. Moreover, accumulated APIC timer events could cause a busy loop when a VCPU should rather be halted. The consensus is to remove coalesced tracking from the LAPIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-12KVM: add missing misc_deregister() on error in kvm_init()Wei Yongjun1-0/+1
Add the missing misc_deregister() before return from kvm_init() in the debugfs init error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-10Merge tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-5/+13
Pull kvm fixes from Gleb Natapov: "Most of the fixes are in the emulator since now we emulate more than we did before for correctness sake we see more bugs there, but there is also an OOPS fixed and corruption of xcr0 register." * tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: emulator: emulate SALC KVM: emulator: emulate XLAT KVM: emulator: emulate AAM KVM: VMX: fix halt emulation while emulating invalid guest sate KVM: Fix kvm_irqfd_init initialization KVM: x86: fix maintenance of guest/host xcr0 state
2013-05-10Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds1-1/+1
Pull MIPS updates from Ralf Baechle: - More work on DT support for various platforms - Various fixes that were to late to make it straight into 3.9 - Improved platform support, in particular the Netlogic XLR and BCM63xx, and the SEAD3 and Malta eval boards. - Support for several Ralink SOC families. - Complete support for the microMIPS ASE which basically reencodes the existing MIPS32/MIPS64 ISA to use non-constant size instructions. - Some fallout from LTO work which remove old cruft and will generally make the MIPS kernel easier to maintain and resistant to compiler optimization, even in absence of LTO. - KVM support. While MIPS has announced hardware virtualization extensions this KVM extension uses trap and emulate mode for virtualization of MIPS32. More KVM work to add support for VZ hardware virtualizaiton extensions and MIPS64 will probably already be merged for 3.11. Most of this has been sitting in -next for a long time. All defconfigs have been build or run time tested except three for which fixes are being sent by other maintainers. Semantic conflict with kvm updates done as per Ralf * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits) MIPS: Add new GIC clockevent driver. MIPS: Formatting clean-ups for clocksources. MIPS: Refactor GIC clocksource code. MIPS: Move 'gic_frequency' to common location. MIPS: Move 'gic_present' to common location. MIPS: MIPS16e: Add unaligned access support. MIPS: MIPS16e: Support handling of delay slots. MIPS: MIPS16e: Add instruction formats. MIPS: microMIPS: Optimise 'strnlen' core library function. MIPS: microMIPS: Optimise 'strlen' core library function. MIPS: microMIPS: Optimise 'strncpy' core library function. MIPS: microMIPS: Optimise 'memset' core library function. MIPS: microMIPS: Add configuration option for microMIPS kernel. MIPS: microMIPS: Disable LL/SC and fix linker bug. MIPS: microMIPS: Add vdso support. MIPS: microMIPS: Add unaligned access support. MIPS: microMIPS: Support handling of delay slots. MIPS: microMIPS: Add support for exception handling. MIPS: microMIPS: Floating point support. MIPS: microMIPS: Fix macro naming in micro-assembler. ...
2013-05-09Merge branch 'next/kvm' into mips-for-linux-nextRalf Baechle1-1/+1
2013-05-09KVM/MIPS32: Do not call vcpu_load when injecting interrupts.Sanjay Lal1-1/+1
Signed-off-by: Sanjay Lal <sanjayl@kymasys.com> Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-05-08KVM: Fix kvm_irqfd_init initializationAsias He1-5/+13
In commit a0f155e96 'KVM: Initialize irqfd from kvm_init()', when kvm_init() is called the second time (e.g kvm-amd.ko and kvm-intel.ko), kvm_arch_init() will fail with -EEXIST, then kvm_irqfd_exit() will be called on the error handling path. This way, the kvm_irqfd system will not be ready. This patch fix the following: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30 PGD 0 Oops: 0002 [#1] SMP Modules linked in: vhost_net CPU 6 Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK RIP: 0010:[<ffffffff81c0721e>] [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30 RSP: 0018:ffff880221721cc8 EFLAGS: 00010046 RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950 RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000 RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002 R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8 R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000 FS: 00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640) Stack: ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086 0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000 ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000 Call Trace: [<ffffffff810ac5c5>] __queue_work+0x45/0x2d0 [<ffffffff810ac8fe>] queue_work_on+0x8e/0xa0 [<ffffffff810ac949>] queue_work+0x19/0x20 [<ffffffff81009b6b>] irqfd_deactivate+0x4b/0x60 [<ffffffff8100a69d>] kvm_irqfd+0x39d/0x580 [<ffffffff81007a27>] kvm_vm_ioctl+0x207/0x5b0 [<ffffffff810c9545>] ? update_curr+0xf5/0x180 [<ffffffff811b66e8>] do_vfs_ioctl+0x98/0x550 [<ffffffff810c1f5e>] ? finish_task_switch+0x4e/0xe0 [<ffffffff81c054aa>] ? __schedule+0x2ea/0x710 [<ffffffff811b6bf7>] sys_ioctl+0x57/0x90 [<ffffffff8140ae9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff81c0f602>] system_call_fastpath+0x16/0x1b Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 <f0> 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f RIP [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30 RSP <ffff880221721cc8> CR2: 0000000000000000 ---[ end trace 13fb1e4b6e5ab21f ]--- Signed-off-by: Asias He <asias@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-05Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-340/+659
Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
2013-05-05kvm: Add compat_ioctl for device control APIScott Wood1-0/+3
This API shouldn't have 32/64-bit issues, but VFS assumes it does unless told otherwise. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-02KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras1-0/+5
This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm: destroy emulated devices on VM exitScott Wood1-13/+16
The hassle of getting refcounting right was greater than the hassle of keeping a list of devices to destroy on VM exit. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm/ppc/mpic: in-kernel MPIC emulationScott Wood1-0/+6
Hook the MPIC code up to the KVM interfaces, add locking, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26kvm: add device control APIScott Wood1-0/+129
Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. Both device types and individual attributes can be tested without having to create the device or get/set the attribute, without the need for separately managing enumerated capabilities. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26KVM: Move irqfd resample cap handling to generic codeAlexander Graf1-0/+3
Now that we have most irqfd code completely platform agnostic, let's move irqfd's resample capability return to generic code as well. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26KVM: Move irq routing setup to irqchip.cAlexander Graf2-73/+88
Setting up IRQ routes is nothing IOAPIC specific. Extract everything that really is generic code into irqchip.c and only leave the ioapic specific bits to irq_comm.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26KVM: Extract generic irqchip logic into irqchip.cAlexander Graf2-118/+152
The current irq_comm.c file contains pieces of code that are generic across different irqchip implementations, as well as code that is fully IOAPIC specific. Split the generic bits out into irqchip.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26KVM: Move irq routing to generic codeAlexander Graf2-30/+30
The IRQ routing set ioctl lives in the hacky device assignment code inside of KVM today. This is definitely the wrong place for it. Move it to the much more natural kvm_main.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26KVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTINGAlexander Graf3-4/+7
Quite a bit of code in KVM has been conditionalized on availability of IOAPIC emulation. However, most of it is generically applicable to platforms that don't have an IOPIC, but a different type of irq chip. Make code that only relies on IRQ routing, not an APIC itself, on CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINSAlexander Graf1-1/+1
The concept of routing interrupt lines to an irqchip is nothing that is IOAPIC specific. Every irqchip has a maximum number of pins that can be linked to irq lines. So let's add a new define that allows us to reuse generic code for non-IOAPIC platforms. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-16KVM: VMX: Add the deliver posted interrupt algorithmYang Zhang1-0/+1
Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: Set TMR when programming ioapic entryYang Zhang2-4/+11
We already know the trigger mode of a given interrupt when programming the ioapice entry. So it's not necessary to set it in each interrupt delivery. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: Call common update function when ioapic entry changed.Yang Zhang4-17/+19
Both TMR and EOI exit bitmap need to be updated when ioapic changed or vcpu's id/ldr/dfr changed. So use common function instead eoi exit bitmap specific function. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-15KVM: Use eoi to track RTC interrupt delivery statusYang Zhang1-1/+35
Current interrupt coalescing logci which only used by RTC has conflict with Posted Interrupt. This patch introduces a new mechinism to use eoi to track interrupt: When delivering an interrupt to vcpu, the pending_eoi set to number of vcpu that received the interrupt. And decrease it when each vcpu writing eoi. No subsequent RTC interrupt can deliver to vcpu until all vcpus write eoi. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-15KVM: Let ioapic know the irq line statusYang Zhang6-29/+41
Userspace may deliver RTC interrupt without query the status. So we want to track RTC EOI for this case. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>