summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2008-04-27s390: KVM preparation: provide hook to enable pgstes in user pagetableCarsten Otte8-5/+82
The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table: s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process has now a page status extended field after every page table. Code that wants to exploit pgstes should SELECT CONFIG_PGSTE. This patch has a small common code hit, namely making dup_mm non-static. Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's review feedback. Now we do have the prototype for dup_mm in include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now call task_lock() to prevent race against ptrace modification of mm_users. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86: hardware task switching supportIzik Eidus6-3/+507
This emulates the x86 hardware task switch mechanism in software, as it is unsupported by either vmx or svm. It allows operating systems which use it, like freedos, to run as kvm guests. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86: add functions to get the cpl of vcpuIzik Eidus3-0/+24
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: VMX: Add module option to disable flexpriorityAvi Kivity1-2/+6
Useful for debugging. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: no longer EXPERIMENTALAvi Kivity1-1/+1
Long overdue. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: Introduce and use spte_to_page()Avi Kivity1-5/+12
Encapsulate the pte mask'n'shift in a function. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: fix dirty bit setting when removing write permissionsIzik Eidus1-0/+8
When mmu_set_spte() checks if a page related to spte should be release as dirty or clean, it check if the shadow pte was writeble, but in case rmap_write_protect() is called called it is possible for shadow ptes that were writeble to become readonly and therefor mmu_set_spte will release the pages as clean. This patch fix this issue by marking the page as dirty inside rmap_write_protect(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Move some x86 specific constants and structures to include/asm-x86Avi Kivity2-13/+13
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: Set the accessed bit on non-speculative shadow ptesAvi Kivity2-5/+7
If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: kvm.h: __user requires compiler.hChristian Borntraeger1-0/+1
include/linux/kvm.h defines struct kvm_dirty_log to [...] union { void __user *dirty_bitmap; /* one bit per page */ __u64 padding; }; __user requires compiler.h to compile. Currently, this works on x86 only coincidentally due to other include files. This patch makes kvm.h compile in all cases. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: KVM guest: disable clock before rebooting.Glauber Costa1-0/+27
This patch writes 0 (actually, what really matters is that the LSB is cleared) to the system time msr before shutting down the machine for kexec. Without it, we can have a random memory location being written when the guest comes back It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c) and crash_shutdown, used in the path of crash_kexec() (kexec.c) Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: make native_machine_shutdown non-staticGlauber Costa2-1/+2
it will allow external users to call it. It is mainly useful for routines that will override its machine_ops field for its own special purposes, but want to call the normal shutdown routine after they're done Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: allow machine_crash_shutdown to be replacedGlauber Costa3-2/+13
This patch a llows machine_crash_shutdown to be replaced, just like any of the other functions in machine_ops Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: KVM guest: hypercall batchingMarcelo Tosatti1-2/+60
Batch pte updates and tlb flushes in lazy MMU mode. [avi: - adjust to mmu_op - helper for getting para_state without debug warnings] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: KVM guest: hypercall based pte updates and TLB flushesMarcelo Tosatti1-0/+138
Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - guest/host split - fix 32-bit truncation issues - adjust to mmu_op - adjust to ->release_*() renamed - add ->release_pud()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: hypercall based pte updates and TLB flushesMarcelo Tosatti6-3/+190
Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Provide unlocked version of emulator_write_phys()Avi Kivity2-7/+17
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: KVM guest: add basic paravirt supportMarcelo Tosatti6-0/+70
Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: add basic paravirt supportMarcelo Tosatti3-1/+4
Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add reset support for in kernel PITSheng Yang2-11/+20
Separate the reset part and prepare for reset support. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add save/restore supporting of in kernel PITSheng Yang5-0/+79
Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: In kernel PIT modelSheng Yang7-1/+664
The patch moves the PIT model from userspace to kernel, and increases the timer accuracy greatly. [marcelo: make last_injected_time per-guest] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Remove pointless desc_ptr #ifdefAvi Kivity1-4/+0
The desc_struct changes left an unnecessary #ifdef; remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: VMX: Don't adjust tsc offset forwardAvi Kivity1-3/+6
Most Intel hosts have a stable tsc, and playing with the offset only reduces accuracy. By limiting tsc offset adjustment only to forward updates, we effectively disable tsc offset adjustment on these hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: replace remaining __FUNCTION__ occurancesHarvey Harrison6-45/+44
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: detect if VCPU triple faultsJoerg Roedel2-5/+16
In the current inject_page_fault path KVM only checks if there is another PF pending and injects a DF then. But it has to check for a pending DF too to detect a shutdown condition in the VCPU. If this is not detected the VCPU goes to a PF -> DF -> PF loop when it should triple fault. This patch detects this condition and handles it with an KVM_SHUTDOWN exit to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Use kzalloc to avoid allocating kvm_regs from kernel stackXiantao Zhang1-11/+22
Since the size of kvm_regs is too big to allocate from kernel stack on ia64, use kzalloc to allocate it. Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Prefix control register accessors with kvm_ to avoid namespace pollutionAvi Kivity3-36/+36
Names like 'set_cr3()' look dangerously close to affecting the host. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: large page supportMarcelo Tosatti6-32/+262
Create large pages mappings if the guest PTE's are marked as such and the underlying memory is hugetlbfs backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on ram mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: MMU: ignore zapped root pagetablesMarcelo Tosatti5-2/+48
Mark zapped root pagetables as invalid and ignore such pages during lookup. This is a problem with the cr3-target feature, where a zapped root table fools the faulting code into creating a read-only mapping. The result is a lockup if the instruction can't be emulated. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Implement dummy values for MSR_PERF_STATUSAlexander Graf1-1/+7
Darwin relies on this and ceases to work without. Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: sparse fixes for kvm/x86.cHarvey Harrison1-13/+13
In two case statements, use the ever popular 'i' instead of index: arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here Make it static. arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static? Drop the return statements. arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: make iopm_base staticHarvey Harrison1-1/+1
Fixes sparse warning as well. arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: fix sparse warnings in x86_emulate.cHarvey Harrison1-2/+2
Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed variable warnings on the internal variable _tmp used by both macros. Change the outer macro to use __tmp. Avoids a sparse warning like the following at every call site of __emulate_2op arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one arch/x86/kvm/x86_emulate.c:1091:3: originally declared here [18 more warnings suppressed] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add stat counter for hypercallsAmit Shah2-0/+3
Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Use x86's segment descriptor struct instead of private definitionAvi Kivity3-39/+8
The x86 desc_struct unification allows us to remove segment_descriptor.h. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Increase the number of user memory slots per vmAvi Kivity1-1/+1
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add API for determining the number of supported memory slotsAvi Kivity2-0/+4
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Increase vcpu count to 16Avi Kivity1-1/+1
With NPT support, scalability is much improved. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: Add API to retrieve the number of supported vcpus per vmAvi Kivity2-0/+4
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: make register_address_increment and JMP_REL static inlinesHarvey Harrison1-30/+26
Change jmp_rel() to a function as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: make register_address, address_mask static inlinesHarvey Harrison1-19/+29
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: x86 emulator: add ad_mask static inlineHarvey Harrison1-3/+8
Replaces open-coded mask calculation in macros. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27x86: KVM guest: paravirtualized clocksourceGlauber de Oliveira Costa5-0/+182
This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. [randy dunlap: add missing include] [randy dunlap: disallow on Voyager or Visual WS] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: paravirtualized clocksource: host partGlauber de Oliveira Costa4-1/+145
This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writing to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. [marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME] [avi: save full value of the msr, even if enable bit is clear] [avi: clear previous value of time_page] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: enable LBR virtualizationJoerg Roedel1-2/+37
This patch implements the Last Branch Record Virtualization (LBRV) feature of the AMD Barcelona and Phenom processors into the kvm-amd module. It will only be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So there is no increased world switch overhead when the guest doesn't use these MSRs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: allocate the MSR permission map per VCPUJoerg Roedel2-35/+34
This patch changes the kvm-amd module to allocate the SVM MSR permission map per VCPU instead of a global map for all VCPUs. With this we have more flexibility allowing specific guests to access virtualized MSRs. This is required for LBR virtualization. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: let init_vmcb() take struct vcpu_svm as parameterJoerg Roedel1-6/+6
Change the parameter of the init_vmcb() function in the kvm-amd module from struct vmcb to struct vcpu_svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: VMX: fix typo in VMX header defineRyan Harper2-5/+5
Looking at Intel Volume 3b, page 148, table 20-11 and noticed that the field name is 'Deliver' not 'Deliever'. Attached patch changes the define name and its user in vmx.c Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27KVM: SVM: add support for Nested PagingJoerg Roedel1-5/+67
This patch contains the SVM architecture dependent changes for KVM to enable support for the Nested Paging feature of AMD Barcelona and Phenom processors. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>