summaryrefslogtreecommitdiffstats
path: root/arch/x86/xen
AgeCommit message (Collapse)AuthorFilesLines
2014-11-10x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU downBoris Ostrovsky1-0/+3
Commit 2ed53c0d6cc9 ("x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3") introduced completions to CPU offlining process. These completions are not initialized on Xen kernels causing a panic in play_dead_common(). Move handling of die_complete into common routines to make them available to Xen guests. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: tianyu.lan@intel.com Cc: konrad.wilk@oracle.com Cc: xen-devel@lists.xenproject.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414770572-7950-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-23x86/xen: panic on bad Xen-provided memory mapMartin Kelly1-0/+1
Panic if Xen provides a memory map with 0 entries. Although this is unlikely, it is better to catch the error at the point of seeing the map than later on as a symptom of some other crash. Signed-off-by: Martin Kelly <martkell@amazon.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read()Boris Ostrovsky1-1/+1
Commit 89cbc76768c2 ("x86: Replace __get_cpu_var uses") replaced __get_cpu_var() with this_cpu_ptr() in xen_clocksource_read() in such a way that instead of accessing a structure pointed to by a per-cpu pointer we are trying to get to a per-cpu structure. __this_cpu_read() of the pointer is the more appropriate accessor. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23x86/xen: avoid race in p2m handlingJuergen Gross1-5/+5
When a new p2m leaf is allocated this leaf is linked into the p2m tree via cmpxchg. Unfortunately the compare value for checking the success of the update is read after checking for the need of a new leaf. It is possible that a new leaf has been linked into the tree concurrently in between. This could lead to a leaked memory page and to the loss of some p2m entries. Avoid the race by using the read compare value for checking the need of a new p2m leaf and use ACCESS_ONCE() to get it. There are other places which seem to need ACCESS_ONCE() to ensure proper operation. Change them accordingly. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23x86/xen: delay construction of mfn_list_listJuergen Gross3-55/+18
The 3 level p2m tree for the Xen tools is constructed very early at boot by calling xen_build_mfn_list_list(). Memory needed for this tree is allocated via extend_brk(). As this tree (other than the kernel internal p2m tree) is only needed for domain save/restore, live migration and crash dump analysis it doesn't matter whether it is constructed very early or just some milliseconds later when memory allocation is possible by other means. This patch moves the call of xen_build_mfn_list_list() just after calling xen_pagetable_p2m_copy() simplifying this function, too, as it doesn't have to bother with two parallel trees now. The same applies for some other internal functions. While simplifying code, make early_can_reuse_p2m_middle() static and drop the unused second parameter. p2m_mid_identity_mfn can be removed as well, it isn't used either. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-23x86/xen: avoid writing to freed memory after race in p2m handlingJuergen Gross1-2/+6
In case a race was detected during allocation of a new p2m tree element in alloc_p2m() the new allocated mid_mfn page is freed without updating the pointer to the found value in the tree. This will result in overwriting the just freed page with the mfn of the p2m leaf. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-15Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds4-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
2014-10-13Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 bootup updates from Ingo Molnar: "The changes in this cycle were: - Fix rare SMP-boot hang (mostly in virtual environments) - Fix build warning with certain (rare) toolchains" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/relocs: Make per_cpu_load_addr static x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
2014-10-06x86/xen: Set EFER.NX and EFER.SCE in PVH guestsMukesh Rathor4-11/+67
This fixes two bugs in PVH guests: - Not setting EFER.NX means the NX bit in page table entries is ignored on Intel processors and causes reserved bit page faults on AMD processors. - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for pvh guest") PVH guests are required to set EFER.SCE to enable the SYSCALL instruction. Secondary VCPUs are started with pagetables with the NX bit set so EFER.NX must be set before using any stack or data segment. xen_pvh_cpu_early_init() is the new secondary VCPU entry point that sets EFER before jumping to cpu_bringup_and_idle(). Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-03xen: eliminate scalability issues from initrd handlingJuergen Gross2-2/+10
Size restrictions native kernels wouldn't have resulted from the initrd getting mapped into the initial mapping. The kernel doesn't really need the initrd to be mapped, so use infrastructure available in Xen to avoid the mapping and hence the restriction. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-09-23x86: remove the Xen-specific _PAGE_IOMAP PTE flagDavid Vrabel1-2/+0
The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN. Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the the p2m. Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com>
2014-09-23x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappingsDavid Vrabel1-44/+4
Since mfn_to_pfn() returns the correct PFN for identity mappings (as used for MMIO regions), the use of _PAGE_IOMAP is not required in pte_mfn_to_pfn(). Do not set the _PAGE_IOMAP flag in pte_pfn_to_mfn() and do not use it in pte_mfn_to_pfn(). This will allow _PAGE_IOMAP to be removed, making it available for future use. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-09-23xen/efi: Directly include needed headersDaniel Kiper1-0/+2
I discovered that some needed stuff is defined/declared in headers which are not included directly. Currently it works but if somebody remove required headers from currently included headers then build will break. So, just in case directly include all needed headers. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-09-23xen/setup: Remap Xen Identity Mapped RAMMatt Rushton3-94/+314
Instead of ballooning up and down dom0 memory this remaps the existing mfns that were replaced by the identity map. The reason for this is that the existing implementation ballooned memory up and and down which caused dom0 to have discontiguous pages. In some cases this resulted in the use of bounce buffers which reduced network I/O performance significantly. This change will honor the existing order of the pages with the exception of some boundary conditions. To do this we need to update both the Linux p2m table and the Xen m2p table. Particular care must be taken when updating the p2m table since it's important to limit table memory consumption and reuse the existing leaf pages which get freed when an entire leaf page is set to the identity map. To implement this, mapping updates are grouped into blocks with table entries getting cached temporarily and then released. On my test system before: Total pages: 2105014 Total contiguous: 1640635 After: Total pages: 2105014 Total contiguous: 2098904 Signed-off-by: Matthew Rushton <mrushton@amazon.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-09-16x86/smpboot: Initialize secondary CPU only if master CPU will wait for itIgor Mammedov1-0/+2
Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1403266991-12233-1-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-10x86/xen: don't copy bogus duplicate entries into kernel page tablesStefan Bader1-15/+12
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded modules exceeds 512 MiB, then loading modules fails with a warning (and hence a vmalloc allocation failure) because the PTEs for the newly-allocated vmalloc address space are not zero. WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128 vmap_page_range_noflush+0x2a1/0x360() This is caused by xen_setup_kernel_pagetables() copying level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present entries. Without KASLR, the normal kernel image size only covers the first half of level2_kernel_pgt and module space starts after that. L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel [256..511]->module [511]->level2_fixmap_pgt[ 0..505]->module This allows 512 MiB of of module vmalloc space to be used before having to use the corrupted level2_fixmap_pgt entries. With KASLR enabled, the kernel image uses the full PUD range of 1G and module space starts in the level2_fixmap_pgt. So basically: L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel [511]->level2_fixmap_pgt[0..505]->module And now no module vmalloc space can be used without using the corrupt level2_fixmap_pgt entries. Fix this by properly converting the level2_fixmap_pgt entries to MFNs, and setting level1_fixmap_pgt as read-only. A number of comments were also using the the wrong L3 offset for level2_kernel_pgt. These have been corrected. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
2014-08-26x86: Replace __get_cpu_var usesChristoph Lameter4-12/+12
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-11x86/xen: use vmap() to map grant table pages in PVH guestsDavid Vrabel1-5/+5
Commit b7dd0e350e0b (x86/xen: safely map and unmap grant frames when in atomic context) causes PVH guests to crash in arch_gnttab_map_shared() when they attempted to map the pages for the grant table. This use of a PV-specific function during the PVH grant table setup is non-obvious and not needed. The standard vmap() function does the right thing. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com> Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com> Cc: stable@vger.kernel.org
2014-08-11x86/xen: resume timer irqs earlyDavid Vrabel1-1/+1
If the timer irqs are resumed during device resume it is possible in certain circumstances for the resume to hang early on, before device interrupts are resumed. For an Ubuntu 14.04 PVHVM guest this would occur in ~0.5% of resume attempts. It is not entirely clear what is occuring the point of the hang but I think a task necessary for the resume calls schedule_timeout(), waiting for a timer interrupt (which never arrives). This failure may require specific tasks to be running on the other VCPUs to trigger (processes are not frozen during a suspend/resume if PREEMPT is disabled). Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in syscore_resume(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
2014-08-07Merge tag 'stable/for-linus-3.17-rc0-tag' of ↵Linus Torvalds3-58/+20
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from David Vrabel: - remove unused V2 grant table support - note that Konrad is xen-blkkback/front maintainer - add 'xen_nopv' option to disable PV extentions for x86 HVM guests - misc minor cleanups * tag 'stable/for-linus-3.17-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: Document the 'quirks' sysfs file xen/pciback: Fix error return code in xen_pcibk_attach() xen/events: drop negativity check of unsigned parameter xen/setup: Remove Identity Map Debug Message xen/events/fifo: remove a unecessary use of BM() xen/events/fifo: ensure all bitops are properly aligned even on x86 xen/events/fifo: reset control block and local HEADs on resume xen/arm: use BUG_ON xen/grant-table: remove support for V2 tables x86/xen: safely map and unmap grant frames when in atomic context MAINTAINERS: Make me the Xen block subsystem (front and back) maintainer xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests.
2014-08-04Merge branch 'x86-efi-for-linus' of ↵Linus Torvalds4-0/+54
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: "Main changes in this cycle are: - arm64 efi stub fixes, preservation of FP/SIMD registers across firmware calls, and conversion of the EFI stub code into a static library - Ard Biesheuvel - Xen EFI support - Daniel Kiper - Support for autoloading the efivars driver - Lee, Chun-Yi - Use the PE/COFF headers in the x86 EFI boot stub to request that the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael Brown - Consolidate all the x86 EFI quirks into one file - Saurabh Tangri - Additional error logging in x86 EFI boot stub - Ulf Winkelvos - Support loading initrd above 4G in EFI boot stub - Yinghai Lu - EFI reboot patches for ACPI hardware reduced platforms" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) efi/arm64: Handle missing virtual mapping for UEFI System Table arch/x86/xen: Silence compiler warnings xen: Silence compiler warnings x86/efi: Request desired alignment via the PE/COFF headers x86/efi: Add better error logging to EFI boot stub efi: Autoload efivars efi: Update stale locking comment for struct efivars arch/x86: Remove efi_set_rtc_mmss() arch/x86: Replace plain strings with constants xen: Put EFI machinery in place xen: Define EFI related stuff arch/x86: Remove redundant set_bit(EFI_MEMMAP) call arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call efi: Introduce EFI_PARAVIRT flag arch/x86: Do not access EFI memory map if it is not available efi: Use early_mem*() instead of early_io*() arch/ia64: Define early_memunmap() x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag efi/reboot: Allow powering off machines using EFI efi/reboot: Add generic wrapper around EfiResetSystem() ...
2014-07-31xen/setup: Remove Identity Map Debug MessageMatt Rushton1-3/+2
Removing a debug message for setting the identity map since it becomes rather noisy after rework of the identity map code. Signed-off-by: Matthew Rushton <mrushton@amazon.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-07-30Merge tag 'stable/for-linus-3.16-rc7-tag' of ↵Linus Torvalds1-57/+91
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fix from David Vrabel: "Fix BUG when trying to expand the grant table. This seems to occur often during boot with Ubuntu 14.04 PV guests" * tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: safely map and unmap grant frames when in atomic context
2014-07-30x86/xen: safely map and unmap grant frames when in atomic contextDavid Vrabel1-57/+91
arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in atomic context but were calling alloc_vm_area() which might sleep. Also, if a driver attempts to allocate a grant ref from an interrupt and the table needs expanding, then the CPU may already by in lazy MMU mode and apply_to_page_range() will BUG when it tries to re-enable lazy MMU mode. These two functions are only used in PV guests. Introduce arch_gnttab_init() to allocates the virtual address space in advance. Avoid the use of apply_to_page_range() by using saving and using the array of PTE addresses from the alloc_vm_area() call (which ensures that the required page tables are pre-allocated). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-07-18arch/x86/xen: Silence compiler warningsDaniel Kiper4-14/+53
Compiler complains in the following way when x86 32-bit kernel with Xen support is build: CC arch/x86/xen/enlighten.o arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’: arch/x86/xen/enlighten.c:1726:3: warning: right shift count >= width of type [enabled by default] Such line contains following EFI initialization code: boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32); There is no issue if x86 64-bit kernel is build. However, 32-bit case generate warning (even if that code will not be executed because Xen does not work on 32-bit EFI platforms) due to __pa() returning unsigned long type which has 32-bits width. So move whole EFI initialization stuff to separate function and build it conditionally to avoid above mentioned warning on x86 32-bit architecture. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-18xen: Put EFI machinery in placeDaniel Kiper1-0/+15
This patch enables EFI usage under Xen dom0. Standard EFI Linux Kernel infrastructure cannot be used because it requires direct access to EFI data and code. However, in dom0 case it is not possible because above mentioned EFI stuff is fully owned and controlled by Xen hypervisor. In this case all calls from dom0 to EFI must be requested via special hypercall which in turn executes relevant EFI code in behalf of dom0. When dom0 kernel boots it checks for EFI availability on a machine. If it is detected then artificial EFI system table is filled. Native EFI callas are replaced by functions which mimics them by calling relevant hypercall. Later pointer to EFI system table is passed to standard EFI machinery and it continues EFI subsystem initialization taking into account that there is no direct access to EFI boot services, runtime, tables, structures, etc. After that system runs as usual. This patch is based on Jan Beulich and Tang Liang work. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tang Liang <liang.tang@oracle.com> Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-07-14xen/grant-table: remove support for V2 tablesDavid Vrabel1-55/+5
Since 11c7ff17c9b6dbf3a4e4f36be30ad531a6cf0ec9 (xen/grant-table: Force to use v1 of grants.) the code for V2 grant tables is not used. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-07-14x86/xen: safely map and unmap grant frames when in atomic contextDavid Vrabel1-57/+91
arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in atomic context but were calling alloc_vm_area() which might sleep. Also, if a driver attempts to allocate a grant ref from an interrupt and the table needs expanding, then the CPU may already by in lazy MMU mode and apply_to_page_range() will BUG when it tries to re-enable lazy MMU mode. These two functions are only used in PV guests. Introduce arch_gnttab_init() to allocates the virtual address space in advance. Avoid the use of apply_to_page_range() by using saving and using the array of PTE addresses from the alloc_vm_area() call. N.B. 'alloc_vm_area' pre-allocates the pagetable so there is no need to worry about having to do a PGD/PUD/PMD walk (like apply_to_page_range does) and we can instead do set_pte. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> ---- [v2: Add comment about alloc_vm_area] [v3: Fix compile error found by 0-day bot]
2014-07-14xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests.Konrad Rzeszutek Wilk1-0/+13
By default when CONFIG_XEN and CONFIG_XEN_PVHVM kernels are run, they will enable the PV extensions (drivers, interrupts, timers, etc) - which is the best option for the majority of use cases. However, in some cases (kexec not fully working, benchmarking) we want to disable Xen PV extensions. As such introduce the 'xen_nopv' parameter that will do it. This parameter is intended only for HVM guests as the Xen PV guests MUST boot with PV extensions. However, even if you use 'xen_nopv' on Xen PV guests it will be ignored. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> --- [v2: s/off/xen_nopv/ per Boris Ostrovsky recommendation.] [v3: Add Reviewed-by] [v4: Clarify that this is only for HVM guests]
2014-06-19Merge tag 'stable/for-linus-3.16-rc1-tag' of ↵Linus Torvalds3-29/+37
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from David Vrabel: "Xen regression and PVH fixes for 3.16-rc1 - fix dom0 PVH memory setup on latest unstable Xen releases - fix 64-bit x86 PV guest boot failure on Xen 3.1 and earlier - fix resume regression on non-PV (auto-translated physmap) guests" * tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/grant-table: fix suspend for non-PV guests x86/xen: no need to explicitly register an NMI callback Revert "xen/pvh: Update E820 to work with PVH (v2)" x86/xen: fix memory setup for PVH dom0
2014-06-18x86/xen: no need to explicitly register an NMI callbackDavid Vrabel1-8/+1
Remove xen_enable_nmi() to fix a 64-bit guest crash when registering the NMI callback on Xen 3.1 and earlier. It's not needed since the NMI callback is set by a set_trap_table hypercall (in xen_load_idt() or xen_write_idt_entry()). It's also broken since it only set the current VCPU's callback. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
2014-06-05Merge branch 'x86/vdso' of ↵Linus Torvalds2-7/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull x86 cdso updates from Peter Anvin: "Vdso cleanups and improvements largely from Andy Lutomirski. This makes the vdso a lot less ''special''" * 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso, build: Make LE access macros clearer, host-safe x86/vdso, build: Fix cross-compilation from big-endian architectures x86/vdso, build: When vdso2c fails, unlink the output x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls x86, mm: Improve _install_special_mapping and fix x86 vdso naming mm, fs: Add vm_ops->name as an alternative to arch_vma_name x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO x86, vdso: Move the 32-bit vdso special pages after the text x86, vdso: Reimplement vdso.so preparation in build-time C x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c x86, vdso: Clean up 32-bit vs 64-bit vdso params x86, mm: Ensure correct alignment of the fixmap
2014-06-05Revert "xen/pvh: Update E820 to work with PVH (v2)"David Vrabel1-20/+2
This reverts commit 9103bb0f8240b2a55aac3ff7ecba9c7dcf66b08b. Now than xen_memory_setup() is not called for auto-translated guests, we can remove this commit. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com>
2014-06-05x86/xen: fix memory setup for PVH dom0David Vrabel3-1/+34
Since af06d66ee32b (x86: fix setup of PVH Dom0 memory map) in Xen, PVH dom0 need only use the memory memory provided by Xen which has already setup all the correct holes. xen_memory_setup() then ends up being trivial for a PVH guest so introduce a new function (xen_auto_xlated_memory_setup()). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com>
2014-06-02Merge tag 'stable/for-linus-3.16-rc0-tag' of ↵Linus Torvalds6-70/+270
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into next Pull Xen updates from David Vrabel: "xen: features and fixes for 3.16-rc0 - support foreign mappings in PVH domains (needed when dom0 is PVH) - fix mapping high MMIO regions in x86 PV guests (this is also the first half of removing the PAGE_IOMAP PTE flag). - ARM suspend/resume support. - ARM multicall support" * tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: map foreign pfns for autotranslated guests xen-acpi-processor: Don't display errors when we get -ENOSYS xen/pciback: Document the entry points for 'pcistub_put_pci_dev' xen/pciback: Document when the 'unbind' and 'bind' functions are called. xen-pciback: Document when we FLR an PCI device. xen-pciback: First reset, then free. xen-pciback: Cleanup up pcistub_put_pci_dev x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range() x86/xen: set regions above the end of RAM as 1:1 x86/xen: only warn once if bad MFNs are found during setup x86/xen: compactly store large identity ranges in the p2m x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle() xen/x86: set panic notifier priority to minimum arm,arm64/xen: introduce HYPERVISOR_suspend() xen: refactor suspend pre/post hooks arm: xen: export HYPERVISOR_multicall to modules. arm64: introduce virt_to_pfn arm/xen: Remove definiition of virt_to_pfn in asm/xen/page.h arm: xen: implement multicall hypercall support.
2014-05-27x86/xen: map foreign pfns for autotranslated guestsMukesh Rathor1-3/+118
When running as a dom0 in PVH mode, foreign pfns that are accessed must be added to our p2m which is managed by xen. This is done via XENMEM_add_to_physmap_range hypercall. This is needed for toolstack building guests and mapping guest memory, xentrace mapping xen pages, etc. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-05-21Merge remote-tracking branch 'origin/x86/espfix' into x86/vdsoH. Peter Anvin2-4/+4
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code that otherwise cause conflicts. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-15x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range()David Vrabel1-3/+1
_PAGE_IOMAP is used in xen_remap_domain_mfn_range() to prevent the pfn_pte() call in remap_area_mfn_pte_fn() from using the p2m to translate the MFN. If mfn_pte() is used instead, the p2m look up is avoided and the use of _PAGE_IOMAP is no longer needed. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-15x86/xen: set regions above the end of RAM as 1:1David Vrabel2-1/+10
PCI devices may have BARs located above the end of RAM so mark such frames as identity frames in the p2m (instead of the default of missing). PFNs outside the p2m (above MAX_P2M_PFN) are also considered to be identity frames for the same reason. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-15x86/xen: only warn once if bad MFNs are found during setupDavid Vrabel1-3/+3
In xen_add_extra_mem(), if the WARN() checks for bad MFNs trigger it is likely that they will trigger at lot, spamming the log. Use WARN_ONCE() instead. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-15x86/xen: compactly store large identity ranges in the p2mDavid Vrabel1-50/+105
Large (multi-GB) identity ranges currently require a unique middle page (filled with p2m_identity entries) per 1 GB region. Similar to the common p2m_mid_missing middle page for large missing regions, introduce a p2m_mid_identity page (filled with p2m_identity entries) which can be used instead. set_phys_range_identity() thus only needs to allocate new middle pages at the beginning and end of the range. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-15x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFNDavid Vrabel1-1/+4
Allow set_phys_range_identity() to work with a range that overlaps MAX_P2M_PFN by clamping pfn_e to MAX_P2M_PFN. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-15x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle()David Vrabel1-6/+6
early_p2m_alloc_middle() allocates a new leaf page and early_p2m_alloc() allocates a new middle page. This is confusing. Swap the names so they match what the functions actually do. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-15xen/x86: set panic notifier priority to minimumRadim Krčmář1-0/+1
Execution is not going to continue after telling Xen about the crash. Let other panic notifiers run by postponing the final hypercall as much as possible. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-05-12xen: refactor suspend pre/post hooksDavid Vrabel2-3/+22
New architectures currently have to provide implementations of 5 different functions: xen_arch_pre_suspend(), xen_arch_post_suspend(), xen_arch_hvm_post_suspend(), xen_mm_pin_all(), and xen_mm_unpin_all(). Refactor the suspend code to only require xen_arch_pre_suspend() and xen_arch_post_suspend(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2014-05-05asmlinkage, x86: Add explicit __visible to arch/x86/*Andi Kleen2-4/+4
As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSOAndy Lutomirski1-5/+3
This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05x86, vdso: Reimplement vdso.so preparation in build-time CAndy Lutomirski1-2/+9
Currently, vdso.so files are prepared and analyzed by a combination of objcopy, nm, some linker script tricks, and some simple ELF parsers in the kernel. Replace all of that with plain C code that runs at build time. All five vdso images now generate .c files that are compiled and linked in to the kernel image. This should cause only one userspace-visible change: the loaded vDSO images are stripped more heavily than they used to be. Everything outside the loadable segment is dropped. In particular, this causes the section table and section name strings to be missing. This should be fine: real dynamic loaders don't load or inspect these tables anyway. The result is roughly equivalent to eu-strip's --strip-sections option. The purpose of this change is to enable the vvar and hpet mappings to be moved to the page following the vDSO load segment. Currently, it is possible for the section table to extend into the page after the load segment, so, if we map it, it risks overlapping the vvar or hpet page. This happens whenever the load segment is just under a multiple of PAGE_SIZE. The only real subtlety here is that the old code had a C file with inline assembler that did 'call VDSO32_vsyscall' and a linker script that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most likely worked by accident: the linker script entry defines a symbol associated with an address as opposed to an alias for the real dynamic symbol __kernel_vsyscall. That caused ld to relocate the reference at link time instead of leaving an interposable dynamic relocation. Since the VDSO32_vsyscall hack is no longer needed, I now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it work. vdso2c will generate an error and abort the build if the resulting image contains any dynamic relocations, so we won't silently generate bad vdso images. (Dynamic relocations are a problem because nothing will even attempt to relocate the vdso.) Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-17Merge tag 'stable/for-linus-3.15-rc1-tag' of ↵Linus Torvalds3-10/+23
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from David Vrabel: "Xen regression and bug fixes for 3.15-rc1: - fix completely broken 32-bit PV guests caused by x86 refactoring 32-bit thread_info. - only enable ticketlock slow path on Xen (not bare metal) - fix two bugs with PV guests not shutting down when requested - fix a minor memory leak in xen-pciback error path" * tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/manage: Poweroff forcefully if user-space is not yet up. xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart. xen/spinlock: Don't enable them unconditionally. xen-pciback: silence an unwanted debug printk xen: fix memory leak in __xen_pcibk_add_pci_dev() x86/xen: Fix 32-bit PV guests's usage of kernel_stack
2014-04-15xen/spinlock: Don't enable them unconditionally.Konrad Rzeszutek Wilk1-1/+4
The git commit a945928ea2709bc0e8e8165d33aed855a0110279 ('xen: Do not enable spinlocks before jump_label_init() has executed') was added to deal with the jump machinery. Earlier the code that turned on the jump label was only called by Xen specific functions. But now that it had been moved to the initcall machinery it gets called on Xen, KVM, and baremetal - ouch!. And the detection machinery to only call it on Xen wasn't remembered in the heat of merge window excitement. This means that the slowpath is enabled on baremetal while it should not be. Reported-by: Waiman Long <waiman.long@hp.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> CC: stable@vger.kernel.org CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>