summaryrefslogtreecommitdiffstats
path: root/drivers/xen
AgeCommit message (Collapse)AuthorFilesLines
2016-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-5/+6
Pull virtio barrier rework+fixes from Michael Tsirkin: "This adds a new kind of barrier, and reworks virtio and xen to use it. Plus some fixes here and there" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits) checkpatch: add virt barriers checkpatch: check for __smp outside barrier.h checkpatch.pl: add missing memory barriers virtio: make find_vqs() checkpatch.pl-friendly virtio_balloon: fix race between migration and ballooning virtio_balloon: fix race by fill and leak s390: more efficient smp barriers s390: use generic memory barriers xen/events: use virt_xxx barriers xen/io: use virt_xxx barriers xenbus: use virt_xxx barriers virtio_ring: use virt_store_mb sh: move xchg_cmpxchg to a header by itself sh: support 1 and 2 byte xchg virtio_ring: update weak barriers to use virt_xxx Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" asm-generic: implement virt_xxx memory barriers x86: define __smp_xxx xtensa: define __smp_xxx tile: define __smp_xxx ...
2016-01-12xen/events: use virt_xxx barriersMichael S. Tsirkin1-1/+2
drivers/xen/events/events_fifo.c uses rmb() to communicate with the other side. For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so rmb() here is only needed if a non-SMP guest runs on an SMP host. Switch to the virt_rmb barrier which serves this exact purpose. Pull in asm/barrier.h here to make sure the file is self-contained. Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12xenbus: use virt_xxx barriersMichael S. Tsirkin1-4/+4
drivers/xen/xenbus/xenbus_comms.c uses full memory barriers to communicate with the other side. For guests compiled with CONFIG_SMP, smp_wmb and smp_mb would be sufficient, so mb() and wmb() here are only needed if a non-SMP guest runs on an SMP host. Switch to virt_xxx barriers which serve this exact purpose. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-07xen/gntdev: add ioctl for grant copyDavid Vrabel1-0/+203
Add IOCTL_GNTDEV_GRANT_COPY to allow applications to copy between user space buffers and grant references. This interface is similar to the GNTTABOP_copy hypercall ABI except the local buffers are provided using a virtual address (instead of a GFN and offset). To avoid userspace from having to page align its buffers the driver will use two or more ops if required. If the ioctl returns 0, the application must check the status of each segment with the segments status field. If the ioctl returns a -ve error code (EINVAL or EFAULT), the status of individual ops is undefined. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-12-21xen/gntdev: constify mmu_notifier_ops structuresJulia Lawall1-1/+1
This mmu_notifier_ops structure is never modified, so declare it as const, like the other mmu_notifier_ops structures. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-12-21xen/grant-table: constify gnttab_ops structureJulia Lawall1-2/+2
The gnttab_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-12-21xen/time: use READ_ONCEStefano Stabellini1-10/+7
Use READ_ONCE through the code, rather than explicit barriers. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-12-21xen: rename dom0_op to platform_opStefano Stabellini7-29/+29
The dom0_op hypercall has been renamed to platform_op since Xen 3.2, which is ancient, and modern upstream Linux kernels cannot run as dom0 and it anymore anyway. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-12-21xen: move xen_setup_runstate_info and get_runstate_snapshot to ↵Stefano Stabellini2-1/+92
drivers/xen/time.c Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18Merge tag 'for-linus-4.4-rc5-tag' of ↵Linus Torvalds5-22/+83
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - XSA-155 security fixes to backend drivers. - XSA-157 security fixes to pciback. * tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: fix up cleanup path when alloc fails xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. xen/pciback: Do not install an IRQ handler for MSI interrupts. xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled xen/pciback: Save xen_pci_op commands before processing it xen-scsiback: safely copy requests xen-blkback: read from indirect descriptors only once xen-blkback: only read request operation from shared ring once xen-netback: use RING_COPY_REQUEST() throughout xen-netback: don't use last request to determine minimum Tx credit xen: Add RING_COPY_REQUEST() xen/x86/pvh: Use HVM's flush_tlb_others op xen: Resume PMU from non-atomic context xen/events/fifo: Consume unprocessed events when a CPU dies
2015-12-18xen-pciback: fix up cleanup path when alloc failsDoug Goldstein1-1/+3
When allocating a pciback device fails, clear the private field. This could lead to an use-after free, however the 'really_probe' takes care of setting dev_set_drvdata(dev, NULL) in its failure path (which we would exercise if the ->probe function failed), so we we are OK. However lets be defensive as the code can change. Going forward we should clean up the pci_set_drvdata(dev, NULL) in the various code-base. That will be for another day. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Jonathan Creekmore <jonathan.creekmore@gmail.com> Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.Konrad Rzeszutek Wilk1-1/+7
commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way") teaches us that dealing with MSI-X can be troublesome. Further checks in the MSI-X architecture shows that if the PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we may not be able to access the BAR (since they are memory regions). Since the MSI-X tables are located in there.. that can lead to us causing PCIe errors. Inhibit us performing any operation on the MSI-X unless the MEMORY bit is set. Note that Xen hypervisor with: "x86/MSI-X: access MSI-X table only after having enabled MSI-X" will return: xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3! When the generic MSI code tries to setup the PIRQ without MEMORY bit set. Which means with later versions of Xen (4.6) this patch is not neccessary. This is part of XSA-157 CC: stable@vger.kernel.org Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has ↵Konrad Rzeszutek Wilk1-13/+20
MSI(X) enabled. Otherwise just continue on, returning the same values as previously (return of 0, and op->result has the PIRQ value). This does not change the behavior of XEN_PCI_OP_disable_msi[|x]. The pci_disable_msi or pci_disable_msix have the checks for msi_enabled or msix_enabled so they will error out immediately. However the guest can still call these operations and cause us to disable the 'ack_intr'. That means the backend IRQ handler for the legacy interrupt will not respond to interrupts anymore. This will lead to (if the device is causing an interrupt storm) for the Linux generic code to disable the interrupt line. Naturally this will only happen if the device in question is plugged in on the motherboard on shared level interrupt GSI. This is part of XSA-157 CC: stable@vger.kernel.org Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen/pciback: Do not install an IRQ handler for MSI interrupts.Konrad Rzeszutek Wilk1-0/+7
Otherwise an guest can subvert the generic MSI code to trigger an BUG_ON condition during MSI interrupt freeing: for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); Xen PCI backed installs an IRQ handler (request_irq) for the dev->irq whenever the guest writes PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is done in case the device has legacy interrupts the GSI line is shared by the backend devices. To subvert the backend the guest needs to make the backend to change the dev->irq from the GSI to the MSI interrupt line, make the backend allocate an interrupt handler, and then command the backend to free the MSI interrupt and hit the BUG_ON. Since the backend only calls 'request_irq' when the guest writes to the PCI_COMMAND register the guest needs to call XEN_PCI_OP_enable_msi before any other operation. This will cause the generic MSI code to setup an MSI entry and populate dev->irq with the new PIRQ value. Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY and cause the backend to setup an IRQ handler for dev->irq (which instead of the GSI value has the MSI pirq). See 'xen_pcibk_control_isr'. Then the guest disables the MSI: XEN_PCI_OP_disable_msi which ends up triggering the BUG_ON condition in 'free_msi_irqs' as there is an IRQ handler for the entry->irq (dev->irq). Note that this cannot be done using MSI-X as the generic code does not over-write dev->irq with the MSI-X PIRQ values. The patch inhibits setting up the IRQ handler if MSI or MSI-X (for symmetry reasons) code had been called successfully. P.S. Xen PCIBack when it sets up the device for the guest consumption ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device). XSA-120 addendum patch removed that - however when upstreaming said addendum we found that it caused issues with qemu upstream. That has now been fixed in qemu upstream. This is part of XSA-157 CC: stable@vger.kernel.org Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or ↵Konrad Rzeszutek Wilk1-0/+7
MSI-X enabled The guest sequence of: a) XEN_PCI_OP_enable_msix b) XEN_PCI_OP_enable_msix results in hitting an NULL pointer due to using freed pointers. The device passed in the guest MUST have MSI-X capability. The a) constructs and SysFS representation of MSI and MSI groups. The b) adds a second set of them but adding in to SysFS fails (duplicate entry). 'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that in a) pdev->msi_irq_groups is still set) and also free's ALL of the MSI-X entries of the device (the ones allocated in step a) and b)). The unwind code: 'free_msi_irqs' deletes all the entries and tries to delete the pdev->msi_irq_groups (which hasn't been set to NULL). However the pointers in the SysFS are already freed and we hit an NULL pointer further on when 'strlen' is attempted on a freed pointer. The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard against that. The check for msi_enabled is not stricly neccessary. This is part of XSA-157 CC: stable@vger.kernel.org Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or ↵Konrad Rzeszutek Wilk1-1/+6
MSI-X enabled The guest sequence of: a) XEN_PCI_OP_enable_msi b) XEN_PCI_OP_enable_msi c) XEN_PCI_OP_disable_msi results in hitting an BUG_ON condition in the msi.c code. The MSI code uses an dev->msi_list to which it adds MSI entries. Under the above conditions an BUG_ON() can be hit. The device passed in the guest MUST have MSI capability. The a) adds the entry to the dev->msi_list and sets msi_enabled. The b) adds a second entry but adding in to SysFS fails (duplicate entry) and deletes all of the entries from msi_list and returns (with msi_enabled is still set). c) pci_disable_msi passes the msi_enabled checks and hits: BUG_ON(list_empty(dev_to_msi_list(&dev->dev))); and blows up. The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard against that. The check for msix_enabled is not stricly neccessary. This is part of XSA-157. CC: stable@vger.kernel.org Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen/pciback: Save xen_pci_op commands before processing itKonrad Rzeszutek Wilk2-1/+15
Double fetch vulnerabilities that happen when a variable is fetched twice from shared memory but a security check is only performed the first time. The xen_pcibk_do_op function performs a switch statements on the op->cmd value which is stored in shared memory. Interestingly this can result in a double fetch vulnerability depending on the performed compiler optimization. This patch fixes it by saving the xen_pci_op command before processing it. We also use 'barrier' to make sure that the compiler does not perform any optimization. This is part of XSA155. CC: stable@vger.kernel.org Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-18xen-scsiback: safely copy requestsDavid Vrabel1-1/+1
The copy of the ring request was lacking a following barrier(), potentially allowing the compiler to optimize the copy away. Use RING_COPY_REQUEST() to ensure the request is copied to local memory. This is part of XSA155. CC: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-12-02xen/events/fifo: Consume unprocessed events when a CPU diesRoss Lagerwall1-5/+18
When a CPU is offlined, there may be unprocessed events on a port for that CPU. If the port is subsequently reused on a different CPU, it could be in an unexpected state with the link bit set, resulting in interrupts being missed. Fix this by consuming any unprocessed events for a particular CPU when that CPU dies. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: <stable@vger.kernel.org> # 3.14+ Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-11-26Merge tag 'for-linus-4.4-rc2-tag' of ↵Linus Torvalds3-19/+111
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix gntdev and numa balancing. - Fix x86 boot crash due to unallocated legacy irq descs. - Fix overflow in evtchn device when > 1024 event channels. * tag 'for-linus-4.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/evtchn: dynamically grow pending event channel ring xen/events: Always allocate legacy interrupts on PV guests xen/gntdev: Grant maps should not be subject to NUMA balancing
2015-11-26xen/evtchn: dynamically grow pending event channel ringDavid Vrabel1-16/+107
If more than 1024 event channels are bound to a evtchn device then it possible (even with well behaved applications) for the ring to overflow and events to be lost (reported as an -EFBIG error). Dynamically increase the size of the ring so there is always enough space for all bound events. Well behaved applicables that only unmask events after draining them from the ring can thus no longer lose events. However, an application could unmask an event before draining it, allowing multiple entries per port to accumulate in the ring, and a overflow could still occur. So the overflow detection and reporting is retained. The ring size is initially only 64 entries so the common use case of an application only binding a few events will use less memory than before. The ring size may grow to 512 KiB (enough for all 2^17 possible channels). This order 7 kmalloc() may fail due to memory fragmentation, so we fall back to trying vmalloc(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2015-11-26xen/events: Always allocate legacy interrupts on PV guestsBoris Ostrovsky1-2/+3
After commit 8c058b0b9c34 ("x86/irq: Probe for PIC presence before allocating descs for legacy IRQs") early_irq_init() will no longer preallocate descriptors for legacy interrupts if PIC does not exist, which is the case for Xen PV guests. Therefore we may need to allocate those descriptors ourselves. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-11-26xen/gntdev: Grant maps should not be subject to NUMA balancingBoris Ostrovsky1-1/+1
Doing so will cause the grant to be unmapped and then, during fault handling, the fault to be mistakenly treated as NUMA hint fault. In addition, even if those maps could partcipate in NUMA balancing, it wouldn't provide any benefit since we are unable to determine physical page's node (even if/when VNUMA is implemented). Marking grant maps' VMAs as VM_IO will exclude them from being part of NUMA balancing. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-11-13Merge branch 'for-next' of ↵Linus Torvalds1-16/+16
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit 517982229f78 ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...
2015-10-23xen, cpu_hotplug: call device_offline instead of cpu_downStefano Stabellini1-1/+5
When offlining a cpu, instead of cpu_down, call device_offline, which also takes care of updating the cpu.dev.offline field. This keeps the sysfs file /sys/devices/system/cpu/cpuN/online, up to date. Also move the call to disable_hotplug_cpu, because it makes more sense to have it there. We don't call device_online at cpu-hotplug time, because that would immediately take the cpu online, while we want to retain the current behaviour: the user needs to explicitly enable the cpu after it has been hotplugged. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> CC: konrad.wilk@oracle.com CC: boris.ostrovsky@oracle.com CC: david.vrabel@citrix.com
2015-10-23xen/arm: Enable cpu_hotplug.cStefano Stabellini2-4/+6
Build cpu_hotplug for ARM and ARM64 guests. Rename arch_(un)register_cpu to xen_(un)register_cpu and provide an empty implementation on ARM and ARM64. On x86 just call arch_(un)register_cpu as we are already doing. Initialize cpu_hotplug on ARM. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-10-23xenbus: Support multiple grants ring with 64KBJulien Grall1-25/+72
The PV ring may use multiple grants and expect them to be mapped contiguously in the virtual memory. Although, the current code is relying on a Linux page will be mapped to a single grant. On build where Linux is using a different page size than the grant (i.e other than 4KB), the grant will always be mapped on the first 4KB of each Linux page which make the final ring not contiguous in the memory. This can be fixed by mapping multiple grant in a same Linux page. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/grant-table: Add an helper to iterate over a specific number of grantsJulien Grall1-0/+22
With the 64KB page granularity support on ARM64, a Linux page may be split accross multiple grant. Currently we have the helper gnttab_foreach_grant_in_grant to break a Linux page based on an offset and a len, but it doesn't fit when we only have a number of grants in hand. Introduce a new helper which take an array of Linux page and a number of grant and will figure out the address of each grant. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*Julien Grall1-17/+17
Linux may use a different page size than the size of grant. So make clear that the order is actually in number of grant. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/balloon: Use the correct sizeof when declaring frame_listJulien Grall1-1/+1
The type of the item in frame_list is xen_pfn_t which is not an unsigned long on ARM but an uint64_t. With the current computation, the size of frame_list will be 2 * PAGE_SIZE rather than PAGE_SIZE. I bet it's just mistake when the type has been switched from "unsigned long" to "xen_pfn_t" in commit 965c0aaafe3e75d4e65cd4ec862915869bde3abd "xen: balloon: use correct type for frame_list". Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/swiotlb: Add support for 64KB page granularityJulien Grall1-20/+19
Swiotlb is used on ARM64 to support DMA on platform where devices are not protected by an SMMU. Furthermore it's only enabled for DOM0. While Xen is always using 4KB page granularity in the stage-2 page table, Linux ARM64 may either use 4KB or 64KB. This means that a Linux page can be spanned accross multiple Xen page. The Swiotlb code has to validate that the buffer used for DMA is physically contiguous in the memory. As a Linux page can't be shared between local memory and foreign page by design (the balloon code always removing entirely a Linux page), the changes in the code are very minimal because we only need to check the first Xen PFN. Note that it may be possible to optimize the function check_page_physically_contiguous to avoid looping over every Xen PFN for local memory. Although I will let this optimization for a follow-up. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlbJulien Grall1-2/+2
With 64KB page granularity support, the frame number will be different. It will be easier to modify the behavior in a single place rather than in each caller. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/privcmd: Add support for Linux 64KB page granularityJulien Grall2-43/+89
The hypercall interface (as well as the toolstack) is always using 4KB page granularity. When the toolstack is asking for mapping a series of guest PFN in a batch, it expects to have the page map contiguously in its virtual memory. When Linux is using 64KB page granularity, the privcmd driver will have to map multiple Xen PFN in a single Linux page. Note that this solution works on page granularity which is a multiple of 4KB. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/grant-table: Make it running on 64KB granularityJulien Grall1-3/+3
The Xen interface is using 4KB page granularity. This means that each grant is 4KB. The current implementation allocates a Linux page per grant. On Linux using 64KB page granularity, only the first 4KB of the page will be used. We could decrease the memory wasted by sharing the page with multiple grant. It will require some care with the {Set,Clear}ForeignPage macro. Note that no changes has been made in the x86 code because both Linux and Xen will only use 4KB page granularity. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/events: fifo: Make it running on 64KB granularityJulien Grall2-2/+2
Only use the first 4KB of the page to store the events channel info. It means that we will waste 60KB every time we allocate page for: * control block: a page is allocating per CPU * event array: a page is allocating everytime we need to expand it I think we can reduce the memory waste for the 2 areas by: * control block: sharing between multiple vCPUs. Although it will require some bookkeeping in order to not free the page when the CPU goes offline and the other CPUs sharing the page still there * event array: always extend the array event by 64K (i.e 16 4K chunk). That would require more care when we fail to expand the event channel. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/balloon: Don't rely on the page granularity is the same for Xen and LinuxJulien Grall1-15/+55
For ARM64 guests, Linux is able to support either 64K or 4K page granularity. Although, the hypercall interface is always based on 4K page granularity. With 64K page granularity, a single page will be spread over multiple Xen frame. To avoid splitting the page into 4K frame, take advantage of the extent_order field to directly allocate/free chunk of the Linux page size. Note that PVMMU is only used for PV guest (which is x86) and the page granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure that because the code has not been modified. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/xenbus: Use Xen page definitionJulien Grall2-4/+5
All the ring (xenstore, and PV rings) are always based on the page granularity of Xen. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB ↵Julien Grall1-0/+8
pages On ARM all dma-capable devices on a same platform may not be protected by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in order to use correctly the device. While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant mapping will screw this contiguous mapping. When Linux is using 64KB page granularitary, the page may be split accross multiple non-contiguous MFN (Xen is using 4KB page granularity). Therefore a DMA request will likely fail. Checking that a 64KB page is using contiguous MFN is tedious. For now, always says that biovec are not mergeable. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/grant: Introduce helpers to split a page into grantJulien Grall1-0/+26
Currently, a grant is always based on the Xen page granularity (i.e 4KB). When Linux is using a different page granularity, a single page will be split between multiple grants. The new helpers will be in charge of splitting the Linux page into grants and call a function given by the caller on each grant. Also provide an helper to count the number of grants within a given contiguous region. Note that the x86/include/asm/xen/page.h is now including xen/interface/grant_table.h rather than xen/grant_table.h. It's necessary because xen/grant_table.h depends on asm/xen/page.h and will break the compilation. Furthermore, only definition in interface/grant_table.h is required. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23xen/balloon: pre-allocate p2m entries for ballooned pagesDavid Vrabel1-0/+5
Pages returned by alloc_xenballooned_pages() will be used for grant mapping which will call set_phys_to_machine() (in PV guests). Ballooned pages are set as INVALID_P2M_ENTRY in the p2m and thus may be using the (shared) missing tables and a subsequent set_phys_to_machine() will need to allocate new tables. Since the grant mapping may be done from a context that cannot sleep, the p2m entries must already be allocated. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2015-10-23xen/balloon: use hotplugged pages for foreign mappings etc.David Vrabel1-10/+80
alloc_xenballooned_pages() is used to get ballooned pages to back foreign mappings etc. Instead of having to balloon out real pages, use (if supported) hotplugged memory. This makes more memory available to the guest and reduces fragmentation in the p2m. This is only enabled if the xen.balloon.hotplug_unpopulated sysctl is set to 1. This sysctl defaults to 0 in case the udev rules to automatically online hotplugged memory do not exist. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> --- v3: - Add xen.balloon.hotplug_unpopulated sysctl to enable use of hotplug for unpopulated pages.
2015-10-23xen/balloon: make alloc_xenballoon_pages() always allocate low pagesDavid Vrabel4-17/+11
All users of alloc_xenballoon_pages() wanted low memory pages, so remove the option for high memory. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2015-10-23xen/balloon: only hotplug additional memory if requiredDavid Vrabel1-4/+19
Now that we track the total number of pages (included hotplugged regions), it is easy to determine if more memory needs to be hotplugged. Add a new BP_WAIT state to signal that the balloon process needs to wait until kicked by the memory add notifier (when the new section is onlined by userspace). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> --- v3: - Return BP_WAIT if enough sections are already hotplugged. v2: - New BP_WAIT status after adding new memory sections.
2015-10-23xen/balloon: rationalize memory hotplug statsDavid Vrabel1-63/+12
The stats used for memory hotplug make no sense and are fiddled with in odd ways. Remove them and introduce total_pages to track the total number of pages (both populated and unpopulated) including those within hotplugged regions (note that this includes not yet onlined pages). This will be used in a subsequent commit (xen/balloon: only hotplug additional memory if required) when deciding whether additional memory needs to be hotplugged. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2015-10-23xen/balloon: find non-conflicting regions to place hotplugged memoryDavid Vrabel1-20/+53
Instead of placing hotplugged memory at the end of RAM (which may conflict with PCI devices or reserved regions) use allocate_resource() to get a new, suitably aligned resource that does not conflict. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> --- v3: - Remove stale comment.
2015-10-13target: use per-attribute show and store methodsChristoph Hellwig1-16/+16
This also allows to remove the target-specific old configfs macros, and gets rid of the target_core_fabric_configfs.h header which only had one function declaration left that could be moved to a better place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-10/+4
Merge third patch-bomb from Andrew Morton: - even more of the rest of MM - lib/ updates - checkpatch updates - small changes to a few scruffy filesystems - kmod fixes/cleanups - kexec updates - a dma-mapping cleanup series from hch * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) dma-mapping: consolidate dma_set_mask dma-mapping: consolidate dma_supported dma-mapping: cosolidate dma_mapping_error dma-mapping: consolidate dma_{alloc,free}_noncoherent dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() mm: make sure all file VMAs have ->vm_ops set mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() mm: mark most vm_operations_struct const namei: fix warning while make xmldocs caused by namei.c ipc: convert invalid scenarios to use WARN_ON zlib_deflate/deftree: remove bi_reverse() lib/decompress_unlzma: Do a NULL check for pointer lib/decompressors: use real out buf size for gunzip with kernel fs/affs: make root lookup from blkdev logical size sysctl: fix int -> unsigned long assignments in INT_MIN case kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo kexec: align crash_notes allocation to make it be inside one physical page kexec: remove unnecessary test in kimage_alloc_crash_control_pages() kexec: split kexec_load syscall from kexec core code ...
2015-09-10Merge tag 'for-linus-4.3-rc0b-tag' of ↵Linus Torvalds13-75/+66
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen terminology fixes from David Vrabel: "Use the correct GFN/BFN terms more consistently" * tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn xen/privcmd: Further s/MFN/GFN/ clean-up hvc/xen: Further s/MFN/GFN clean-up video/xen-fbfront: Further s/MFN/GFN clean-up xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn xen: Use correctly the Xen memory terminologies arm/xen: implement correctly pfn_to_mfn xen: Make clear that swiotlb and biomerge are dealing with DMA address
2015-09-10dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}Christoph Hellwig1-6/+0
Since 2009 we have a nice asm-generic header implementing lots of DMA API functions for architectures using struct dma_map_ops, but unfortunately it's still missing a lot of APIs that all architectures still have to duplicate. This series consolidates the remaining functions, although we still need arch opt outs for two of them as a few architectures have very non-standard implementations. This patch (of 5): The coherent DMA allocator works the same over all architectures supporting dma_map operations. This patch consolidates them and converges the minor differences: - the debug_dma helpers are now called from all architectures, including those that were previously missing them - dma_alloc_from_coherent and dma_release_from_coherent are now always called from the generic alloc/free routines instead of the ops dma-mapping-common.h always includes dma-coherent.h to get the defintions for them, or the stubs if the architecture doesn't support this feature - checks for ->alloc / ->free presence are removed. There is only one magic instead of dma_map_ops without them (mic_dma_ops) and that one is x86 only anyway. Besides that only x86 needs special treatment to replace a default devices if none is passed and tweak the gfp_flags. An optional arch hook is provided for that. [linux@roeck-us.net: fix build] [jcmvbkbc@gmail.com: fix xtensa] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Michal Simek <monstr@monstr.eu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10mm: mark most vm_operations_struct constKirill A. Shutemov3-4/+4
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>