summaryrefslogtreecommitdiffstats
path: root/drivers/vfio
AgeCommit message (Collapse)AuthorFilesLines
2017-06-20sched/wait: Rename wait_queue_t => wait_queue_entry_tIngo Molnar1-1/+1
Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-05Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
2017-04-18vfio/type1: Reduce repetitive calls in vfio_pin_pages_remote()Alex Williamson1-6/+11
vfio_pin_pages_remote() is typically called to iterate over a range of memory. Testing CAP_IPC_LOCK is relatively expensive, so it makes sense to push it up to the caller, which can then repeatedly call vfio_pin_pages_remote() using that value. This can show nearly a 20% improvement on the worst case path through VFIO_IOMMU_MAP_DMA with contiguous page mapping disabled. Testing RLIMIT_MEMLOCK is much more lightweight, but we bring it along on the same principle and it does seem to show a marginal improvement. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-04-18vfio/type1: Prune vfio_pin_page_external()Alex Williamson1-27/+8
With vfio_lock_acct() testing the locked memory limit under mmap_sem, it's redundant to do it here for a single page. We can also reorder our tests such that we can avoid testing for reserved pages if we're not doing accounting and let vfio_lock_acct() test the process CAP_IPC_LOCK. Finally, this function oddly returns 1 on success. Update to return zero on success, -errno on error. Since the function only pins a single page, there's no need to return the number of pages pinned. N.B. vfio_pin_pages_remote() can pin a large contiguous range of pages before calling vfio_lock_acct(). If we were to similarly remove the extra test there, a user could temporarily pin far more pages than they're allowed. Suggested-by: Kirti Wankhede <kwankhede@nvidia.com> Suggested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-04-13vfio/type1: Remove locked page accounting workqueueAlex Williamson1-59/+51
If the mmap_sem is contented then the vfio type1 IOMMU backend will defer locked page accounting updates to a workqueue task. This has a few problems and depending on which side the user tries to play, they might be over-penalized for unmaps that haven't yet been accounted or race the workqueue to enter more mappings than they're allowed. The original intent of this workqueue mechanism seems to be focused on reducing latency through the ioctl, but we cannot do so at the cost of correctness. Remove this workqueue mechanism and update the callers to allow for failure. We can also now recheck the limit under write lock to make sure we don't exceed it. vfio_pin_pages_remote() also now necessarily includes an unwind path which we can jump to directly if the consecutive page pinning finds that we're exceeding the user's memory limits. This avoids the current lazy approach which does accounting and mapping up to the fault, only to return an error on the next iteration to unwind the entire vfio_dma. Cc: stable@vger.kernel.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-04-11vfio/spapr_tce: Check kzalloc() return when preregistering memoryAlexey Kardashevskiy1-0/+5
This adds missing checking for kzalloc() return value. Fixes: 4b6fad7097f8 ("powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-04-11vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility checkAlexey Kardashevskiy1-0/+8
The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually picks the v2. Normally the userspace would create a container, attach an IOMMU group to it and only then set the IOMMU type (which would normally be v2). However a specific IOMMU group may not support v2, in other words it may not implement set_window/unset_window/take_ownership/ release_ownership and such a group should not be attached to a v2 container. This adds extra checks that a new group can do what the selected IOMMU type suggests. The userspace can then test the return value from ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try VFIO_SPAPR_TCE_IOMMU. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-03-30powerpc/vfio_spapr_tce: Add reference counting to iommu_tableAlexey Kardashevskiy1-1/+1
So far iommu_table obejcts were only used in virtual mode and had a single owner. We are going to change this by implementing in-kernel acceleration of DMA mapping requests. The proposed acceleration will handle requests in real mode and KVM will keep references to tables. This adds a kref to iommu_table and defines new helpers to update it. This replaces iommu_free_table() with iommu_tce_table_put() and makes iommu_free_table() static. iommu_tce_table_get() is not used in this patch but it will be in the following patch. Since this touches prototypes, this also removes @node_name parameter as it has never been really useful on powernv and carrying it for the pseries platform code to iommu_free_table() seems to be quite useless as well. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposalAlexey Kardashevskiy1-1/+1
At the moment iommu_table can be disposed by either calling iommu_table_free() directly or it_ops::free(); the only implementation of free() is in IODA2 - pnv_ioda2_table_free() - and it calls iommu_table_free() anyway. As we are going to have reference counting on tables, we need an unified way of disposing tables. This moves it_ops::free() call into iommu_free_table() and makes use of the latter. The free() callback now handles only platform-specific data. As from now on the iommu_free_table() calls it_ops->free(), we need to have it_ops initialized before calling iommu_free_table() so this moves this initialization in pnv_pci_ioda2_create_table(). This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-24Merge tag 'vfio-v4.11-rc4' of git://github.com/awilliam/linux-vfioLinus Torvalds1-3/+5
Pull VFIO fix from Alex Williamson: "Rework sanity check for mdev driver group notifier de-registration (Alex Williamson)" * tag 'vfio-v4.11-rc4' of git://github.com/awilliam/linux-vfio: vfio: Rework group release notifier warning
2017-03-22iommu: Disambiguate MSI region typesRobin Murphy1-4/+3
The introduction of reserved regions has left a couple of rough edges which we could do with sorting out sooner rather than later. Since we are not yet addressing the potential dynamic aspect of software-managed reservations and presenting them at arbitrary fixed addresses, it is incongruous that we end up displaying hardware vs. software-managed MSI regions to userspace differently, especially since ARM-based systems may actually require one or the other, or even potentially both at once, (which iommu-dma currently has no hope of dealing with at all). Let's resolve the former user-visible inconsistency ASAP before the ABI has been baked into a kernel release, in a way that also lays the groundwork for the latter shortcoming to be addressed by follow-up patches. For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use IOMMU_RESV_MSI to describe the hardware type, and document everything a little bit. Since the x86 MSI remapping hardware falls squarely under this meaning of IOMMU_RESV_MSI, apply that type to their regions as well, so that we tell the same story to userspace across all platforms. Secondly, as the various region types require quite different handling, and it really makes little sense to ever try combining them, convert the bitfield-esque #defines to a plain enum in the process before anyone gets the wrong impression. Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region") Reviewed-by: Eric Auger <eric.auger@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: David Woodhouse <dwmw2@infradead.org> CC: kvm@vger.kernel.org Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-21vfio: Rework group release notifier warningAlex Williamson1-3/+5
The intent of the original warning is make sure that the mdev vendor driver has removed any group notifiers at the point where the group is closed by the user. Theoretically this would be through an orderly shutdown where any devices are release prior to the group release. We can't always count on an orderly shutdown, the user can close the group before the notifier can be removed or the user task might be killed. We'd like to add this sanity test when the group is idle and the only references are from the devices within the group themselves, but we don't have a good way to do that. Instead check both when the group itself is removed and when the group is opened. A bit later than we'd prefer, but better than the current over aggressive approach. Fixes: ccd46dbae77d ("vfio: support notifier chain in vfio_group") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: <stable@vger.kernel.org> # v4.10 Cc: Jike Song <jike.song@intel.com>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2-1/+2
<linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2-0/+3
<linux/sched/mm.h> We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/mm.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. The APIs that are going to be moved first are: mm_alloc() __mmdrop() mmdrop() mmdrop_async_fn() mmdrop_async() mmget_not_zero() mmput() mmput_async() get_task_mm() mm_access() mm_release() Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-23Merge tag 'vfio-v4.11-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds3-25/+6
Pull VFIO updates from Alex Williamson: - Kconfig fixes for SPAPR_TCE_IOMMU=n (Michael Ellerman) - Module softdep rather than request_module to simplify usage from initrd (Alex Williamson) - Comment typo fix (Changbin Du) * tag 'vfio-v4.11-rc1' of git://github.com/awilliam/linux-vfio: vfio: fix a typo in comment of function vfio_pin_pages vfio: Replace module request with softdep vfio/mdev: Use a module softdep for vfio_mdev vfio: Fix build break when SPAPR_TCE_IOMMU=n
2017-02-22vfio: fix a typo in comment of function vfio_pin_pagesChangbin Du1-1/+1
Correct the description that 'unpinned' -> 'pinned'. Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-20Merge tag 'iommu-updates-v4.11' of ↵Linus Torvalds1-2/+38
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU UPDATES from Joerg Roedel: - KVM PCIe/MSI passthrough support on ARM/ARM64 - introduction of a core representation for individual hardware iommus - support for IOMMU privileged mappings as supported by some ARM IOMMUS - 16-bit SID support for ARM-SMMUv2 - stream table optimization for ARM-SMMUv3 - various fixes and other small improvements * tag 'iommu-updates-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (61 commits) vfio/type1: Fix error return code in vfio_iommu_type1_attach_group() iommu: Remove iommu_register_instance interface iommu/exynos: Make use of iommu_device_register interface iommu/mediatek: Make use of iommu_device_register interface iommu/msm: Make use of iommu_device_register interface iommu/arm-smmu: Make use of the iommu_register interface iommu: Add iommu_device_set_fwnode() interface iommu: Make iommu_device_link/unlink take a struct iommu_device iommu: Add sysfs bindings for struct iommu_device iommu: Introduce new 'struct iommu_device' iommu: Rename struct iommu_device iommu: Rename iommu_get_instance() iommu: Fix static checker warning in iommu_insert_device_resv_regions iommu: Avoid unnecessary assignment of dev->iommu_fwspec iommu/mediatek: Remove bogus 'select' statements iommu/dma: Remove bogus dma_supported() implementation iommu/ipmmu-vmsa: Restrict IOMMU Domain Geometry to 32-bit address space iommu/vt-d: Don't over-free page table directories iommu/vt-d: Tylersburg isoch identity map check is done too late. iommu/vt-d: Fix some macros that are incorrectly specified in intel-iommu ...
2017-02-10Merge branches 'iommu/fixes', 'arm/exynos', 'arm/renesas', 'arm/smmu', ↵Joerg Roedel8-88/+227
'arm/mediatek', 'arm/core', 'x86/vt-d' and 'core' into next
2017-02-10vfio/type1: Fix error return code in vfio_iommu_type1_attach_group()Wei Yongjun1-2/+5
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 5d704992189f ("vfio/type1: Allow transparent MSI IOVA allocation") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-02-09vfio: Replace module request with softdepAlex Williamson1-8/+1
Rather than doing a module request from within the init function, add a soft dependency on the available IOMMU backend drivers. This makes the dependency visible to userspace when picking modules for the ram disk. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-08vfio/mdev: Use a module softdep for vfio_mdevAlex Williamson1-12/+2
Use an explicit module softdep rather than a request module call such that the dependency is exposed to userspace. This allows us to more easily support modules loaded at initrd time. Reviewed by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-08vfio: Fix build break when SPAPR_TCE_IOMMU=nMichael Ellerman1-4/+2
Currently the kconfig logic for VFIO_IOMMU_SPAPR_TCE and VFIO_SPAPR_EEH is broken when SPAPR_TCE_IOMMU=n. Leading to: warning: (VFIO) selects VFIO_IOMMU_SPAPR_TCE which has unmet direct dependencies (VFIO && SPAPR_TCE_IOMMU) warning: (VFIO) selects VFIO_IOMMU_SPAPR_TCE which has unmet direct dependencies (VFIO && SPAPR_TCE_IOMMU) drivers/vfio/vfio_iommu_spapr_tce.c:113:8: error: implicit declaration of function 'mm_iommu_find' This stems from the fact that VFIO selects VFIO_IOMMU_SPAPR_TCE, and although it has an if clause, the condition is not correct. We could fix it by doing select VFIO_IOMMU_SPAPR_TCE if SPAPR_TCE_IOMMU, but the cleaner fix is to drop the selects and tie VFIO_IOMMU_SPAPR_TCE to the value of VFIO, and express the dependencies in only once place. Do the same for VFIO_SPAPR_EEH. The end result is that the values of VFIO_IOMMU_SPAPR_TCE and VFIO_SPAPR_EEH follow the value of VFIO, except when SPAPR_TCE_IOMMU=n and/or EEH=n. Which is exactly what we want to happen. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-07vfio/spapr_tce: Set window when adding additional groups to containerAlexey Kardashevskiy1-0/+22
If a container already has a group attached, attaching a new group should just program already created IOMMU tables to the hardware via the iommu_table_group_ops::set_window() callback. However commit 6f01cc692a16 ("vfio/spapr: Add a helper to create default DMA window") did not just simplify the code but also removed the set_window() calls in the case of attaching groups to a container which already has tables so it broke VFIO PCI hotplug. This reverts set_window() bits in tce_iommu_take_ownership_ddw(). Fixes: 6f01cc692a16 ("vfio/spapr: Add a helper to create default DMA window") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-01vfio/spapr: Fix missing mutex unlock when creating a windowAlexey Kardashevskiy1-6/+5
Commit d9c728949ddc ("vfio/spapr: Postpone default window creation") added an additional exit to the VFIO_IOMMU_SPAPR_TCE_CREATE case and made it possible to return from tce_iommu_ioctl() without unlocking container->lock; this fixes the issue. Fixes: d9c728949ddc ("vfio/spapr: Postpone default window creation") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-01-30Merge branch 'iommu/guest-msi' of ↵Joerg Roedel1-2/+35
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/core
2017-01-24vfio/spapr: fail tce_iommu_attach_group() when iommu_data is nullGreg Kurz1-0/+4
The recently added mediated VFIO driver doesn't know about powerpc iommu. It thus doesn't register a struct iommu_table_group in the iommu group upon device creation. The iommu_data pointer hence remains null. This causes a kernel oops when userspace tries to set the iommu type of a container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU. [ 82.585440] mtty mtty: MDEV: Registered [ 87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 10 [ 87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 10 [ 116.297184] Unable to handle kernel paging request for data at address 0x00000030 [ 116.297389] Faulting instruction address: 0xd000000007870524 [ 116.297465] Oops: Kernel access of bad area, sig: 11 [#1] [ 116.297611] SMP NR_CPUS=2048 [ 116.297611] NUMA [ 116.297627] PowerNV ... [ 116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 4.10.0-rc5-mdev-test #8 [ 116.297993] task: c000000e7718b680 task.stack: c000000e77214000 [ 116.298025] NIP: d000000007870524 LR: d000000007870518 CTR: 0000000000000000 [ 116.298064] REGS: c000000e77217990 TRAP: 0300 Not tainted (4.10.0-rc5-mdev-test) [ 116.298103] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 116.298107] CR: 84004444 XER: 00000000 [ 116.298154] CFAR: c00000000000888c DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1 GPR00: d000000007870518 c000000e77217c10 d00000000787b0ed c000000eed2103c0 GPR04: 0000000000000000 0000000000000000 c000000eed2103e0 0000000f24320000 GPR08: 0000000000000104 0000000000000001 0000000000000000 d0000000078729b0 GPR12: c00000000025b7e0 c00000000fe08400 0000000000000001 000001002d31d100 GPR16: 000001002c22c850 00003ffff315c750 0000000043145680 0000000043141bc0 GPR20: ffffffffffffffed fffffffffffff000 0000000020003b65 d000000007706018 GPR24: c000000f16cf0d98 d000000007706000 c000000003f42980 c000000003f42980 GPR28: c000000f1575ac00 c000000003f429c8 0000000000000000 c000000eed2103c0 [ 116.298504] NIP [d000000007870524] tce_iommu_attach_group+0x10c/0x360 [vfio_iommu_spapr_tce] [ 116.298555] LR [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] [ 116.298601] Call Trace: [ 116.298610] [c000000e77217c10] [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable) [ 116.298671] [c000000e77217cb0] [d0000000077033a0] vfio_fops_unl_ioctl+0x278/0x3e0 [vfio] [ 116.298713] [c000000e77217d40] [c0000000002a3ebc] do_vfs_ioctl+0xcc/0x8b0 [ 116.298745] [c000000e77217de0] [c0000000002a4700] SyS_ioctl+0x60/0xc0 [ 116.298782] [c000000e77217e30] [c00000000000b220] system_call+0x38/0xfc [ 116.298812] Instruction dump: [ 116.298828] 7d3f4b78 409effc8 3d220000 e9298020 3c800140 38a00018 608480c0 e8690028 [ 116.298869] 4800249d e8410018 7c7f1b79 41820230 <e93e0030> 2fa90000 419e0114 e9090020 [ 116.298914] ---[ end trace 1e10b0ced08b9120 ]--- This patch fixes the oops. Reported-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-01-23vfio/type1: Check MSI remapping at irq domain levelEric Auger1-3/+6
In case the IOMMU translates MSI transactions (typical case on ARM), we check MSI remapping capability at IRQ domain level. Otherwise it is checked at IOMMU level. At this stage the arm-smmu-(v3) still advertise the IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be removed in subsequent patches. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23vfio/type1: Allow transparent MSI IOVA allocationEric Auger1-0/+30
When attaching a group to the container, check the group's reserved regions and test whether the IOMMU translates MSI transactions. If yes, we initialize an IOVA allocator through the iommu_get_msi_cookie API. This will allow the MSI IOVAs to be transparently allocated on MSI controller's compose(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-13vfio/type1: Remove pid_namespace.h includeAlex Williamson1-1/+0
Using has_capability() rather than ns_capable(), we're no longer using this header. Cc: Jike Song <jike.song@intel.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-01-12vfio iommu type1: fix the testing of capability for remote taskJike Song1-2/+1
Before the mdev enhancement type1 iommu used capable() to test the capability of current task; in the course of mdev development a new requirement, testing for another task other than current, was raised. ns_capable() was used for this purpose, however it still tests current, the only difference is, in a specified namespace. Fix it by using has_capability() instead, which tests the cap for specified task in init_user_ns, the same namespace as capable(). Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Jike Song <jike.song@intel.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-01-04vfio-pci: Handle error from pci_iomapArvind Yadav1-0/+4
Here, pci_iomap can fail, handle this case release selected pci regions and return -ENOMEM. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-30vfio-pci: use 32-bit comparisons for register address for gcc-4.5Arnd Bergmann1-1/+4
Using ancient compilers (gcc-4.5 or older) on ARM, we get a link failure with the vfio-pci driver: ERROR: "__aeabi_lcmp" [drivers/vfio/pci/vfio-pci.ko] undefined! The reason is that the compiler tries to do a comparison of a 64-bit range. This changes it to convert to a 32-bit number explicitly first, as newer compilers do for themselves. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-30vfio-mdev: Make mdev_device private and abstract interfacesAlex Williamson2-0/+43
Abstract access to mdev_device so that we can define which interfaces are public rather than relying on comments in the structure. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
2016-12-30vfio-mdev: Make mdev_parent privateAlex Williamson2-0/+16
Rather than hoping for good behavior by marking some elements internal, enforce it by making the entire structure private and creating an accessor function for the one useful external field. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
2016-12-30vfio-mdev: de-polute the namespace, rename parent_device & parent_opsAlex Williamson4-27/+27
Add an mdev_ prefix so we're not poluting the namespace so much. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Jike Song <jike.song@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
2016-12-30vfio-mdev: Fix remove raceAlex Williamson1-2/+34
Using the mtty mdev sample driver we can generate a remove race by starting one shell that continuously creates mtty devices and several other shells all attempting to remove devices, in my case four remove shells. The fault occurs in mdev_remove_sysfs_files() where the passed type arg is NULL, which suggests we've received a struct device in mdev_device_remove() but it's in some sort of teardown state. The solution here is to make use of the accidentally unused list_head on the mdev_device such that the mdev core keeps a list of all the mdev devices. This allows us to validate that we have a valid mdev before we start removal, remove it from the list to prevent others from working on it, and if the vendor driver refuses to remove, we can re-add it to the list. Cc: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-30vfio/type1: Restore mapping performance with mdev supportAlex Williamson1-47/+51
As part of the mdev support, type1 now gets a task reference per vfio_dma and uses that to get an mm reference for the task while working on accounting. That's correct, but it's not fast. For some paths, like vfio_pin_pages_remote(), we know we're only called from user context, so we can restore the lighter weight calls. In other cases, we're effectively already testing whether we're in the stored task context elsewhere, extend this vfio_lock_acct() as well. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
2016-12-16Merge tag 'powerpc-4.10-1' of ↵Linus Torvalds1-111/+217
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Support for the kexec_file_load() syscall, which is a prereq for secure and trusted boot. - Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN). - Sort the exception tables at build time, to save time at boot, and store them as relative offsets to save space in the kernel image & memory. - Allow building the kernel with thin archives, which should allow us to build an allyesconfig once some other fixes land. - Build fixes to allow us to correctly rebuild when changing the kernel endian from big to little or vice versa. - Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix. - Initial stack protector support (-fstack-protector). - Support for dumping the radix (aka. Linux) and hash page tables via debugfs. - Fix an oops in cxl coredump generation when cxl_get_fd() is used. - Freescale updates from Scott: "Highlights include 8xx hugepage support, qbman fixes/cleanup, device tree updates, and some misc cleanup." - Many and varied fixes and minor enhancements as always. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain" [ And thanks to Michael, who took time off from a new baby to get this pull request done. - Linus ] * tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits) powerpc/fsl/dts: add FMan node for t1042d4rdb powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb powerpc/fsl/dts: add QMan and BMan nodes on t1024 powerpc/fsl/dts: add QMan and BMan nodes on t1023 soc/fsl/qman: test: use DEFINE_SPINLOCK() powerpc/fsl-lbc: use DEFINE_SPINLOCK() powerpc/8xx: Implement support of hugepages powerpc: get hugetlbpage handling more generic powerpc: port 64 bits pgtable_cache to 32 bits powerpc/boot: Request no dynamic linker for boot wrapper soc/fsl/bman: Use resource_size instead of computation soc/fsl/qe: use builtin_platform_driver powerpc/fsl_pmc: use builtin_platform_driver powerpc/83xx/suspend: use builtin_platform_driver powerpc/ftrace: Fix the comments for ftrace_modify_code powerpc/perf: macros for power9 format encoding powerpc/perf: power9 raw event format encoding powerpc/perf: update attribute_group data structure powerpc/perf: factor out the event format field powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown ...
2016-12-15Merge tag 'pci-v4.10-changes' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes: - add support for PCI on ARM64 boxes with ACPI. We already had this for theoretical spec-compliant hardware; now we're adding quirks for the actual hardware (Cavium, HiSilicon, Qualcomm, X-Gene) - add runtime PM support for hotplug ports - enable runtime suspend for Intel UHCI that uses platform-specific wakeup signaling - add yet another host bridge registration interface. We hope this is extensible enough to subsume the others - expose device revision in sysfs for DRM - to avoid device conflicts, make sure any VF BAR updates are done before enabling the VF - avoid unnecessary link retrains for ASPM - allow INTx masking on Mellanox devices that support it - allow access to non-standard VPD for Chelsio devices - update Broadcom iProc support for PAXB v2, PAXC v2, inbound DMA, etc - update Rockchip support for max-link-speed - add NVIDIA Tegra210 support - add Layerscape LS1046a support - update R-Car compatibility strings - add Qualcomm MSM8996 support - remove some uninformative bootup messages" * tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (115 commits) PCI: Enable access to non-standard VPD for Chelsio devices (cxgb3) PCI: Expand "VPD access disabled" quirk message PCI: pciehp: Remove loading message PCI: hotplug: Remove hotplug core message PCI: Remove service driver load/unload messages PCI/AER: Log AER IRQ when claiming Root Port PCI/AER: Log errors with PCI device, not PCIe service device PCI/AER: Remove unused version macros PCI/PME: Log PME IRQ when claiming Root Port PCI/PME: Drop unused support for PMEs from Root Complex Event Collectors PCI: Move config space size macros to pci_regs.h x86/platform/intel-mid: Constify mid_pci_platform_pm PCI/ASPM: Don't retrain link if ASPM not possible PCI: iproc: Skip check for legacy IRQ on PAXC buses PCI: pciehp: Leave power indicator on when enabling already-enabled slot PCI: pciehp: Prioritize data-link event over presence detect PCI: rcar: Add gen3 fallback compatibility string for pcie-rcar PCI: rcar: Use gen2 fallback compatibility last PCI: rcar-gen2: Use gen2 fallback compatibility last PCI: rockchip: Move the deassert of pm/aclk/pclk after phy_init() ..
2016-12-14mm: add locked parameter to get_user_pages_remote()Lorenzo Stoakes1-1/+1
Patch series "mm: unexport __get_user_pages_unlocked()". This patch series continues the cleanup of get_user_pages*() functions taking advantage of the fact we can now pass gup_flags as we please. It firstly adds an additional 'locked' parameter to get_user_pages_remote() to allow for its callers to utilise VM_FAULT_RETRY functionality. This is necessary as the invocation of __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of this and no other existing higher level function would allow it to do so. Secondly existing callers of __get_user_pages_unlocked() are replaced with the appropriate higher-level replacement - get_user_pages_unlocked() if the current task and memory descriptor are referenced, or get_user_pages_remote() if other task/memory descriptors are referenced (having acquiring mmap_sem.) This patch (of 2): Add a int *locked parameter to get_user_pages_remote() to allow VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked(). Taking into account the previous adjustments to get_user_pages*() functions allowing for the passing of gup_flags, we are now in a position where __get_user_pages_unlocked() need only be exported for his ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to subsequently unexport __get_user_pages_unlocked() as well as allowing for future flexibility in the use of get_user_pages_remote(). [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change] Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12PCI: Move config space size macros to pci_regs.hWang Sheng-Hui1-2/+0
Move PCI configuration space size macros (PCI_CFG_SPACE_SIZE and PCI_CFG_SPACE_EXP_SIZE) from drivers/pci/pci.h to include/uapi/linux/pci_regs.h so they can be used by more drivers and eliminate duplicate definitions. [bhelgaas: Expand comment to include PCI-X details] Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pagesKirti Wankhede1-3/+3
Passing zero for the size to vfio_find_dma() isn't compatible with matching the start address of an existing vfio_dma. Doing so triggers a corner case. In vfio_find_dma(), when the start address is equal to dma->iova and size is 0, check for the end of search range makes it to take wrong side of RB-tree. That fails the search even though the address is present in mapped dma ranges. In functions pin_pages and unpin_pages, the iova which is being searched is base address of page to be pinned or unpinned. So here size should be set to PAGE_SIZE, as argument to vfio_find_dma(). Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-06vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP.Kirti Wankhede1-1/+1
Passing zero for the size to vfio_find_dma() isn't compatible with matching the start address of an existing vfio_dma. Doing so triggers a corner case. In vfio_find_dma(), when the start address is equal to dma->iova and size is 0, check for the end of search range makes it to take wrong side of RB-tree. That fails the search even though the address is present in mapped dma ranges. Due to this, in vfio_dma_do_unmap(), while checking boundary conditions, size should be set to 1 for verifying start address of unmap range. vfio_find_dma() is also used to verify last address in unmap range with size = 0, but in that case address to be searched is calculated with start + size - 1 and so it works correctly. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> [aw: changelog tweak] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-05vfio iommu type1: WARN_ON if notifier block is not unregisteredKirti Wankhede1-0/+2
mdev vendor driver should unregister the iommu notifier since the vfio iommu can persist beyond the attachment of the mdev group. WARN_ON will show warning if vendor driver doesn't unregister the notifier and is forced to follow the implementations steps. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-02powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdownAlexey Kardashevskiy1-1/+60
At the moment the userspace tool is expected to request pinning of the entire guest RAM when VFIO IOMMU SPAPR v2 driver is present. When the userspace process finishes, all the pinned pages need to be put; this is done as a part of the userspace memory context (MM) destruction which happens on the very last mmdrop(). This approach has a problem that a MM of the userspace process may live longer than the userspace process itself as kernel threads use userspace process MMs which was runnning on a CPU where the kernel thread was scheduled to. If this happened, the MM remains referenced until this exact kernel thread wakes up again and releases the very last reference to the MM, on an idle system this can take even hours. This moves preregistered regions tracking from MM to VFIO; insteads of using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is added so each container releases regions which it has pre-registered. This changes the userspace interface to return EBUSY if a memory region is already registered in a container. However it should not have any practical effect as the only userspace tool available now does register memory region once per container anyway. As tce_iommu_register_pages/tce_iommu_unregister_pages are called under container->lock, this does not need additional locking. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02vfio/spapr: Reference mm in tce_containerAlexey Kardashevskiy1-60/+100
In some situations the userspace memory context may live longer than the userspace process itself so if we need to do proper memory context cleanup, we better have tce_container take a reference to mm_struct and use it later when the process is gone (@current or @current->mm is NULL). This references mm and stores the pointer in the container; this is done in a new helper - tce_iommu_mm_set() - when one of the following happens: - a container is enabled (IOMMU v1); - a first attempt to pre-register memory is made (IOMMU v2); - a DMA window is created (IOMMU v2). The @mm stays referenced till the container is destroyed. This replaces current->mm with container->mm everywhere except debug prints. This adds a check that current->mm is the same as the one stored in the container to prevent userspace from making changes to a memory context of other processes. DMA map/unmap ioctls() do not check for @mm as they already check for @enabled which is set after tce_iommu_mm_set() is called. This does not reference a task as multiple threads within the same mm are allowed to ioctl() to vfio and supposedly they will have same limits and capabilities and if they do not, we'll just fail with no harm made. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02vfio/spapr: Postpone default window creationAlexey Kardashevskiy1-15/+25
We are going to allow the userspace to configure container in one memory context and pass container fd to another so we are postponing memory allocations accounted against the locked memory limit. One of previous patches took care of it_userspace. At the moment we create the default DMA window when the first group is attached to a container; this is done for the userspace which is not DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory pre-registration - such client expects the default DMA window to exist. This postpones the default DMA window allocation till one of the folliwing happens: 1. first map/unmap request arrives; 2. new window is requested; This adds noop for the case when the userspace requested removal of the default window which has not been created yet. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02vfio/spapr: Add a helper to create default DMA windowAlexey Kardashevskiy1-45/+42
There is already a helper to create a DMA window which does allocate a table and programs it to the IOMMU group. However tce_iommu_take_ownership_ddw() did not use it and did these 2 calls itself to simplify error path. Since we are going to delay the default window creation till the default window is accessed/removed or new window is added, we need a helper to create a default window from all these cases. This adds tce_iommu_create_default_window(). Since it relies on a VFIO container to have at least one IOMMU group (for future use), this changes tce_iommu_attach_group() to add a group to the container first and then call the new helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02vfio/spapr: Postpone allocation of userspace version of TCE tableAlexey Kardashevskiy1-13/+7
The iommu_table struct manages a hardware TCE table and a vmalloc'd table with corresponding userspace addresses. Both are allocated when the default DMA window is created and this happens when the very first group is attached to a container. As we are going to allow the userspace to configure container in one memory context and pas container fd to another, we have to postpones such allocations till a container fd is passed to the destination user process so we would account locked memory limit against the actual container user constrainsts. This postpones the it_userspace array allocation till it is used first time for mapping. The unmapping patch already checks if the array is allocated. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-02powerpc/iommu: Stop using @current in mm_iommu_xxxAlexey Kardashevskiy1-4/+10
This changes mm_iommu_xxx helpers to take mm_struct as a parameter instead of getting it from @current which in some situations may not have a valid reference to mm. This changes helpers to receive @mm and moves all references to @current to the caller, including checks for !current and !current->mm; checks in mm_iommu_preregistered() are removed as there is no caller yet. This moves the mm_iommu_adjust_locked_vm() call to the caller as it receives mm_iommu_table_group_mem_t but it needs mm. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>