summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-10-04vfio/mdev: embedd struct mdev_parent in the parent data structureChristoph Hellwig17-146/+71
Simplify mdev_{un}register_device by requiring the caller to pass in a structure allocate as part of the parent device structure. This removes the need for a list of parents and the separate mdev_parent refcount as we can simplify rely on the reference to the parent device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-5-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: make mdev.h standalone includableChristoph Hellwig10-13/+3
Include <linux/device.h> and <linux/uuid.h> so that users of this headers don't need to do that and remove those includes that aren't needed any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-4-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04drm/i915/gvt: simplify vgpu configuration managementChristoph Hellwig4-145/+81
Instead of copying the information from the vgpu_types arrays into each intel_vgpu_type structure, just reference this constant information with a pointer to the already existing data structure, and pass it into the low-level VGPU creation helpers intead of copying the data into yet anothe params data structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://lore.kernel.org/r/20220923092652.100656-3-hch@lst.de [aw: Fold fix from 20220928121110.GA30738@lst.de] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_typesChristoph Hellwig1-1/+5
gvt->types needs to be freed on error. Fixes: bc90d097ae14 ("drm/i915/gvt: define weight according to vGPU type") Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://lore.kernel.org/r/20220923092652.100656-2-hch@lst.de [aw: Correct fixes commit ID as reported by Stephen Rothwell] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-30drm/i915/gvt: Add missing vfio_unregister_group_dev() callJason Gunthorpe1-0/+1
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove. This was missed for gvt, add it. Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Update some log and comment formatsLongfang Liu2-12/+12
1. Modify some annotation information formats to keep the entire driver annotation format consistent. 2. Modify some log description formats to be consistent with the format of the entire driver log. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-6-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Remove useless macro definitionsLongfang Liu1-1/+0
The QM_QUE_ISO_CFG macro definition is no longer used and needs to be deleted from the current driver. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-5-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Remove useless function parameterLongfang Liu1-3/+5
Remove unused function parameters for vf_qm_fun_reset() and ensure the device is enabled before the reset operation is performed. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-4-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Fix device data address combination problemLongfang Liu1-4/+4
The queue address of the accelerator device should be combined into a dma address in a way of combining the low and high bits. The previous combination is wrong and needs to be modified. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Fixes error return code issueLongfang Liu1-1/+1
During the process of compatibility and matching of live migration device information, if the isolation status of the two devices is inconsistent, the live migration needs to be exited. The current driver does not return the error code correctly and needs to be fixed. Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27vfio: Follow a strict lifetime for struct iommu_groupJason Gunthorpe2-23/+53
The iommu_group comes from the struct device that a driver has been bound to and then created a struct vfio_device against. To keep the iommu layer sane we want to have a simple rule that only an attached driver should be using the iommu API. Particularly only an attached driver should hold ownership. In VFIO's case since it uses the group APIs and it shares between different drivers it is a bit more complicated, but the principle still holds. Solve this by waiting for all users of the vfio_group to stop before allowing vfio_unregister_group_dev() to complete. This is done with a new completion to know when the users go away and an additional refcount to keep track of how many device drivers are sharing the vfio group. The last driver to be unregistered will clean up the group. This solves crashes in the S390 iommu driver that come because VFIO ends up racing releasing ownership (which attaches the default iommu_domain to the device) with the removal of that same device from the iommu driver. This is a side case that iommu drivers should not have to cope with. iommu driver failed to attach the default/blocking domain WARNING: CPU: 0 PID: 5082 at drivers/iommu/iommu.c:1961 iommu_detach_group+0x6c/0x80 Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass vfio_virqfd kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib sunrpc ib_uverbs ism smc uvdevice ib_core s390_trng eadm_sch tape_3590 tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 mlx5_core des_s390 libdes sha3_512_s390 nvme sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme_core zfcp scsi_transport_fc pkey zcrypt rng_core autofs4 CPU: 0 PID: 5082 Comm: qemu-system-s39 Tainted: G W 6.0.0-rc3 #5 Hardware name: IBM 3931 A01 782 (LPAR) Krnl PSW : 0704c00180000000 000000095bb10d28 (iommu_detach_group+0x70/0x80) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 0000000900000027 0000000000000039 000000095c97ffe0 00000000fffeffff 00000009fc290000 00000000af1fda50 00000000af590b58 00000000af1fdaf0 0000000135c7a320 0000000135e52258 0000000135e52200 00000000a29e8000 00000000af590b40 000000095bb10d24 0000038004b13c98 Krnl Code: 000000095bb10d18: c020003d56fc larl %r2,000000095c2bbb10 000000095bb10d1e: c0e50019d901 brasl %r14,000000095be4bf20 #000000095bb10d24: af000000 mc 0,0 >000000095bb10d28: b904002a lgr %r2,%r10 000000095bb10d2c: ebaff0a00004 lmg %r10,%r15,160(%r15) 000000095bb10d32: c0f4001aa867 brcl 15,000000095be65e00 000000095bb10d38: c004002168e0 brcl 0,000000095bf3def8 000000095bb10d3e: eb6ff0480024 stmg %r6,%r15,72(%r15) Call Trace: [<000000095bb10d28>] iommu_detach_group+0x70/0x80 ([<000000095bb10d24>] iommu_detach_group+0x6c/0x80) [<000003ff80243b0e>] vfio_iommu_type1_detach_group+0x136/0x6c8 [vfio_iommu_type1] [<000003ff80137780>] __vfio_group_unset_container+0x58/0x158 [vfio] [<000003ff80138a16>] vfio_group_fops_unl_ioctl+0x1b6/0x210 [vfio] pci 0004:00:00.0: Removing from iommu group 4 [<000000095b5b62e8>] __s390x_sys_ioctl+0xc0/0x100 [<000000095be5d3b4>] __do_syscall+0x1d4/0x200 [<000000095be6c072>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<000000095be4bf80>] __warn_printk+0x60/0x68 It indicates that domain->ops->attach_dev() failed because the driver has already passed the point of destructing the device. Fixes: 9ac8545199a1 ("iommu: Fix use-after-free in iommu_release_device") Reported-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-a3c5f4429e2a+55-iommu_group_lifetime_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Move container code into drivers/vfio/container.cJason Gunthorpe4-691/+728
All the functions that dereference struct vfio_container are moved into container.c. Simple code motion, no functional change. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Split the register_device ops call into functionsJason Gunthorpe1-16/+23
This is a container item. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Rename vfio_ioctl_check_extension()Jason Gunthorpe1-5/+6
To vfio_container_ioctl_check_extension(). A following patch will turn this into a non-static function, make it clear it is related to the container. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Split out container code from the init/cleanup functionsJason Gunthorpe1-18/+36
This miscdev, noiommu driver and a couple of globals are all container items. Move this init into its own functions. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMUJason Gunthorpe1-24/+19
This can all be accomplished using typical IS_ENABLED techniques, drop it all. Also rename the variable to vfio_noiommu so this can be made global in following patches. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Split the container logic into vfio_container_attach_group()Jason Gunthorpe1-30/+48
This splits up the ioctl of vfio_group_ioctl_set_container() so it determines the type of file then invokes a type specific attachment function. Future patches will add iommufd to this function as an alternative type. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Rename __vfio_group_unset_container()Jason Gunthorpe1-6/+5
To vfio_group_detach_container(). This function is really a container function. Fold the WARN_ON() into it as a precondition assertion. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Add header guards and includes to drivers/vfio/vfio.hJason Gunthorpe1-0/+10
As is normal for headers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio: Add struct device to vfio_deviceYi Liu4-14/+65
and replace kref. With it a 'vfio-dev/vfioX' node is created under the sysfs path of the parent, indicating the device is bound to a vfio driver, e.g.: /sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev/vfio0 It is also a preparatory step toward adding cdev for supporting future device-oriented uAPI. Add Documentation/ABI/testing/sysfs-devices-vfio-dev. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-16-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio: Rename vfio_device_put() and vfio_device_try_get()Kevin Tian1-8/+9
With the addition of vfio_put_device() now the names become confusing. vfio_put_device() is clear from object life cycle p.o.v given kref. vfio_device_put()/vfio_device_try_get() are helpers for tracking users on a registered device. Now rename them: - vfio_device_put() -> vfio_device_put_registration() - vfio_device_try_get() -> vfio_device_try_get_registration() Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-15-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/ccw: Use the new device life cycle helpersKevin Tian4-28/+53
ccw is the only exception which cannot use vfio_alloc_device() because its private device structure is designed to serve both mdev and parent. Life cycle of the parent is managed by css_driver so vfio_ccw_private must be allocated/freed in css_driver probe/remove path instead of conforming to vfio core life cycle for mdev. Given that use a wait/completion scheme so the mdev remove path waits after vfio_put_device() until receiving a completion notification from @release. The completion indicates that all active references on vfio_device have been released. After that point although free of vfio_ccw_private is delayed to css_driver it's at least guaranteed to have no parallel reference on released vfio device part from other code paths. memset() in @probe is removed. vfio_device is either already cleared when probed for the first time or cleared in @release from last probe. The right fix is to introduce separate structures for mdev and parent, but this won't happen in short term per prior discussions. Remove vfio_init/uninit_group_dev() as no user now. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220921104401.38898-14-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/amba: Use the new device life cycle helpersKevin Tian3-80/+55
Implement amba's own vfio_device_ops. Remove vfio_platform_probe/remove_common() given no user now. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-13-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/platform: Use the new device life cycle helpersKevin Tian3-23/+111
Move vfio_device_ops from platform core to platform drivers so device specific init/cleanup can be added. Introduce two new helpers vfio_platform_init/release_common() for the use in driver @init/@release. vfio_platform_probe/remove_common() will be deprecated. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-12-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/fsl-mc: Use the new device life cycle helpersYi Liu1-36/+49
Also add a comment to mark that vfio core releases device_set if @init fails. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-11-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/ap: Use the new device life cycle helpersYi Liu1-21/+29
and manage available_instances inside @init/@release. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-10-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21drm/i915/gvt: Use the new device life cycle helpersKevin Tian3-40/+50
Move vfio_device to the start of intel_vgpu as required by the new helpers. Change intel_gvt_create_vgpu() to use intel_vgpu as the first param as other vgpu helpers do. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://lore.kernel.org/r/20220921104401.38898-9-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/mbochs: Use the new device life cycle helpersYi Liu1-27/+46
and manage avail_mbytes inside @init/@release. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-8-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/mtty: Use the new device life cycle helpersYi Liu1-28/+39
and manage available ports inside @init/@release. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-7-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/mdpy: Use the new device life cycle helpersYi Liu1-34/+47
and manage mdpy_count inside @init/@release. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-6-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/hisi_acc: Use the new device life cycle helpersYi Liu3-77/+37
Tidy up @probe so all migration specific initialization logic is moved to migration specific @init callback. Remove vfio_pci_core_{un}init_device() given no user now. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220921104401.38898-5-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/mlx5: Use the new device life cycle helpersYi Liu1-14/+36
mlx5 has its own @init/@release for handling migration cap. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-4-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/pci: Use the new device life cycle helpersYi Liu3-10/+47
Also introduce two pci core helpers as @init/@release for pci drivers: - vfio_pci_core_init_dev() - vfio_pci_core_release_dev() Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-3-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio: Add helpers for unifying vfio_device life cycleKevin Tian2-1/+116
The idea is to let vfio core manage the vfio_device life cycle instead of duplicating the logic cross drivers. This is also a preparatory step for adding struct device into vfio_device. New pair of helpers together with a kref in vfio_device: - vfio_alloc_device() - vfio_put_device() Drivers can register @init/@release callbacks to manage any private state wrapping the vfio_device. However vfio-ccw doesn't fit this model due to a life cycle mess that its private structure mixes both parent and mdev info hence must be allocated/freed outside of the life cycle of vfio device. Per prior discussions this won't be fixed in short term by IBM folks. Instead of waiting for those modifications introduce another helper vfio_init_device() so ccw can call it to initialize a pre-allocated vfio_device. Further implication of the ccw trick is that vfio_device cannot be freed uniformly in vfio core. Instead, require *EVERY* driver to implement @release and free vfio_device inside. Then ccw can choose to delay the free at its own discretion. Another trick down the road is that kvzalloc() is used to accommodate the need of gvt which uses vzalloc() while all others use kzalloc(). So drivers should call a helper vfio_free_device() to free the vfio_device instead of assuming that kfree() or vfree() is appliable. Later once the ccw mess is fixed we can remove those tricks and fully handle structure alloc/free in vfio core. Existing vfio_{un}init_group_dev() will be deprecated after all existing usages are converted to the new model. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-2-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Set the driver DMA logging callbacksYishai Hadas3-3/+14
Now that everything is ready set the driver DMA logging callbacks if supported by the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Manage error scenarios on trackerYishai Hadas2-2/+61
Handle async error events and health/recovery flow to safely stop the tracker upon error scenarios. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-10-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Report dirty pages from trackerYishai Hadas2-0/+195
Report dirty pages from tracker. It includes: Querying for dirty pages in a given IOVA range, this is done by modifying the tracker into the reporting state and supplying the required range. Using the CQ event completion mechanism to be notified once data is ready on the CQ/QP to be processed. Once data is available turn on the corresponding bits in the bit map. This functionality will be used as part of the 'log_read_and_clear' driver callback in the next patches. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Create and destroy page tracker objectYishai Hadas2-0/+148
Add support for creating and destroying page tracker object. This object is used to control/report the device dirty pages. As part of creating the tracker need to consider the device capabilities for max ranges and adapt/combine ranges accordingly. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Init QP based resources for dirty trackingYishai Hadas2-12/+636
Init QP based resources for dirty tracking to be used upon start logging. It includes: Creating the host and firmware RC QPs, move each of them to its expected state based on the device specification, etc. Creating the relevant resources which are needed by both QPs as of UAR, PD, etc. Creating the host receive side resources as of MKEY, CQ, receive WQEs, etc. The above resources are cleaned-up upon stop logging. The tracker object that will be introduced by next patches will use those resources. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Introduce the DMA logging feature supportYishai Hadas4-2/+207
Introduce the DMA logging feature support in the vfio core layer. It includes the processing of the device start/stop/report DMA logging UAPIs and calling the relevant driver 'op' to do the work. Specifically, Upon start, the core translates the given input ranges into an interval tree, checks for unexpected overlapping, non aligned ranges and then pass the translated input to the driver for start tracking the given ranges. Upon report, the core translates the given input user space bitmap and page size into an IOVA kernel bitmap iterator. Then it iterates it and call the driver to set the corresponding bits for the dirtied pages in a specific IOVA range. Upon stop, the driver is called to stop the previous started tracking. The next patches from the series will introduce the mlx5 driver implementation for the logging ops. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Add an IOVA bitmap supportJoao Martins3-2/+452
The new facility adds a bunch of wrappers that abstract how an IOVA range is represented in a bitmap that is granulated by a given page_size. So it translates all the lifting of dealing with user pointers into its corresponding kernel addresses backing said user memory into doing finally the (non-atomic) bitmap ops to change various bits. The formula for the bitmap is: data[(iova / page_size) / 64] & (1ULL << (iova % 64)) Where 64 is the number of bits in a unsigned long (depending on arch) It introduces an IOVA iterator that uses a windowing scheme to minimize the pinning overhead, as opposed to pinning it on demand 4K at a time. Assuming a 4K kernel page and 4K requested page size, we can use a single kernel page to hold 512 page pointers, mapping 2M of bitmap, representing 64G of IOVA space. An example usage of these helpers for a given @base_iova, @page_size, @length and __user @data: bitmap = iova_bitmap_alloc(base_iova, page_size, length, data); if (IS_ERR(bitmap)) return -ENOMEM; ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn); iova_bitmap_free(bitmap); Each iteration of the @dirty_reporter_fn is called with a unique @iova and @length argument, indicating the current range available through the iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty areas (@iova_length) within that provided range, as following: iova_bitmap_set(bitmap, iova, iova_length); The facility is intended to be used for user bitmaps representing dirtied IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Introduce DMA logging uAPIsYishai Hadas1-0/+86
DMA logging allows a device to internally record what DMAs the device is initiating and report them back to userspace. It is part of the VFIO migration infrastructure that allows implementing dirty page tracking during the pre copy phase of live migration. Only DMA WRITEs are logged, and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. This patch introduces the DMA logging involved uAPIs. It uses the FEATURE ioctl with its GET/SET/PROBE options as of below. It exposes a PROBE option to detect if the device supports DMA logging. It exposes a SET option to start device DMA logging in given IOVAs ranges. It exposes a SET option to stop device DMA logging that was previously started. It exposes a GET option to read back and clear the device DMA log. Extra details exist as part of vfio.h per a specific option. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08Merge remote-tracking branch 'mlx5/mlx5-vfio' into v6.1/vfio/nextAlex Williamson331-1757/+3171
Merge net/mlx5 depedencies for device DMA logging and mlx5 variant driver suppport. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/fsl-mc: Fix a typo in a messageChristophe JAILLET1-2/+2
L and S are swapped in the message. s/VFIO_FLS_MC/VFIO_FSL_MC/ Also use 'ret' instead of 'WARN_ON(ret)' to avoid a duplicated message. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Diana Craciun <diana.craciun@oss.nxp.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/a7c1394346725b7435792628c8d4c06a0a745e0b.1662134821.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-07net/mlx5: Query ADV_VIRTUALIZATION capabilitiesYishai Hadas3-0/+16
Query ADV_VIRTUALIZATION capabilities which provide information for advanced virtualization related features. Current capabilities refer to the page tracker object which is used for tracking the pages that are dirtied by the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220905105852.26398-3-yishaih@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-07net/mlx5: Introduce ifc bits for page trackerYishai Hadas1-1/+82
Introduce ifc related stuff to enable using page tracker. A page tracker is a dirty page tracking object used by the device to report the tracking log. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220905105852.26398-2-yishaih@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-04Linux 6.0-rc4v6.0-rc4Linus Torvalds1-1/+1
2022-09-04Merge tag 'powerpc-6.0-4' of ↵Linus Torvalds6-69/+89
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix handling of PCI domains in /proc on 32-bit systems using the recently added support for numbering buses from zero for each domain. - A fix and a revert for some changes to use READ/WRITE_ONCE() which caused problems with KASAN enabled due to sanitisation calls being introduced in low-level paths that can't cope with it. - Fix build errors on 32-bit caused by the syscall table being misaligned sometimes. - Two fixes to get IBM Cell native machines booting again, which had bit-rotted while my QS22 was temporarily out of action. - Fix the papr_scm driver to not assume the order of events returned by the hypervisor is stable, and a related compile fix. Thanks to Aneesh Kumar K.V, Christophe Leroy, Jordan Niethe, Kajol Jain, Masahiro Yamada, Nathan Chancellor, Pali Rohár, Vaibhav Jain, and Zhouyi Zhou. * tag 'powerpc-6.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Ensure rc is always initialized in papr_scm_pmu_register() Revert "powerpc/irq: Don't open code irq_soft_mask helpers" powerpc: Fix hard_irq_disable() with sanitizer powerpc/rtas: Fix RTAS MSR[HV] handling for Cell Revert "powerpc: Remove unused FW_FEATURE_NATIVE references" powerpc: align syscall table for ppc32 powerpc/pci: Enable PCI domains in /proc when PCI bus numbers are not unique powerpc/papr_scm: Fix nvdimm event mappings
2022-09-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds14-103/+151
Pull kvm fixes from Paolo Bonzini: "s390: - PCI interpretation compile fixes RISC-V: - fix unused variable warnings in vcpu_timer.c - move extern sbi_ext declarations to a header x86: - check validity of argument to KVM_SET_MP_STATE - use guest's global_ctrl to completely disable guest PEBS - fix a memory leak on memory allocation failure - mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES - fix build failure with Clang integrated assembler - fix MSR interception - always flush TLBs when enabling dirty logging" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: check validity of argument to KVM_SET_MP_STATE perf/x86/core: Completely disable guest PEBS via guest's global_ctrl KVM: x86: fix memoryleak in kvm_arch_vcpu_create() KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES KVM: s390: pci: Hook to access KVM lowlevel from VFIO riscv: kvm: move extern sbi_ext declarations to a header riscv: kvm: vcpu_timer: fix unused variable warnings KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE() KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang KVM: VMX: Heed the 'msr' argument in msr_write_intercepted() kvm: x86: mmu: Always flush TLBs when enabling dirty logging kvm: x86: mmu: Drop the need_remote_flush() function
2022-09-04Makefile.extrawarn: re-enable -Wformat for clang; take 2Nick Desaulniers1-0/+12
-Wformat was recently re-enabled for builds with clang, then quickly re-disabled, due to concerns stemming from the frequency of default argument promotion related warning instances. commit 258fafcd0683 ("Makefile.extrawarn: re-enable -Wformat for clang") commit 21f9c8a13bb2 ("Revert "Makefile.extrawarn: re-enable -Wformat for clang"") ISO WG14 has ratified N2562 to address default argument promotion explicitly for printf, as part of the upcoming ISO C2X standard. The behavior of clang was changed in clang-16 to not warn for the cited cases in all language modes. Add a version check, so that users of clang-16 now get the full effect of -Wformat. For older clang versions, re-enable flags under the -Wformat group that way users still get some useful checks related to format strings, without noisy default argument promotion warnings. I intentionally omitted -Wformat-y2k and -Wformat-security from being re-enabled, which are also part of -Wformat in clang-16. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Link: https://github.com/llvm/llvm-project/issues/57102 Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf Suggested-by: Justin Stitt <jstitt007@gmail.com> Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Youngmin Nam <youngmin.nam@samsung.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>