summaryrefslogtreecommitdiffstats
path: root/drivers/vhost
AgeCommit message (Collapse)AuthorFilesLines
2020-06-22tools/virtio: Add --resetEugenio Pérez2-0/+58
Currently, it only removes and add backend, but it will reset vq position in future commits. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20200418102217.32327-5-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-22vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap()Dan Carpenter1-1/+1
The "vma->vm_pgoff" variable is an unsigned long so if it's larger than INT_MAX then "index" can be negative leading to an underflow. Fix this by changing the type of "index" to "unsigned long". Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200610085852.GB5439@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-13Merge tag 'kbuild-v5.8-2' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada1-4/+4
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-6/+2
Merge some more updates from Andrew Morton: - various hotfixes and minor things - hch's use_mm/unuse_mm clearnups Subsystems affected by this patch series: mm/hugetlb, scripts, kcov, lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kernel: set USER_DS in kthread_use_mm kernel: better document the use_mm/unuse_mm API contract kernel: move use_mm/unuse_mm to kthread.c kernel: move use_mm/unuse_mm to kthread.c stacktrace: cleanup inconsistent variable type lib: test get_count_order/long in test_bitops.c mm: add comments on pglist_data zones ocfs2: fix spelling mistake and grammar mm/debug_vm_pgtable: fix kernel crash by checking for THP support lib: fix bitmap_parse() on 64-bit big endian archs checkpatch: correct check for kernel parameters doc nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() lib/lz4/lz4_decompress.c: document deliberate use of `&' kcov: check kcov_softirq in kcov_remote_stop() scripts/spelling: add a few more typos khugepaged: selftests: fix timeout condition in wait_for_scan()
2020-06-10kernel: set USER_DS in kthread_use_mmChristoph Hellwig1-3/+0
Some architectures like arm64 and s390 require USER_DS to be set for kernel threads to access user address space, which is the whole purpose of kthread_use_mm, but other like x86 don't. That has lead to a huge mess where some callers are fixed up once they are tested on said architectures, while others linger around and yet other like io_uring try to do "clever" optimizations for what usually is just a trivial asignment to a member in the thread_struct for most architectures. Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the previous value instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10kernel: better document the use_mm/unuse_mm API contractChristoph Hellwig1-2/+2
Switch the function documentation to kerneldoc comments, and add WARN_ON_ONCE asserts that the calling thread is a kernel thread and does not have ->mm set (or has ->mm set in the case of unuse_mm). Also give the functions a kthread_ prefix to better document the use case. [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio] Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename] Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb] Acked-by: Haren Myneni <haren@linux.ibm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10kernel: move use_mm/unuse_mm to kthread.cChristoph Hellwig1-1/+0
Patch series "improve use_mm / unuse_mm", v2. This series improves the use_mm / unuse_mm interface by better documenting the assumptions, and my taking the set_fs manipulations spread over the callers into the core API. This patch (of 3): Use the proper API instead. Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de These helpers are only for use with kernel threads, and I will tie them more into the kthread infrastructure going forward. Also move the prototypes to kthread.h - mmu_context.h was a little weird to start with as it otherwise contains very low-level MM bits. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds9-54/+195
Pull virtio updates from Michael Tsirkin: - virtio-mem: paravirtualized memory hotplug - support doorbell mapping for vdpa - config interrupt support in ifc - fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits) vhost/test: fix up after API change virtio_mem: convert device block size into 64bit virtio-mem: drop unnecessary initialization ifcvf: implement config interrupt in IFCVF vhost: replace -1 with VHOST_FILE_UNBIND in ioctls vhost_vdpa: Support config interrupt in vdpa ifcvf: ignore continuous setting same status value virtio-mem: Don't rely on implicit compiler padding for requests virtio-mem: Try to unplug the complete online memory block first virtio-mem: Use -ETXTBSY as error code if the device is busy virtio-mem: Unplug subblocks right-to-left virtio-mem: Drop manual check for already present memory virtio-mem: Add parent resource for all added "System RAM" virtio-mem: Better retry handling virtio-mem: Offline and remove completely unplugged memory blocks mm/memory_hotplug: Introduce offline_and_remove_memory() virtio-mem: Allow to offline partially unplugged memory blocks mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE virtio-mem: Paravirtualized memory hotunplug part 2 virtio-mem: Paravirtualized memory hotunplug part 1 ...
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse1-2/+2
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09vhost/test: fix up after API changeMichael S. Tsirkin1-1/+1
Pass a flag to request kernel thread use. Fixes: 01fcb1cbc88e ("vhost: allow device that does not depend on vhost worker") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-08vhost: convert get_user_pages() --> pin_user_pages()John Hubbard1-3/+2
This code was using get_user_pages*(), in approximately a "Case 5" scenario (accessing the data within a page), using the categorization from [1]. That means that it's time to convert the get_user_pages*() + put_page() calls to pin_user_pages*() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200529234309.484480-3-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-06vhost: replace -1 with VHOST_FILE_UNBIND in ioctlsZhu Lingshan1-4/+4
This commit replaces -1 with VHOST_FILE_UNBIND in ioctls since we have added such a macro in the uapi header for vdpa_host. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1591352835-22441-5-git-send-email-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-06vhost_vdpa: Support config interrupt in vdpaZhu Lingshan1-0/+47
This commit implements config interrupt support in vhost_vdpa layer. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1591352835-22441-4-git-send-email-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-05Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+1
Pull SCSI updates from James Bottomley: :This series consists of the usual driver updates (qla2xxx, ufs, zfcp, target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host of other minor updates. There are no major core changes in this series apart from a refactoring in scsi_lib.c" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes scsi: cxgb3i: Fix some leaks in init_act_open() scsi: ibmvscsi: Make some functions static scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim scsi: ufs: Fix WriteBooster flush during runtime suspend scsi: ufs: Fix index of attributes query for WriteBooster feature scsi: ufs: Allow WriteBooster on UFS 2.2 devices scsi: ufs: Remove unnecessary memset for dev_info scsi: ufs-qcom: Fix scheduling while atomic issue scsi: mpt3sas: Fix reply queue count in non RDPQ mode scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd() scsi: vhost: Notify TCM about the maximum sg entries supported per command scsi: qla2xxx: Remove return value from qla_nvme_ls() scsi: qla2xxx: Remove an unused function scsi: iscsi: Register sysfs for iscsi workqueue scsi: scsi_debug: Parser tables and code interaction scsi: core: Refactor scsi_mq_setup_tags function scsi: core: Fix incorrect usage of shost_for_each_device scsi: qla2xxx: Fix endianness annotations in source files ...
2020-06-04vhost: (cosmetic) remove a superfluous variable initialisationGuennadi Liakhovetski1-1/+1
Even the compiler is able to figure out that in this case the initialisation is superfluous. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Link: https://lore.kernel.org/r/20200527180541.5570-3-guennadi.liakhovetski@linux.intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost_vdpa: disable doorbell mapping for !MMUMichael S. Tsirkin1-0/+4
There could be ways to support doorbell mapping with !MMU, but things like pgprot_noncached are not universally supported. Fixable, but just disable this for now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost_vdpa: support doorbell mapping via mmapJason Wang1-0/+59
Currently the doorbell is relayed via eventfd which may have significant overhead because of the cost of vmexits or syscall. This patch introduces mmap() based doorbell mapping which can eliminate the overhead caused by vmexit or syscall. To ease the userspace modeling of the doorbell layout (usually virtio-pci), this patch starts from a doorbell per page model. Vhost-vdpa only support the hardware doorbell that sit at the boundary of a page and does not share the page with other registers. Doorbell of each virtqueue must be mapped separately, pgoff is the index of the virtqueue. This allows userspace to map a subset of the doorbell which may be useful for the implementation of software assisted virtqueue (control vq) in the future. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-5-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost: use mmgrab() instead of mmget() for non worker deviceJason Wang1-8/+34
For the device that doesn't use vhost worker and use_mm(), mmget() is too heavy weight and it may brings troubles for implementing mmap() support for vDPA device. This is because, an reference to the address space was held via mm_get() in vhost_dev_set_owner() and an reference to the file was held in mmap(). This means when process exits, the mm can not be released thus we can not release the file. This patch tries to use mmgrab() instead of mmget(), which allows the address space to be destroy in process exit without releasing the mm structure itself. This is sufficient for vDPA device which pin user pages and does not depend on the address space to work. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost: allow device that does not depend on vhost workerJason Wang6-17/+31
vDPA device currently relays the eventfd via vhost worker. This is inefficient due the latency of wakeup and scheduling, so this patch tries to introduce a use_worker attribute for the vhost device. When use_worker is not set with vhost_dev_init(), vhost won't try to allocate a worker thread and the vhost_poll will be processed directly in the wakeup function. This help for vDPA since it reduces the latency caused by vhost worker. In my testing, it saves 0.2 ms in pings between VMs on a mutual host. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-02vhost: revert "vhost: disable for OABI"Michael S. Tsirkin1-13/+4
This reverts commit d085eb8ce727 ("vhost: disable for OABI") With commit "virtio: force spec specified alignment on types" in place, we force proper alignment for all structures, so there's no longer a reason to blacklist OABI. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-02virtio: force spec specified alignment on typesMichael S. Tsirkin3-10/+10
The ring element addresses are passed between components with different alignments assumptions. Thus, if guest/userspace selects a pointer and host then gets and dereferences it, we might need to decrease the compiler-selected alignment to prevent compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration, but it seems safer to handle this generally. Note that userspace that allocates the memory is actually OK and does not need to be fixed, but userspace that gets it from guest or another process does need to be fixed. The later doesn't generally talk to the kernel so while it might be buggy it's not talking to the kernel in the buggy way - it's just using the header in the buggy way - so fixing header and asking userspace to recompile is the best we can do. I verified that the produced kernel binary on x86 is exactly identical before and after the change. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-05-26scsi: vhost: Notify TCM about the maximum sg entries supported per commandSudhakar Panneerselvam1-0/+1
vhost-scsi pre-allocates the maximum sg entries per command and if a command requires more than VHOST_SCSI_PREALLOC_SGLS entries, then that command is failed by it. This patch lets vhost communicate the max sg limit when it registers vhost_scsi_ops with TCM. With this change, TCM would report the max sg entries through "Block Limits" VPD page which will be typically queried by the SCSI initiator during device discovery. By knowing this limit, the initiator could ensure the maximum transfer length is less than or equal to what is reported by vhost-scsi. Link: https://lore.kernel.org/r/1590166317-953-1-git-send-email-sudhakar.panneerselvam@oracle.com Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-2/+2
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-2/+2
Pull virtio fixes from Michael Tsirkin: "Fix a couple of build warnings" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: missing __user tags vdpasim: remove unused variable 'ret'
2020-05-15vhost: missing __user tagsMichael S. Tsirkin1-2/+2
sparse warns about converting void * to void __user *. This is not new but only got noticed now that vhost is built on more systems. This is just a question of __user tags missing in a couple of places, so fix it up. Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-05-14vhost_net: Also populate XDP frame sizeJesper Dangaard Brouer1-0/+1
In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start) which contains the buffer length 'buflen' (with tailroom for skb_shared_info). Also storing this buflen in xdp->frame_sz, does not obsolete struct tun_xdp_hdr, as it also contains a struct virtio_net_hdr with other information. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343928.97035.4620233649151726289.stgit@firesoul
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-5/+11
Pull networking fixes from David Miller: 1) Fix reference count leaks in various parts of batman-adv, from Xiyu Yang. 2) Update NAT checksum even when it is zero, from Guillaume Nault. 3) sk_psock reference count leak in tls code, also from Xiyu Yang. 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in fq_codel, from Eric Dumazet. 5) Fix panic in choke_reset(), also from Eric Dumazet. 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan. 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet. 8) Fix crash in x25_disconnect(), from Yue Haibing. 9) Don't pass pointer to local variable back to the caller in nf_osf_hdr_ctx_init(), from Arnd Bergmann. 10) Wireguard should use the ECN decap helper functions, from Toke Høiland-Jørgensen. 11) Fix command entry leak in mlx5 driver, from Moshe Shemesh. 12) Fix uninitialized variable access in mptcp's subflow_syn_recv_sock(), from Paolo Abeni. 13) Fix unnecessary out-of-order ingress frame ordering in macsec, from Scott Dial. 14) IPv6 needs to use a global serial number for dst validation just like ipv4, from David Ahern. 15) Fix up PTP_1588_CLOCK deps, from Clay McClure. 16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from Yoshiyuki Kurauchi. 17) Fix a regression in that dsa user port errors should not be fatal, from Florian Fainelli. 18) Fix iomap leak in enetc driver, from Dejin Zheng. 19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang. 20) Initialize protocol value earlier in neigh code paths when generating events, from Roman Mashak. 21) netdev_update_features() must be called with RTNL mutex in macsec driver, from Antoine Tenart. 22) Validate untrusted GSO packets even more strictly, from Willem de Bruijn. 23) Wireguard decrypt worker needs a cond_resched(), from Jason Donenfeld. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning wireguard: send/receive: cond_resched() when processing worker ringbuffers wireguard: socket: remove errant restriction on looping to self wireguard: selftests: use normal kernel stack size on ppc64 net: ethernet: ti: am65-cpsw-nuss: fix irqs type ionic: Use debugfs_create_bool() to export bool net: dsa: Do not leave DSA master with NULL netdev_ops net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred net: stricter validation of untrusted gso packets seg6: fix SRH processing to comply with RFC8754 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms net: dsa: ocelot: the MAC table on Felix is twice as large net: dsa: sja1105: the PTP_CLK extts input reacts on both edges selftests: net: tcp_mmap: fix SO_RCVLOWAT setting net: hsr: fix incorrect type usage for protocol variable net: macsec: fix rtnl locking issue net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() ...
2020-05-02vhost: vsock: kick send_pkt worker once device is startedJia He1-0/+5
Ning Bo reported an abnormal 2-second gap when booting Kata container [1]. The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of connecting from the client side. The vhost vsock client tries to connect an initializing virtio vsock server. The abnormal flow looks like: host-userspace vhost vsock guest vsock ============== =========== ============ connect() --------> vhost_transport_send_pkt_work() initializing | vq->private_data==NULL | will not be queued V schedule_timeout(2s) vhost_vsock_start() <--------- device ready set vq->private_data wait for 2s and failed connect() again vq->private_data!=NULL recv connecting pkt Details: 1. Host userspace sends a connect pkt, at that time, guest vsock is under initializing, hence the vhost_vsock_start has not been called. So vq->private_data==NULL, and the pkt is not been queued to send to guest 2. Then it sleeps for 2s 3. After guest vsock finishes initializing, vq->private_data is set 4. When host userspace wakes up after 2s, send connecting pkt again, everything is fine. As suggested by Stefano Garzarella, this fixes it by additional kicking the send_pkt worker in vhost_vsock_start once the virtio device is started. This makes the pending pkt sent again. After this patch, kata-runtime (with vsock enabled) boot time is reduced from 3s to 1s on a ThunderX2 arm64 server. [1] https://github.com/kata-containers/runtime/issues/1917 Reported-by: Ning Bo <n.b@live.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-04-27vsock/virtio: fix multiple packet delivery to monitoring devicesStefano Garzarella1-0/+6
In virtio_transport.c, if the virtqueue is full, the transmitting packet is queued up and it will be sent in the next iteration. This causes the same packet to be delivered multiple times to monitoring devices. We want to continue to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. This patch fixes the issue, adding a new flag (tap_delivered) in struct virtio_vsock_pkt, to check if the packet is already delivered to monitoring devices. In vhost/vsock.c, we are splitting packets, so we must set 'tap_delivered' to false when we queue up the same virtio_vsock_pkt to handle the remaining bytes. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27vhost/vsock: fix packet delivery order to monitoring devicesStefano Garzarella1-5/+5
We want to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20vhost: disable for OABIMichael S. Tsirkin1-4/+13
vhost is currently broken on the some ARM configs. The reason is that the ring element addresses are passed between components with different alignments assumptions. Thus, if guest selects a pointer and host then gets and dereferences it, then alignment assumed by the host's compiler might be greater than the actual alignment of the pointer. compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration. With this OABI, compiler assumes that all structures are 4 byte aligned - which is stronger than virtio guarantees for available and used rings, which are merely 2 bytes. Thus a guest without -mabi=apcs-gnu running on top of host with -mabi=apcs-gnu will be broken. The correct fix is to force alignment of structures - however that is an intrusive fix that's best deferred until the next release. We didn't previously support such ancient systems at all - this surfaced after vdpa support prompted removing dependency of vhost on VIRTULIZATION. So for now, let's just add something along the lines of depends on !ARM || AEABI to the virtio Kconfig declaration, and add a comment that it has to do with struct member alignment. Note: we can't make VHOST and VHOST_RING themselves have a dependency since these are selected. Add a new symbol for that. We should be able to drop this dependency down the road. Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig") Suggested-by: Ard Biesheuvel <ardb@kernel.org> Suggested-by: Richard Earnshaw <Richard.Earnshaw@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-17vdpa: make vhost, virtio depend on menuMichael S. Tsirkin1-1/+1
If user did not configure any vdpa drivers, neither vhost nor virtio vdpa are going to be useful. So there's no point in prompting for these and selecting vdpa core automatically. Simplify configuration by making virtio and vhost vdpa drivers depend on vdpa menu entry. Once done, we no longer need a separate menu entry, so also get rid of this. While at it, fix up the IFC entry: VDPA->vDPA for consistency with other places. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-16virtio/test: fix up after IOTLB changesMichael S. Tsirkin3-2/+9
Allow building vringh without IOTLB (that's the case for userspace builds, will be useful for CAIF/VOD down the road too). Update for API tweaks. Don't include vringh with userspace builds. Cc: Jason Wang <jasowang@redhat.com> Cc: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-16vhost: Create accessors for virtqueues private_dataEugenio Pérez5-32/+61
Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20200331192804.6019-2-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-16vhost: remove set but not used variable 'status'Jason Yan1-3/+0
Fix the following gcc warning: drivers/vhost/vdpa.c:299:5: warning: variable 'status' set but not used [-Wunused-but-set-variable] u8 status; ^~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20200402065106.20108-1-yanaijie@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-16vhost: vdpa: remove unnecessary null checkGustavo A. R. Silva1-2/+0
container_of is never null, so this null check is unnecessary. Addresses-Coverity-ID: 1492006 ("Logically dead code") Fixes: 20453a45fb06 ("vhost: introduce vDPA-based backend") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200330235040.GA9997@embeddedor Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-02vhost: introduce vDPA-based backendTiwei Bie3-0/+898
This patch introduces a vDPA-based vhost backend. This backend is built on top of the same interface defined in virtio-vDPA and provides a generic vhost interface for userspace to accelerate the virtio devices in guest. This backend is implemented as a vDPA device driver on top of the same ops used in virtio-vDPA. It will create char device entry named vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls on top of this char device to setup the backend. Vhost ioctls are extended to make it type agnostic and behave like a virtio device, this help to eliminate type specific API like what vhost_net/scsi/vsock did: - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined by virtio specification to differ from different type of devices - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue supported by the vDPA device - VHSOT_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device - VHOST_VDPA_SET/GET_CONFIG: access virtio config space - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue For memory mapping, IOTLB API is mandated for vhost-vDPA which means userspace drivers are required to use VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for a specific userspace memory region. The vhost-vDPA API is designed to be type agnostic, but it allows net device only in current stage. Due to the lacking of control virtqueue support, some features were filter out by vhost-vdpa. We will enable more features and devices in the near future. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-8-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vringh: IOTLB supportJason Wang2-23/+399
This patch implements the third memory accessor for vringh besides current kernel and userspace accessors. This idea is to allow vringh to do the address translation through an IOTLB which is implemented via vhost_map interval tree. Users should setup and IOVA to PA mapping in this IOTLB. This allows us to: - Use vringh to access virtqueues with vIOMMU - Use vringh to implement software virtqueues for vDPA devices Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-5-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: factor out IOTLBJason Wang6-181/+267
This patch factors out IOTLB into a dedicated module in order to be reused by other modules like vringh. User may choose to enable the automatic retiring by specifying VHOST_IOTLB_FLAG_RETIRE flag to fit for the case of vhost device IOTLB implementation. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: allow per device message handlerJason Wang5-6/+19
This patch allow device to register its own message handler during vhost_dev_init(). vDPA device will use it to implement its own DMA mapping logic. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: refine vhost and vringh kconfigJason Wang2-13/+21
Currently, CONFIG_VHOST depends on CONFIG_VIRTUALIZATION. But vhost is not necessarily for VM since it's a generic userspace and kernel communication protocol. Such dependency may prevent archs without virtualization support from using vhost. To solve this, a dedicated vhost menu is created under drivers so CONIFG_VHOST can be decoupled out of CONFIG_VIRTUALIZATION. While at it, also squash Kconfig.vringh into vhost Kconfig file. This avoids the trick of conditional inclusion from VOP or CAIF. Then it will be easier to introduce new vringh users and common dependency for both vringh and vhost. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-02-22vhost: Check docket sk_family instead of call getnameEugenio Pérez1-9/+1
Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-1/+3
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
2019-12-07vhost/vsock: accept only packets with the right dst_cidStefano Garzarella1-1/+3
When we receive a new packet from the guest, we check if the src_cid is correct, but we forgot to check the dst_cid. The host should accept only packets where dst_cid is equal to the host CID. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04vhost, kcov: collect coverage from vhost_workerAndrey Konovalov2-0/+7
Add kcov_remote_start()/kcov_remote_stop() annotations to the vhost_worker() function, which is responsible for processing vhost works. Since vhost_worker() threads are spawned per vhost device instance the common kcov handle is used for kcov_remote_start()/stop() annotations (see Documentation/dev-tools/kcov.rst for details). As the result kcov can now be used to collect coverage from vhost worker threads. Link: http://lkml.kernel.org/r/e49d5d154e5da6c9ada521d2b7ce10a49ce9f98b.1572366574.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Potapenko <glider@google.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Windsor <dwindsor@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Marco Elver <elver@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds4-44/+4
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-14vhost/vsock: refuse CID assigned to the guest->host transportStefano Garzarella1-0/+6
In a nested VM environment, we have to refuse to assign to a nested guest the same CID assigned to our guest->host transport. In this way, the user can use the local CID for loopback. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: prevent transport modules unloadingStefano Garzarella1-0/+2
This patch adds 'module' member in the 'struct vsock_transport' in order to get/put the transport module. This prevents the module unloading while sockets are assigned to it. We increase the module refcnt when a socket is assigned to a transport, and we decrease the module refcnt when the socket is destructed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14vsock: add multi-transports supportStefano Garzarella1-2/+3
This patch adds the support of multiple transports in the VSOCK core. With the multi-transports support, we can use vsock with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Major changes: - vsock core module can be loaded regardless of the transports - vsock_core_init() and vsock_core_exit() are renamed to vsock_core_register() and vsock_core_unregister() - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM) to identify which directions the transport can handle and if it's support DGRAM (only vmci) - each stream socket is assigned to a transport when the remote CID is set (during the connect() or when we receive a connection request on a listener socket). The remote CID is used to decide which transport to use: - remote CID <= VMADDR_CID_HOST will use guest->host transport; - remote CID == local_cid (guest->host transport) will use guest->host transport for loopback (host->guest transports don't support loopback); - remote CID > VMADDR_CID_HOST will use host->guest transport; - listener sockets are not bound to any transports since no transport operations are done on it. In this way we can create a listener socket, also if the transports are not loaded or with VMADDR_CID_ANY to listen on all transports. - DGRAM sockets are handled as before, since only the vmci_transport provides this feature. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>