aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vhost
AgeCommit message (Collapse)Author
2020-10-14vhost-vdpa: fix page pinning leakage in error pathSi-Wei Liu
[ Upstream commit 7ed9e3d97c32d969caded2dfb6e67c1a2cc5a0b1 ] Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. As the inflight pinned pages, specifically for memory region that strides across multiple chunks, would need more than one free page for book keeping and accounting. For simplicity, pin pages for all memory in the IOVA range in one go rather than have multiple pin_user_pages calls to make up the entire region. This way it's easier to track and account the pages already mapped, particularly for clean-up in the error path. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1601701330-16837-3-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-14vhost-vdpa: fix vhost_vdpa_map() on error conditionSi-Wei Liu
[ Upstream commit 1477c8aebb94a1db398c12d929a9d27bbd678d8c ] vhost_vdpa_map() should remove the iotlb entry just added if the corresponding mapping fails to set up properly. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1601701330-16837-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-14vhost: Use vhost_get_used_size() in vhost_vring_set_addr()Greg Kurz
commit 71878fa46c7e3b40fa7b3f1b6e4ba3f92f1ac359 upstream. The open-coded computation of the used size doesn't take the event into account when the VIRTIO_RING_F_EVENT_IDX feature is present. Fix that by using vhost_get_used_size(). Fixes: 8ea8cf89e19a ("vhost: support event index") Cc: stable@vger.kernel.org Signed-off-by: Greg Kurz <groug@kaod.org> Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14vhost: Don't call access_ok() when using IOTLBGreg Kurz
commit 0210a8db2aeca393fb3067e234967877e3146266 upstream. When the IOTLB device is enabled, the vring addresses we get from userspace are GIOVAs. It is thus wrong to pass them down to access_ok() which only takes HVAs. Access validation is done at prefetch time with IOTLB. Teach vq_access_ok() about that by moving the (vq->iotlb) check from vhost_vq_access_ok() to vq_access_ok(). This prevents vhost_vring_set_addr() to fail when verifying the accesses. No behavior change for vhost_vq_access_ok(). BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084 Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Cc: jasowang@redhat.com CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29vhost/scsi: fix up req type endian-nessMichael S. Tsirkin
vhost/scsi doesn't handle type conversion correctly for request type when using virtio 1.0 and up for BE, or cross-endian platforms. Fix it up using vhost_32_to_cpu. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-22tools/virtio: Add --resetEugenio Pérez
Currently, it only removes and add backend, but it will reset vq position in future commits. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20200418102217.32327-5-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-22vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap()Dan Carpenter
The "vma->vm_pgoff" variable is an unsigned long so if it's larger than INT_MAX then "index" can be negative leading to an underflow. Fix this by changing the type of "index" to "unsigned long". Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200610085852.GB5439@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-13Merge tag 'kbuild-v5.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge some more updates from Andrew Morton: - various hotfixes and minor things - hch's use_mm/unuse_mm clearnups Subsystems affected by this patch series: mm/hugetlb, scripts, kcov, lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kernel: set USER_DS in kthread_use_mm kernel: better document the use_mm/unuse_mm API contract kernel: move use_mm/unuse_mm to kthread.c kernel: move use_mm/unuse_mm to kthread.c stacktrace: cleanup inconsistent variable type lib: test get_count_order/long in test_bitops.c mm: add comments on pglist_data zones ocfs2: fix spelling mistake and grammar mm/debug_vm_pgtable: fix kernel crash by checking for THP support lib: fix bitmap_parse() on 64-bit big endian archs checkpatch: correct check for kernel parameters doc nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() lib/lz4/lz4_decompress.c: document deliberate use of `&' kcov: check kcov_softirq in kcov_remote_stop() scripts/spelling: add a few more typos khugepaged: selftests: fix timeout condition in wait_for_scan()
2020-06-10kernel: set USER_DS in kthread_use_mmChristoph Hellwig
Some architectures like arm64 and s390 require USER_DS to be set for kernel threads to access user address space, which is the whole purpose of kthread_use_mm, but other like x86 don't. That has lead to a huge mess where some callers are fixed up once they are tested on said architectures, while others linger around and yet other like io_uring try to do "clever" optimizations for what usually is just a trivial asignment to a member in the thread_struct for most architectures. Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the previous value instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10kernel: better document the use_mm/unuse_mm API contractChristoph Hellwig
Switch the function documentation to kerneldoc comments, and add WARN_ON_ONCE asserts that the calling thread is a kernel thread and does not have ->mm set (or has ->mm set in the case of unuse_mm). Also give the functions a kthread_ prefix to better document the use case. [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio] Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename] Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb] Acked-by: Haren Myneni <haren@linux.ibm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10kernel: move use_mm/unuse_mm to kthread.cChristoph Hellwig
Patch series "improve use_mm / unuse_mm", v2. This series improves the use_mm / unuse_mm interface by better documenting the assumptions, and my taking the set_fs manipulations spread over the callers into the core API. This patch (of 3): Use the proper API instead. Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de These helpers are only for use with kernel threads, and I will tie them more into the kthread infrastructure going forward. Also move the prototypes to kthread.h - mmu_context.h was a little weird to start with as it otherwise contains very low-level MM bits. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - virtio-mem: paravirtualized memory hotplug - support doorbell mapping for vdpa - config interrupt support in ifc - fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits) vhost/test: fix up after API change virtio_mem: convert device block size into 64bit virtio-mem: drop unnecessary initialization ifcvf: implement config interrupt in IFCVF vhost: replace -1 with VHOST_FILE_UNBIND in ioctls vhost_vdpa: Support config interrupt in vdpa ifcvf: ignore continuous setting same status value virtio-mem: Don't rely on implicit compiler padding for requests virtio-mem: Try to unplug the complete online memory block first virtio-mem: Use -ETXTBSY as error code if the device is busy virtio-mem: Unplug subblocks right-to-left virtio-mem: Drop manual check for already present memory virtio-mem: Add parent resource for all added "System RAM" virtio-mem: Better retry handling virtio-mem: Offline and remove completely unplugged memory blocks mm/memory_hotplug: Introduce offline_and_remove_memory() virtio-mem: Allow to offline partially unplugged memory blocks mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE virtio-mem: Paravirtualized memory hotunplug part 2 virtio-mem: Paravirtualized memory hotunplug part 1 ...
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09vhost/test: fix up after API changeMichael S. Tsirkin
Pass a flag to request kernel thread use. Fixes: 01fcb1cbc88e ("vhost: allow device that does not depend on vhost worker") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-08vhost: convert get_user_pages() --> pin_user_pages()John Hubbard
This code was using get_user_pages*(), in approximately a "Case 5" scenario (accessing the data within a page), using the categorization from [1]. That means that it's time to convert the get_user_pages*() + put_page() calls to pin_user_pages*() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200529234309.484480-3-jhubbard@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-06vhost: replace -1 with VHOST_FILE_UNBIND in ioctlsZhu Lingshan
This commit replaces -1 with VHOST_FILE_UNBIND in ioctls since we have added such a macro in the uapi header for vdpa_host. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1591352835-22441-5-git-send-email-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-06vhost_vdpa: Support config interrupt in vdpaZhu Lingshan
This commit implements config interrupt support in vhost_vdpa layer. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1591352835-22441-4-git-send-email-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-05Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: :This series consists of the usual driver updates (qla2xxx, ufs, zfcp, target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host of other minor updates. There are no major core changes in this series apart from a refactoring in scsi_lib.c" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes scsi: cxgb3i: Fix some leaks in init_act_open() scsi: ibmvscsi: Make some functions static scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim scsi: ufs: Fix WriteBooster flush during runtime suspend scsi: ufs: Fix index of attributes query for WriteBooster feature scsi: ufs: Allow WriteBooster on UFS 2.2 devices scsi: ufs: Remove unnecessary memset for dev_info scsi: ufs-qcom: Fix scheduling while atomic issue scsi: mpt3sas: Fix reply queue count in non RDPQ mode scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd() scsi: vhost: Notify TCM about the maximum sg entries supported per command scsi: qla2xxx: Remove return value from qla_nvme_ls() scsi: qla2xxx: Remove an unused function scsi: iscsi: Register sysfs for iscsi workqueue scsi: scsi_debug: Parser tables and code interaction scsi: core: Refactor scsi_mq_setup_tags function scsi: core: Fix incorrect usage of shost_for_each_device scsi: qla2xxx: Fix endianness annotations in source files ...
2020-06-04vhost: (cosmetic) remove a superfluous variable initialisationGuennadi Liakhovetski
Even the compiler is able to figure out that in this case the initialisation is superfluous. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Link: https://lore.kernel.org/r/20200527180541.5570-3-guennadi.liakhovetski@linux.intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost_vdpa: disable doorbell mapping for !MMUMichael S. Tsirkin
There could be ways to support doorbell mapping with !MMU, but things like pgprot_noncached are not universally supported. Fixable, but just disable this for now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost_vdpa: support doorbell mapping via mmapJason Wang
Currently the doorbell is relayed via eventfd which may have significant overhead because of the cost of vmexits or syscall. This patch introduces mmap() based doorbell mapping which can eliminate the overhead caused by vmexit or syscall. To ease the userspace modeling of the doorbell layout (usually virtio-pci), this patch starts from a doorbell per page model. Vhost-vdpa only support the hardware doorbell that sit at the boundary of a page and does not share the page with other registers. Doorbell of each virtqueue must be mapped separately, pgoff is the index of the virtqueue. This allows userspace to map a subset of the doorbell which may be useful for the implementation of software assisted virtqueue (control vq) in the future. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-5-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost: use mmgrab() instead of mmget() for non worker deviceJason Wang
For the device that doesn't use vhost worker and use_mm(), mmget() is too heavy weight and it may brings troubles for implementing mmap() support for vDPA device. This is because, an reference to the address space was held via mm_get() in vhost_dev_set_owner() and an reference to the file was held in mmap(). This means when process exits, the mm can not be released thus we can not release the file. This patch tries to use mmgrab() instead of mmget(), which allows the address space to be destroy in process exit without releasing the mm structure itself. This is sufficient for vDPA device which pin user pages and does not depend on the address space to work. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04vhost: allow device that does not depend on vhost workerJason Wang
vDPA device currently relays the eventfd via vhost worker. This is inefficient due the latency of wakeup and scheduling, so this patch tries to introduce a use_worker attribute for the vhost device. When use_worker is not set with vhost_dev_init(), vhost won't try to allocate a worker thread and the vhost_poll will be processed directly in the wakeup function. This help for vDPA since it reduces the latency caused by vhost worker. In my testing, it saves 0.2 ms in pings between VMs on a mutual host. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200529080303.15449-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-02vhost: revert "vhost: disable for OABI"Michael S. Tsirkin
This reverts commit d085eb8ce727 ("vhost: disable for OABI") With commit "virtio: force spec specified alignment on types" in place, we force proper alignment for all structures, so there's no longer a reason to blacklist OABI. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-02virtio: force spec specified alignment on typesMichael S. Tsirkin
The ring element addresses are passed between components with different alignments assumptions. Thus, if guest/userspace selects a pointer and host then gets and dereferences it, we might need to decrease the compiler-selected alignment to prevent compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration, but it seems safer to handle this generally. Note that userspace that allocates the memory is actually OK and does not need to be fixed, but userspace that gets it from guest or another process does need to be fixed. The later doesn't generally talk to the kernel so while it might be buggy it's not talking to the kernel in the buggy way - it's just using the header in the buggy way - so fixing header and asking userspace to recompile is the best we can do. I verified that the produced kernel binary on x86 is exactly identical before and after the change. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-05-26scsi: vhost: Notify TCM about the maximum sg entries supported per commandSudhakar Panneerselvam
vhost-scsi pre-allocates the maximum sg entries per command and if a command requires more than VHOST_SCSI_PREALLOC_SGLS entries, then that command is failed by it. This patch lets vhost communicate the max sg limit when it registers vhost_scsi_ops with TCM. With this change, TCM would report the max sg entries through "Block Limits" VPD page which will be typically queried by the SCSI initiator during device discovery. By knowing this limit, the initiator could ensure the maximum transfer length is less than or equal to what is reported by vhost-scsi. Link: https://lore.kernel.org/r/1590166317-953-1-git-send-email-sudhakar.panneerselvam@oracle.com Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Fix a couple of build warnings" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: missing __user tags vdpasim: remove unused variable 'ret'
2020-05-15vhost: missing __user tagsMichael S. Tsirkin
sparse warns about converting void * to void __user *. This is not new but only got noticed now that vhost is built on more systems. This is just a question of __user tags missing in a couple of places, so fix it up. Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-05-14vhost_net: Also populate XDP frame sizeJesper Dangaard Brouer
In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start) which contains the buffer length 'buflen' (with tailroom for skb_shared_info). Also storing this buflen in xdp->frame_sz, does not obsolete struct tun_xdp_hdr, as it also contains a struct virtio_net_hdr with other information. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343928.97035.4620233649151726289.stgit@firesoul
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix reference count leaks in various parts of batman-adv, from Xiyu Yang. 2) Update NAT checksum even when it is zero, from Guillaume Nault. 3) sk_psock reference count leak in tls code, also from Xiyu Yang. 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in fq_codel, from Eric Dumazet. 5) Fix panic in choke_reset(), also from Eric Dumazet. 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan. 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet. 8) Fix crash in x25_disconnect(), from Yue Haibing. 9) Don't pass pointer to local variable back to the caller in nf_osf_hdr_ctx_init(), from Arnd Bergmann. 10) Wireguard should use the ECN decap helper functions, from Toke Høiland-Jørgensen. 11) Fix command entry leak in mlx5 driver, from Moshe Shemesh. 12) Fix uninitialized variable access in mptcp's subflow_syn_recv_sock(), from Paolo Abeni. 13) Fix unnecessary out-of-order ingress frame ordering in macsec, from Scott Dial. 14) IPv6 needs to use a global serial number for dst validation just like ipv4, from David Ahern. 15) Fix up PTP_1588_CLOCK deps, from Clay McClure. 16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from Yoshiyuki Kurauchi. 17) Fix a regression in that dsa user port errors should not be fatal, from Florian Fainelli. 18) Fix iomap leak in enetc driver, from Dejin Zheng. 19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang. 20) Initialize protocol value earlier in neigh code paths when generating events, from Roman Mashak. 21) netdev_update_features() must be called with RTNL mutex in macsec driver, from Antoine Tenart. 22) Validate untrusted GSO packets even more strictly, from Willem de Bruijn. 23) Wireguard decrypt worker needs a cond_resched(), from Jason Donenfeld. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning wireguard: send/receive: cond_resched() when processing worker ringbuffers wireguard: socket: remove errant restriction on looping to self wireguard: selftests: use normal kernel stack size on ppc64 net: ethernet: ti: am65-cpsw-nuss: fix irqs type ionic: Use debugfs_create_bool() to export bool net: dsa: Do not leave DSA master with NULL netdev_ops net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred net: stricter validation of untrusted gso packets seg6: fix SRH processing to comply with RFC8754 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms net: dsa: ocelot: the MAC table on Felix is twice as large net: dsa: sja1105: the PTP_CLK extts input reacts on both edges selftests: net: tcp_mmap: fix SO_RCVLOWAT setting net: hsr: fix incorrect type usage for protocol variable net: macsec: fix rtnl locking issue net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() ...
2020-05-02vhost: vsock: kick send_pkt worker once device is startedJia He
Ning Bo reported an abnormal 2-second gap when booting Kata container [1]. The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of connecting from the client side. The vhost vsock client tries to connect an initializing virtio vsock server. The abnormal flow looks like: host-userspace vhost vsock guest vsock ============== =========== ============ connect() --------> vhost_transport_send_pkt_work() initializing | vq->private_data==NULL | will not be queued V schedule_timeout(2s) vhost_vsock_start() <--------- device ready set vq->private_data wait for 2s and failed connect() again vq->private_data!=NULL recv connecting pkt Details: 1. Host userspace sends a connect pkt, at that time, guest vsock is under initializing, hence the vhost_vsock_start has not been called. So vq->private_data==NULL, and the pkt is not been queued to send to guest 2. Then it sleeps for 2s 3. After guest vsock finishes initializing, vq->private_data is set 4. When host userspace wakes up after 2s, send connecting pkt again, everything is fine. As suggested by Stefano Garzarella, this fixes it by additional kicking the send_pkt worker in vhost_vsock_start once the virtio device is started. This makes the pending pkt sent again. After this patch, kata-runtime (with vsock enabled) boot time is reduced from 3s to 1s on a ThunderX2 arm64 server. [1] https://github.com/kata-containers/runtime/issues/1917 Reported-by: Ning Bo <n.b@live.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-04-27vsock/virtio: fix multiple packet delivery to monitoring devicesStefano Garzarella
In virtio_transport.c, if the virtqueue is full, the transmitting packet is queued up and it will be sent in the next iteration. This causes the same packet to be delivered multiple times to monitoring devices. We want to continue to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. This patch fixes the issue, adding a new flag (tap_delivered) in struct virtio_vsock_pkt, to check if the packet is already delivered to monitoring devices. In vhost/vsock.c, we are splitting packets, so we must set 'tap_delivered' to false when we queue up the same virtio_vsock_pkt to handle the remaining bytes. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27vhost/vsock: fix packet delivery order to monitoring devicesStefano Garzarella
We want to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20vhost: disable for OABIMichael S. Tsirkin
vhost is currently broken on the some ARM configs. The reason is that the ring element addresses are passed between components with different alignments assumptions. Thus, if guest selects a pointer and host then gets and dereferences it, then alignment assumed by the host's compiler might be greater than the actual alignment of the pointer. compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration. With this OABI, compiler assumes that all structures are 4 byte aligned - which is stronger than virtio guarantees for available and used rings, which are merely 2 bytes. Thus a guest without -mabi=apcs-gnu running on top of host with -mabi=apcs-gnu will be broken. The correct fix is to force alignment of structures - however that is an intrusive fix that's best deferred until the next release. We didn't previously support such ancient systems at all - this surfaced after vdpa support prompted removing dependency of vhost on VIRTULIZATION. So for now, let's just add something along the lines of depends on !ARM || AEABI to the virtio Kconfig declaration, and add a comment that it has to do with struct member alignment. Note: we can't make VHOST and VHOST_RING themselves have a dependency since these are selected. Add a new symbol for that. We should be able to drop this dependency down the road. Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig") Suggested-by: Ard Biesheuvel <ardb@kernel.org> Suggested-by: Richard Earnshaw <Richard.Earnshaw@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-17vdpa: make vhost, virtio depend on menuMichael S. Tsirkin
If user did not configure any vdpa drivers, neither vhost nor virtio vdpa are going to be useful. So there's no point in prompting for these and selecting vdpa core automatically. Simplify configuration by making virtio and vhost vdpa drivers depend on vdpa menu entry. Once done, we no longer need a separate menu entry, so also get rid of this. While at it, fix up the IFC entry: VDPA->vDPA for consistency with other places. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-16virtio/test: fix up after IOTLB changesMichael S. Tsirkin
Allow building vringh without IOTLB (that's the case for userspace builds, will be useful for CAIF/VOD down the road too). Update for API tweaks. Don't include vringh with userspace builds. Cc: Jason Wang <jasowang@redhat.com> Cc: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-16vhost: Create accessors for virtqueues private_dataEugenio Pérez
Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20200331192804.6019-2-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-16vhost: remove set but not used variable 'status'Jason Yan
Fix the following gcc warning: drivers/vhost/vdpa.c:299:5: warning: variable 'status' set but not used [-Wunused-but-set-variable] u8 status; ^~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20200402065106.20108-1-yanaijie@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-16vhost: vdpa: remove unnecessary null checkGustavo A. R. Silva
container_of is never null, so this null check is unnecessary. Addresses-Coverity-ID: 1492006 ("Logically dead code") Fixes: 20453a45fb06 ("vhost: introduce vDPA-based backend") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200330235040.GA9997@embeddedor Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-04-02vhost: introduce vDPA-based backendTiwei Bie
This patch introduces a vDPA-based vhost backend. This backend is built on top of the same interface defined in virtio-vDPA and provides a generic vhost interface for userspace to accelerate the virtio devices in guest. This backend is implemented as a vDPA device driver on top of the same ops used in virtio-vDPA. It will create char device entry named vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls on top of this char device to setup the backend. Vhost ioctls are extended to make it type agnostic and behave like a virtio device, this help to eliminate type specific API like what vhost_net/scsi/vsock did: - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined by virtio specification to differ from different type of devices - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue supported by the vDPA device - VHSOT_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device - VHOST_VDPA_SET/GET_CONFIG: access virtio config space - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue For memory mapping, IOTLB API is mandated for vhost-vDPA which means userspace drivers are required to use VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for a specific userspace memory region. The vhost-vDPA API is designed to be type agnostic, but it allows net device only in current stage. Due to the lacking of control virtqueue support, some features were filter out by vhost-vdpa. We will enable more features and devices in the near future. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-8-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vringh: IOTLB supportJason Wang
This patch implements the third memory accessor for vringh besides current kernel and userspace accessors. This idea is to allow vringh to do the address translation through an IOTLB which is implemented via vhost_map interval tree. Users should setup and IOVA to PA mapping in this IOTLB. This allows us to: - Use vringh to access virtqueues with vIOMMU - Use vringh to implement software virtqueues for vDPA devices Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-5-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: factor out IOTLBJason Wang
This patch factors out IOTLB into a dedicated module in order to be reused by other modules like vringh. User may choose to enable the automatic retiring by specifying VHOST_IOTLB_FLAG_RETIRE flag to fit for the case of vhost device IOTLB implementation. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: allow per device message handlerJason Wang
This patch allow device to register its own message handler during vhost_dev_init(). vDPA device will use it to implement its own DMA mapping logic. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01vhost: refine vhost and vringh kconfigJason Wang
Currently, CONFIG_VHOST depends on CONFIG_VIRTUALIZATION. But vhost is not necessarily for VM since it's a generic userspace and kernel communication protocol. Such dependency may prevent archs without virtualization support from using vhost. To solve this, a dedicated vhost menu is created under drivers so CONIFG_VHOST can be decoupled out of CONFIG_VIRTUALIZATION. While at it, also squash Kconfig.vringh into vhost Kconfig file. This avoids the trick of conditional inclusion from VOP or CAIF. Then it will be easier to introduce new vringh users and common dependency for both vringh and vhost. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-02-22vhost: Check docket sk_family instead of call getnameEugenio Pérez
Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
2019-12-07vhost/vsock: accept only packets with the right dst_cidStefano Garzarella
When we receive a new packet from the guest, we check if the src_cid is correct, but we forgot to check the dst_cid. The host should accept only packets where dst_cid is equal to the host CID. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>