aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vhost
AgeCommit message (Collapse)Author
2019-09-16vhost: make sure log_num < in_numyongduan
commit 060423bfdee3f8bc6e2c1bac97de24d5415e2bc4 upstream. The code assumes log_num < in_num everywhere, and that is true as long as in_num is incremented by descriptor iov count, and log_num by 1. However this breaks if there's a zero sized descriptor. As a result, if a malicious guest creates a vring desc with desc.len = 0, it may cause the host kernel to crash by overflowing the log array. This bug can be triggered during the VM migration. There's no need to log when desc.len = 0, so just don't increment log_num in this case. Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server") Cc: stable@vger.kernel.org Reviewed-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: ruippan <ruippan@tencent.com> Signed-off-by: yongduan <yongduan@tencent.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-16vhost/test: fix build for vhost test - againTiwei Bie
commit 264b563b8675771834419057cbe076c1a41fb666 upstream. Since vhost_exceeds_weight() was introduced, callers need to specify the packet weight and byte weight in vhost_dev_init(). Note that, the packet weight isn't counted in this patch to keep the original behavior unchanged. Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()") Cc: stable@vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-16vhost/test: fix build for vhost testTiwei Bie
commit 93d2c4de8d8129b97ee1e1a222aedb0719d2fcd9 upstream. Since below commit, callers need to specify the iov_limit in vhost_dev_init() explicitly. Fixes: b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()") Cc: stable@vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26vhost_net: disable zerocopy by defaultJason Wang
[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482Thomas Gleixner
Based on 1 normalized pattern(s): this work is licensed under the terms of the gnu gpl version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 48 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-27vhost: scsi: add weight supportJason Wang
This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable. This addresses CVE-2019-3900. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-27vhost: vsock: add weight supportJason Wang
This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing vsock kthread from hogging cpu which is guest triggerable. The weight can help to avoid starving the request from on direction while another direction is being processed. The value of weight is picked from vhost-net. This addresses CVE-2019-3900. Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-27vhost_net: fix possible infinite loopJason Wang
When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet: while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... } This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like: 1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when: - userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers This addresses CVE-2019-3900. Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-27vhost: introduce vhost_exceeds_weight()Jason Wang
We used to have vhost_exceeds_weight() for vhost-net to: - prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed. Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - enable packed ring support for s390 - several fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio/s390: enable packed ring virtio/s390: DMA support for virtio-ccw virtio/s390: use vring_create_virtqueue virtio/virtio_ring: do some comment fixes vhost-scsi: remove incorrect memory barrier tools/virtio/ringtest: Remove bogus definition of BUG_ON() virtio_ring: Fix potential mem leak in virtqueue_add_indirect_packed
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-12vhost-scsi: remove incorrect memory barrierPaolo Bonzini
At this point, vs_tpg is not public at all; tv_tpg_vhost_count is accessed under tpg->tv_tpg_mutex; tpg->vhost_scsi is accessed under vhost_scsi_mutex. Therefor there are no atomic operations involved at all here, just remove the barrier. Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2019-04-10vhost: reject zero size iova rangeJason Wang
We used to accept zero size iova range which will lead a infinite loop in translate_desc(). Fixing this by failing the request in this case. Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several fixes, most notably fix for virtio on swiotlb systems" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: silence an unused-variable warning virtio: hint if callbacks surprisingly might sleep virtio-ccw: wire up ->bus_name callback s390/virtio: handle find on invalid queue gracefully virtio-ccw: diag 500 may return a negative cookie virtio_balloon: remove the unnecessary 0-initialization virtio-balloon: improve update_balloon_size_func virtio-blk: Consider virtio_max_dma_size() for maximum segment size virtio: Introduce virtio_max_dma_size() dma: Introduce dma_max_mapping_size() swiotlb: Add is_swiotlb_active() function swiotlb: Introduce swiotlb_max_mapping_size()
2019-03-09Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc, hisi_sas, target/iscsi and target/core. Additionally Christoph refactored gdth as part of the dma changes. The major mid-layer change this time is the removal of bidi commands and with them the whole of the osd/exofs driver and filesystem. This is a major simplification for block and mq in particular" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits) scsi: cxgb4i: validate tcp sequence number only if chip version <= T5 scsi: cxgb4i: get pf number from lldi->pf scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c scsi: mpt3sas: Add missing breaks in switch statements scsi: aacraid: Fix missing break in switch statement scsi: kill command serial number scsi: csiostor: drop serial_number usage scsi: mvumi: use request tag instead of serial_number scsi: dpt_i2o: remove serial number usage scsi: st: osst: Remove negative constant left-shifts scsi: ufs-bsg: Allow reading descriptors scsi: ufs: Allow reading descriptor via raw upiu scsi: ufs-bsg: Change the calling convention for write descriptor scsi: ufs: Remove unused device quirks Revert "scsi: ufs: disable vccq if it's not needed by UFS device" scsi: megaraid_sas: Remove a bunch of set but not used variables scsi: clean obsolete return values of eh_timed_out scsi: sd: Optimal I/O size should be a multiple of physical block size scsi: MAINTAINERS: SCSI initiator and target tweaks scsi: fcoe: make use of fip_mode enum complete ...
2019-03-06vhost: silence an unused-variable warningArnd Bergmann
On some architectures, the MMU can be disabled, leading to access_ok() becoming an empty macro that does not evaluate its size argument, which in turn produces an unused-variable warning: drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable] size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0; Mark the variable as __maybe_unused to shut up that warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-19vhost: correctly check the return value of translate_desc() in log_used()Jason Wang
When fail, translate_desc() returns negative value, otherwise the number of iovs. So we should fail when the return value is negative instead of a blindly check against zero. Detected by CoverityScan, CID# 1442593: Control flow issues (DEADCODE) Fixes: cc5e71075947 ("vhost: log dirty page correctly") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04scsi: target/core: Remove the write_pending_status() callback functionBart Van Assche
Due to the patch that makes TMF handling synchronous the write_pending_status() callback function is no longer called. Hence remove it. Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Andy Grover <agrover@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Quinn Tran <quinn.tran@qlogic.com> Cc: Saurav Kashyap <saurav.kashyap@qlogic.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Juergen Gross <jgross@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-01-28vhost: fix OOB in get_rx_bufs()Jason Wang
After batched used ring updating was introduced in commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"). We tend to batch heads in vq->heads for more than one packet. But the quota passed to get_rx_bufs() was not correctly limited, which can result a OOB write in vq->heads. headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx, vhost_len, &in, vq_log, &log, likely(mergeable) ? UIO_MAXIOV : 1); UIO_MAXIOV was still used which is wrong since we could have batched used in vq->heads, this will cause OOB if the next buffer needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've batched 64 (VHOST_NET_BATCH) heads: Acked-by: Stefan Hajnoczi <stefanha@redhat.com> ============================================================================= BUG kmalloc-8k (Tainted: G B ): Redzone overwritten ----------------------------------------------------------------------------- INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674 kmem_cache_alloc_trace+0xbb/0x140 alloc_pd+0x22/0x60 gen8_ppgtt_create+0x11d/0x5f0 i915_ppgtt_create+0x16/0x80 i915_gem_create_context+0x248/0x390 i915_gem_context_create_ioctl+0x4b/0xe0 drm_ioctl_kernel+0xa5/0xf0 drm_ioctl+0x2ed/0x3a0 do_vfs_ioctl+0x9f/0x620 ksys_ioctl+0x6b/0x80 __x64_sys_ioctl+0x11/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x (null) flags=0x200000000010201 INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for vhost-net. This is done through set the limitation through vhost_dev_init(), then set_owner can allocate the number of iov in a per device manner. This fixes CVE-2018-16880. Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix endless loop in nf_tables, from Phil Sutter. 2) Fix cross namespace ip6_gre tunnel hash list corruption, from Olivier Matz. 3) Don't be too strict in phy_start_aneg() otherwise we might not allow restarting auto negotiation. From Heiner Kallweit. 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue. 5) Memory leak in act_tunnel_key, from Davide Caratti. 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn. 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet. 8) Missing udplite rehash callbacks, from Alexey Kodanev. 9) Log dirty pages properly in vhost, from Jason Wang. 10) Use consume_skb() in neigh_probe() as this is a normal free not a drop, from Yang Wei. Likewise in macvlan_process_broadcast(). 11) Missing device_del() in mdiobus_register() error paths, from Thomas Petazzoni. 12) Fix checksum handling of short packets in mlx5, from Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits) bpf: in __bpf_redirect_no_mac pull mac only if present virtio_net: bulk free tx skbs net: phy: phy driver features are mandatory isdn: avm: Fix string plus integer warning from Clang net/mlx5e: Fix cb_ident duplicate in indirect block register net/mlx5e: Fix wrong (zero) TX drop counter indication for representor net/mlx5e: Fix wrong error code return on FEC query failure net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames tools: bpftool: Cleanup license mess bpf: fix inner map masking to prevent oob under speculation bpf: pull in pkt_sched.h header for tooling to fix bpftool build selftests: forwarding: Add a test case for externally learned FDB entries selftests: mlxsw: Test FDB offload indication mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky net: bridge: Mark FDB entries that were added by user as such mlxsw: spectrum_fid: Update dummy FID index mlxsw: pci: Return error on PCI reset timeout mlxsw: pci: Increase PCI SW reset timeout mlxsw: pci: Ring CQ's doorbell before RDQ's MAINTAINERS: update email addresses of liquidio driver maintainers ...
2019-01-21Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost fixes and cleanups from Michael Tsirkin: "Fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: Use copy_to_iter() to send control queue response vhost: return EINVAL if iovecs size does not match the message size virtio-balloon: tweak config_changed implementation virtio: don't allocate vqs when names[i] = NULL virtio_pci: use queue idx instead of array idx to set up the vq virtio: document virtio_config_ops restrictions virtio: fix virtio_config_ops description
2019-01-17vhost: log dirty page correctlyJason Wang
Vhost dirty page logging API is designed to sync through GPA. But we try to log GIOVA when device IOTLB is enabled. This is wrong and may lead to missing data after migration. To solve this issue, when logging with device IOTLB enabled, we will: 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to get HVA, for writable descriptor, get HVA through iovec. For used ring update, translate its GIOVA to HVA 2) traverse the GPA->HVA mapping to get the possible GPA and log through GPA. Pay attention this reverse mapping is not guaranteed to be unique, so we should log each possible GPA in this case. This fix the failure of scp to guest during migration. In -next, we will probably support passing GIOVA->GPA instead of GIOVA->HVA. Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Reported-by: Jintack Lim <jintack@cs.columbia.edu> Cc: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-14vhost/scsi: Use copy_to_iter() to send control queue responseBijan Mottahedeh
Uses copy_to_iter() instead of __copy_to_user() in order to ensure we support arbitrary layouts and an input buffer split across iov entries. Fixes: 0d02dbd68c47b ("vhost/scsi: Respond to control queue operations") Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14vhost: return EINVAL if iovecs size does not match the message sizePavel Tikhomirov
We've failed to copy and process vhost_iotlb_msg so let userspace at least know about it. For instance before these patch the code below runs without any error: int main() { struct vhost_msg msg; struct iovec iov; int fd; fd = open("/dev/vhost-net", O_RDWR); if (fd == -1) { perror("open"); return 1; } iov.iov_base = &msg; iov.iov_len = sizeof(msg)-4; if (writev(fd, &iov,1) == -1) { perror("writev"); return 1; } return 0; } Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-11vhost/vsock: fix vhost vsock cid hashing inconsistentZha Bin
The vsock core only supports 32bit CID, but the Virtio-vsock spec define CID (dst_cid and src_cid) as u64 and the upper 32bits is reserved as zero. This inconsistency causes one bug in vhost vsock driver. The scenarios is: 0. A hash table (vhost_vsock_hash) is used to map an CID to a vsock object. And hash_min() is used to compute the hash key. hash_min() is defined as: (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)). That means the hash algorithm has dependency on the size of macro argument 'val'. 0. In function vhost_vsock_set_cid(), a 64bit CID is passed to hash_min() to compute the hash key when inserting a vsock object into the hash table. 0. In function vhost_vsock_get(), a 32bit CID is passed to hash_min() to compute the hash key when looking up a vsock for an CID. Because the different size of the CID, hash_min() returns different hash key, thus fails to look up the vsock object for an CID. To fix this bug, we keep CID as u64 in the IOCTLs and virtio message headers, but explicitly convert u64 to u32 when deal with the hash table and vsock core. Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex Signed-off-by: Zha Bin <zhabin@linux.alibaba.com> Reviewed-by: Liu Jiang <gerry@linux.alibaba.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost updates from Michael Tsirkin: "Features, fixes, cleanups: - discard in virtio blk - misc fixes and cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: correct the related warning message vhost: split structs into a separate header file virtio: remove deprecated VIRTIO_PCI_CONFIG() vhost/vsock: switch to a mutex for vhost_vsock_hash virtio_blk: add discard and write zeroes support
2018-12-28Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: smarpqi, lpfc, qedi, megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas. Additionally, we have a pile of annotation, unused variable and minor updates. The big API change is the updates for Christoph's DMA rework which include removing the DISABLE_CLUSTERING flag. And finally there are a couple of target tree updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits) scsi: isci: request: mark expected switch fall-through scsi: isci: remote_node_context: mark expected switch fall-throughs scsi: isci: remote_device: Mark expected switch fall-throughs scsi: isci: phy: Mark expected switch fall-through scsi: iscsi: Capture iscsi debug messages using tracepoints scsi: myrb: Mark expected switch fall-throughs scsi: megaraid: fix out-of-bound array accesses scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through scsi: fcoe: remove set but not used variable 'port' scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown() scsi: smartpqi: fix build warnings scsi: smartpqi: update driver version scsi: smartpqi: add ofa support scsi: smartpqi: increase fw status register read timeout scsi: smartpqi: bump driver version scsi: smartpqi: add smp_utils support scsi: smartpqi: correct lun reset issues scsi: smartpqi: correct volume status scsi: smartpqi: do not offline disks for transient did no connect conditions scsi: smartpqi: allow for larger raid maps ...
2018-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) New ipset extensions for matching on destination MAC addresses, from Stefano Brivio. 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to nfp driver. From Stefano Brivio. 3) Implement GRO for plain UDP sockets, from Paolo Abeni. 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT bit so that we could support the entire vlan_tci value. 5) Rework the IPSEC policy lookups to better optimize more usecases, from Florian Westphal. 6) Infrastructure changes eliminating direct manipulation of SKB lists wherever possible, and to always use the appropriate SKB list helpers. This work is still ongoing... 7) Lots of PHY driver and state machine improvements and simplifications, from Heiner Kallweit. 8) Various TSO deferral refinements, from Eric Dumazet. 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov. 10) Batch dropping of XDP packets in tuntap, from Jason Wang. 11) Lots of cleanups and improvements to the r8169 driver from Heiner Kallweit, including support for ->xmit_more. This driver has been getting some much needed love since he started working on it. 12) Lots of new forwarding selftests from Petr Machata. 13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel. 14) Packed ring support for virtio, from Tiwei Bie. 15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov. 16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu. 17) Implement coalescing on TCP backlog queue, from Eric Dumazet. 18) Implement carrier change in tun driver, from Nicolas Dichtel. 19) Support msg_zerocopy in UDP, from Willem de Bruijn. 20) Significantly improve garbage collection of neighbor objects when the table has many PERMANENT entries, from David Ahern. 21) Remove egdev usage from nfp and mlx5, and remove the facility completely from the tree as it no longer has any users. From Oz Shlomo and others. 22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and therefore abort the operation before the commit phase (which is the NETDEV_CHANGEADDR event). From Petr Machata. 23) Add indirect call wrappers to avoid retpoline overhead, and use them in the GRO code paths. From Paolo Abeni. 24) Add support for netlink FDB get operations, from Roopa Prabhu. 25) Support bloom filter in mlxsw driver, from Nir Dotan. 26) Add SKB extension infrastructure. This consolidates the handling of the auxiliary SKB data used by IPSEC and bridge netfilter, and is designed to support the needs to MPTCP which could be integrated in the future. 27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits) net: dccp: fix kernel crash on module load drivers/net: appletalk/cops: remove redundant if statement and mask bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw net/net_namespace: Check the return value of register_pernet_subsys() net/netlink_compat: Fix a missing check of nla_parse_nested ieee802154: lowpan_header_create check must check daddr net/mlx4_core: drop useless LIST_HEAD mlxsw: spectrum: drop useless LIST_HEAD net/mlx5e: drop useless LIST_HEAD iptunnel: Set tun_flags in the iptunnel_metadata_reply from src net/mlx5e: fix semicolon.cocci warnings staging: octeon: fix build failure with XFRM enabled net: Revert recent Spectre-v1 patches. can: af_can: Fix Spectre v1 vulnerability packet: validate address length if non-zero nfc: af_nfc: Fix Spectre v1 vulnerability phonet: af_phonet: Fix Spectre v1 vulnerability net: core: Fix Spectre v1 vulnerability net: minor cleanup in skb_ext_add() net: drop the unused helper skb_ext_get() ...
2018-12-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest RCU changes in this cycle were: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) rcutorture: Don't do busted forward-progress testing rcutorture: Use 100ms buckets for forward-progress callback histograms rcutorture: Recover from OOM during forward-progress tests rcutorture: Print forward-progress test age upon failure rcutorture: Print time since GP end upon forward-progress failure rcutorture: Print histogram of CB invocation at OOM time rcutorture: Print GP age upon forward-progress failure rcu: Print per-CPU callback counts for forward-progress failures rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings rcutorture: Dump grace-period diagnostics upon forward-progress OOM rcutorture: Prepare for asynchronous access to rcu_fwd_startat torture: Remove unnecessary "ret" variables rcutorture: Affinity forward-progress test to avoid housekeeping CPUs rcutorture: Break up too-long rcu_torture_fwd_prog() function rcutorture: Remove cbflood facility torture: Bring any extra CPUs online during kernel startup rcutorture: Add call_rcu() flooding forward-progress tests rcutorture/formal: Replace synchronize_sched() with synchronize_rcu() tools/kernel.h: Replace synchronize_sched() with synchronize_rcu() net/decnet: Replace rcu_barrier_bh() with rcu_barrier() ...
2018-12-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19vhost: correct the related warning messagewangyan
Fixes: 'commit d588cf8f618d ("target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem")' 'commit cbbd26b8b1a6 ("[iov_iter] new primitives - copy_from_iter_full() and friends")' Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19vhost/vsock: switch to a mutex for vhost_vsock_hashStefan Hajnoczi
Now that there are no more data path users of vhost_vsock_lock, it can be turned into a mutex. It's only used by .release() and in the .ioctl() path. Depends-on: <20181105103547.22018-1-stefanha@redhat.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2018-12-12Revert "net: vhost: lock the vqs one by one"Jason Wang
This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. We don't protect device IOTLB with vq mutex, which will lead e.g use after free for device IOTLB entries. And since we've switched to use mutex_trylock() in previous patch, it's safe to revert it without having deadlock. Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one") Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()Jason Wang
We used to hold the mutex of paired virtqueue in vhost_net_busy_poll(). But this will results an inconsistent lock order which may cause deadlock if we try to bring back the protection of device IOTLB with vq mutex that requires to hold mutex of all virtqueues at the same time. Fix this simply by switching to use mutex_trylock(), when fail just skip the busy polling. This can happen when device IOTLB is under updating which should be rare. Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one") Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12vhost: make sure used idx is seen before log in vhost_add_used_n()Jason Wang
We miss a write barrier that guarantees used idx is updated and seen before log. This will let userspace sync and copy used ring before used idx is update. Fix this by adding a barrier before log_write(). Fixes: 8dd014adfea6f ("vhost-net: mergeable buffers support") Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while chaning the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classifiction. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A decent batch of fixes here. I'd say about half are for problems that have existed for a while, and half are for new regressions added in the 4.20 merge window. 1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach. 2) Revert bogus emac driver change, from Benjamin Herrenschmidt. 3) Handle BPF exported data structure with pointers when building 32-bit userland, from Daniel Borkmann. 4) Memory leak fix in act_police, from Davide Caratti. 5) Check RX checksum offload in RX descriptors properly in aquantia driver, from Dmitry Bogdanov. 6) SKB unlink fix in various spots, from Edward Cree. 7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from Eric Dumazet. 8) Fix FID leak in mlxsw driver, from Ido Schimmel. 9) IOTLB locking fix in vhost, from Jean-Philippe Brucker. 10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory limits otherwise namespace exit can hang. From Jiri Wiesner. 11) Address block parsing length fixes in x25 from Martin Schiller. 12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan. 13) For tun interfaces, only iface delete works with rtnl ops, enforce this by disallowing add. From Nicolas Dichtel. 14) Use after free in liquidio, from Pan Bian. 15) Fix SKB use after passing to netif_receive_skb(), from Prashant Bhole. 16) Static key accounting and other fixes in XPS from Sabrina Dubroca. 17) Partially initialized flow key passed to ip6_route_output(), from Shmulik Ladkani. 18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas Falcon. 19) Several small TCP fixes (off-by-one on window probe abort, NULL deref in tail loss probe, SNMP mis-estimations) from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) net/sched: cls_flower: Reject duplicated rules also under skip_sw bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. bnxt_en: Keep track of reserved IRQs. bnxt_en: Fix CNP CoS queue regression. net/mlx4_core: Correctly set PFC param if global pause is turned off. Revert "net/ibm/emac: wrong bit is used for STA control" neighbour: Avoid writing before skb->head in neigh_hh_output() ipv6: Check available headroom in ip6_xmit() even without options tcp: lack of available data can also cause TSO defer ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl mlxsw: spectrum_router: Relax GRE decap matching check mlxsw: spectrum_switchdev: Avoid leaking FID's reference count mlxsw: spectrum_nve: Remove easily triggerable warnings ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes sctp: frag_point sanity check tcp: fix NULL ref in tail loss probe tcp: Do not underestimate rwnd_limited net: use skb_list_del_init() to remove from RX sublists ...
2018-12-06vhost/vsock: fix use-after-free in network stack callersStefan Hajnoczi
If the network stack calls .send_pkt()/.cancel_pkt() during .release(), a struct vhost_vsock use-after-free is possible. This occurs because .release() does not wait for other CPUs to stop using struct vhost_vsock. Switch to an RCU-enabled hashtable (indexed by guest CID) so that .release() can wait for other CPUs by calling synchronize_rcu(). This also eliminates vhost_vsock_lock acquisition in the data path so it could have a positive effect on performance. This is CVE-2018-14625 "kernel: use-after-free Read in vhost_transport_send_pkt". Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+bd391451452fb0b93039@syzkaller.appspotmail.com Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com Reported-by: syzbot+d5a0a170c5069658b141@syzkaller.appspotmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2018-12-06vhost/vsock: fix reset orphans race with close timeoutStefan Hajnoczi
If a local process has closed a connected socket and hasn't received a RST packet yet, then the socket remains in the table until a timeout expires. When a vhost_vsock instance is released with the timeout still pending, the socket is never freed because vhost_vsock has already set the SOCK_DONE flag. Check if the close timer is pending and let it close the socket. This prevents the race which can leak sockets. Reported-by: Maximilian Riemensberger <riemensberger@cadami.net> Cc: Graham Whaley <graham.whaley@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-03vhost: fix IOTLB lockingJean-Philippe Brucker
Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq lock to improve scalability, but introduced a possible deadlock in vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the spinlock is taken while holding vq->mutex. Since calling vhost_poll_queue() doesn't require any lock, avoid the deadlock by not taking vq->mutex. Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one") Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28scsi: target: replace fabric_ops.name with fabric_aliasDavid Disseldorp
iscsi_target_mod is the only LIO fabric where fabric_ops.name differs from the fabric_ops.fabric_name string. fabric_ops.name is used when matching target/$fabric ConfigFS create paths, so rename it .fabric_alias and fallback to target/$fabric vs .fabric_name comparison if .fabric_alias isn't initialised. iscsi_target_mod is the only fabric module to set .fabric_alias . All other fabric modules rely on .fabric_name matching and can drop the duplicate string. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-28scsi: target: drop unnecessary get_fabric_name() accessor from fabric_opsDavid Disseldorp
All fabrics return a const string. In all cases *except* iSCSI the get_fabric_name() string matches fabric_ops.name. Both fabric_ops.get_fabric_name() and fabric_ops.name are user-facing, with the former being used for PR/ALUA state and the latter for ConfigFS (config/target/$name), so we unfortunately need to keep both strings around for now. Replace the useless .get_fabric_name() accessor function with a const string fabric_name member variable. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-27drivers/vhost: Replace synchronize_rcu_bh() with synchronize_rcu()Paul E. McKenney
Now that synchronize_rcu() waits for bh-disable regions of code as well as RCU read-side critical sections, synchronize_rcu_bh() can be replaced by synchronize_rcu(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: <kvm@vger.kernel.org> Cc: <virtualization@lists.linux-foundation.org> Cc: <netdev@vger.kernel.org>
2018-11-17vhost_net: mitigate page reference counting during page frag refillJason Wang
We do a get_page() which involves a atomic operation. This patch tries to mitigate a per packet atomic operation by maintaining a reference bias which is initially USHRT_MAX. Each time a page is got, instead of calling get_page() we decrease the bias and when we find it's time to use a new page we will decrease the bias at one time through __page_cache_drain_cache(). Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6% improvement. Before: 4.63Mpps After: 4.71Mpps Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-01Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost updates from Michael Tsirkin: "Fixes and tweaks: - virtio balloon page hinting support - vhost scsi control queue - misc fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: MAINTAINERS: remove reference to bogus vsock file vhost/scsi: Use common handling code in request queue handler vhost/scsi: Extract common handling code from control queue handler vhost/scsi: Respond to control queue operations vhost/scsi: truncate T10 PI iov_iter to prot_bytes virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON mm/page_poison: expose page_poisoning_enabled to kernel modules virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT kvm_config: add CONFIG_VIRTIO_MENU
2018-10-31vhost: Fix Spectre V1 vulnerabilityJason Wang
The idx in vhost_vring_ioctl() was controlled by userspace, hence a potential exploitation of the Spectre variant 1 vulnerability. Fixing this by sanitizing idx before using it to index d->vqs. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24vhost/scsi: Use common handling code in request queue handlerBijan Mottahedeh
Change the request queue handler to use common handling routines same as the control queue handler. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
s/dnf5'>ross/dnf5 Poky Built Tool and Metadata - User Contributions TreeGrokmirror user
summaryrefslogtreecommitdiffstats
path: root/documentation/README
blob: b60472fcbf147a918a01a425eb8d4c9ac12ab299 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
documentation
=============

This is the directory that contains the Yocto Project documentation.  The Yocto
Project source repositories at https://git.yoctoproject.org/cgit.cgi have two
instances of the "documentation" directory.  You should understand each of
these instances.

  poky/documentation - The directory within the poky Git repository containing
                       the set of Yocto Project manuals.  When you clone the
                       poky Git repository, the documentation directory
                       contains the manuals.  The state of the manuals in this
                       directory is guaranteed to reflect the latest Yocto
                       Project release.  The manuals at the tip of this
                       directory will also likely contain most manual
                       development changes.

  yocto-docs/documentation - The Git repository for the Yocto Project manuals.
                             This repository is where manual development
                             occurs.  If you plan on contributing back to the
                             Yocto Project documentation, you should set up
                             a local Git repository based on this upstream
                             repository as follows:

                               git clone git://git.yoctoproject.org/yocto-docs

                             Changes and patches are first pushed to the
                             yocto-docs Git repository.  Later, they make it
                             into the poky Git repository found at
                             git://git.yoctoproject.org/poky.

Manual Organization
===================

Here the folders corresponding to individual manuals:

* brief-yoctoprojectqs - Yocto Project Quick Start
* overview-manual      - Yocto Project Overview and Concepts Manual
* contributor-guide    - Yocto Project and OpenEmbedded Contributor Guide
* ref-manual           - Yocto Project Reference Manual
* bsp-guide            - Yocto Project Board Support Package (BSP) Developer's Guide
* dev-manual           - Yocto Project Development Tasks Manual
* kernel-dev           - Yocto Project Linux Kernel Development Manual
* profile-manual       - Yocto Project Profiling and Tracing Manual
* sdk-manual           - Yocto Project Software Development Kit (SDK) Developer's Guide.
* toaster-manual       - Toaster User Manual
* test-manual          - Yocto Project Test Environment Manual
* migration-guides     - Yocto Project Release and Migration Notes

Each folder is self-contained regarding content and figures.

If you want to find HTML versions of the Yocto Project manuals on the web,
the current versions reside at https://docs.yoctoproject.org.

poky.yaml
=========

This file defines variables used for documentation production.  The variables
are used to define release pathnames, URLs for the published manuals, etc.

standards.md
============

This file specifies some rules to follow when contributing to the documentation.

template/
=========

Contains a template.svg, to make it easier to create consistent
SVG diagrams.

Sphinx
======

The Yocto Project documentation was migrated from the original DocBook
format to Sphinx based documentation for the Yocto Project 3.2
release. This section will provide additional information related to
the Sphinx migration, and guidelines for developers willing to
contribute to the Yocto Project documentation.

   Sphinx is a tool that makes it easy to create intelligent and
   beautiful documentation, written by Georg Brandl and licensed under
   the BSD license. It was originally created for the Python
   documentation.

Extensive documentation is available on the Sphinx website:
https://www.sphinx-doc.org/en/master/. Sphinx is designed to be
extensible thanks to the ability to write our own custom extensions,
as Python modules, which will be executed during the generation of the
documentation.

Yocto Project documentation website
===================================

The website hosting the Yocto Project documentation, can be found
at: https://docs.yoctoproject.org/.

The entire Yocto Project documentation, as well as the BitBake manual,
is published on this website, including all previously released
versions. A version switcher was added, as a drop-down menu on the top
of the page to switch back and forth between the various versions of
the current active Yocto Project releases.

Transition pages have been added (as rst files) to show links to old
versions of the Yocto Project documentation with links to each manual
generated with DocBook.

How to build the Yocto Project documentation
============================================

Sphinx is written in Python. While it might work with Python2, for
obvious reasons, we will only support building the Yocto Project
documentation with Python3.

Sphinx might be available in your Linux distro packages repositories,
however it is not recommended to use distro packages, as they might be
old versions, especially if you are using an LTS version of your
distro. The recommended method to install the latest versions of Sphinx
and of its required dependencies is to use the Python Package Index (pip).

To install all required packages run:

 $ pip3 install sphinx sphinx_rtd_theme pyyaml

To make sure you always have the latest versions of such packages, you
should regularly run the same command with an added "--upgrade" option:

 $ pip3 install --upgrade sphinx sphinx_rtd_theme pyyaml

Also install the "inkscape" package from your distribution.
Inkscape is need to convert SVG graphics to PNG (for EPUB
export) and to PDF (for PDF export).

Additionally install "fncychap.sty" TeX font if you want to build PDFs. Debian
and Ubuntu have it in "texlive-latex-extra" package while RedHat distributions
and OpenSUSE have it in "texlive-fncychap" package for example.

To build the documentation locally, run:

 $ cd documentation
 $ make html

The resulting HTML index page will be _build/html/index.html, and you
can browse your own copy of the locally generated documentation with
your browser.

Alternatively, you can use Pipenv to automatically install all required
dependencies in a virtual environment:

 $ cd documentation
 $ pipenv install
 $ pipenv run make html

Style checking the Yocto Project documentation
==============================================

The project is starting to use Vale (https://vale.sh/)
to validate the text style.

To install Vale:

 $ pip install vale

To run Vale:

 $ make stylecheck

Link checking the Yocto Project documentation
=============================================

To fix errors which are not reported by Sphinx itself,
the project uses sphinx-lint (https://github.com/sphinx-contrib/sphinx-lint).

To install sphinx-lint:

 $ pip install sphinx-lint

To run sphinx-lint:

 $ make sphinx-lint

Sphinx theme and CSS customization
==================================

The Yocto Project documentation is currently based on the "Read the
Docs" Sphinx theme, with a few changes to make sure the look and feel
of the project documentation is preserved.

Most of the theme changes can be done using the file
'sphinx-static/theme_overrides.css'. Most CSS changes in this file
were inherited from the DocBook CSS stylesheets.

Sphinx design guidelines and principles
=======================================

The initial Docbook to Sphinx migration was done with an automated
tool called Pandoc (https://pandoc.org/). The tool produced some clean
output markdown text files. After the initial automated conversion
additional changes were done to fix up headings, images and links. In
addition Sphinx has built in mechanisms (directives) which were used
to replace similar functions implemented in Docbook such as glossary,
variables substitutions, notes and references.

Headings
========

The layout of the Yocto Project manuals is organized as follows

    Book
      Chapter
        Section
          Subsection
            Subsubsection
              Subsubsubsection

Here are the heading styles that we use in the manuals:

    Book                       => overline ===
      Chapter                  => overline ***
        Section                => ====
          Subsection           => ----
            Subsubsection      => ~~~~
              Subsubsubsection => ^^^^

With this proposal, we preserve the same TOCs between Sphinx and Docbook.

Built-in glossary
=================

Sphinx has a glossary directive. From
https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#glossary:

    This directive must contain a reST definition list with terms and
    definitions. It's then possible to refer to each definition through the
    [https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-term
    'term' role].

So anywhere in any of the Yocto Project manuals, :term:`VAR` can be
used to refer to an item from the glossary, and a link is created
automatically. A general index of terms is also generated by Sphinx
automatically.

Global substitutions
====================

The Yocto Project documentation makes heavy use of global
variables. In Docbook these variables are stored in the file
poky.ent. This Docbook feature is not handled automatically with
Pandoc. Sphinx has builtin support for substitutions
(https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#substitutions),
however there are important shortcomings. For example they cannot be
used/nested inside code-block sections.

A Sphinx extension was implemented to support variable substitutions
to mimic the DocBook based documentation behavior. Variable
substitutions are done while reading/parsing the .rst files. The
pattern for variables substitutions is the same as with DocBook,
e.g. `&VAR;`.

The implementation of the extension can be found here in the file
documentation/sphinx/yocto-vars.py, this extension is enabled by
default when building the Yocto Project documentation.  All variables
are set in a file call poky.yaml, which was initially generated from
poky.ent. The file was converted into YAML so that it is easier to
process by the custom Sphinx extension (which is a Python module).

For example, the following .rst content will produce the 'expected'
content:

  .. code-block::
      $ mkdir poky-&DISTRO;
      or
      $ git clone &YOCTO_GIT_URL;/git/poky -b &DISTRO_NAME_NO_CAP;

Variables can be nested, like it was the case for DocBook:

  YOCTO_HOME_URL : "https://www.yoctoproject.org"
  YOCTO_DOCS_URL : "&YOCTO_HOME_URL;/docs"

Note directive
==============

Sphinx has a builtin 'note' directive that produces clean Note section
in the output file. There are various types of directives such as
"attention", "caution", "danger", "error", "hint", "important", "tip",
"warning", "admonition" that are supported, and additional directives
can be added as Sphinx extension if needed.

Figures
=======

The Yocto Project documentation has many figures/images. Sphinx has a
'figure' directive which is straightforward to use. To include a
figure in the body of the documentation:

  .. image:: figures/YP-flow-diagram.png

Links and References
====================

The following types of links can be used: links to other locations in
the same document, to locations in other documents and to external
websites.

More information can be found here:
https://sublime-and-sphinx-guide.readthedocs.io/en/latest/references.html.

For external links, we use this syntax:
`link text <link URL>`__

instead of:
`link text <link URL>`_

Both syntaxes work, but the latter also creates a "link text" reference
target which could conflict with other references with the same name.
So, only use this variant when you wish to make multiple references
to this link, reusing only the target name.

See https://stackoverflow.com/questions/27420317/restructured-text-rst-http-links-underscore-vs-use

Anchor (<#link>) links are forbidden as they are not checked by Sphinx during
the build and may be broken without knowing about it.

References
==========

The following extension is enabled by default:
sphinx.ext.autosectionlabel
(https://www.sphinx-doc.org/en/master/usage/extensions/autosectionlabel.html).

This extension allows you to refer sections by their titles. Note that
autosectionlabel_prefix_document is enabled by default, so that we can
insert references from any document.

For example, to insert an HTML link to a section from
documentation/manual/intro.rst, use:

  Please check this :ref:`manual/intro:Cross-References to Locations in the Same Document`

Alternatively a custom text can be used instead of using the section
text:

  Please check this :ref:`section <manual/intro:Cross-References to Locations in the Same Document>`

TIP: The following command can be used to dump all the references that
     are defined in the project documentation:

       python -msphinx.ext.intersphinx <path to build folder>/html/objects.inv

This dump contains all links and for each link it shows the default
"Link Text" that Sphinx would use. If the default link text is not
appropriate, a custom link text can be used in the ':ref:' directive.

Extlinks
========

The sphinx.ext.extlinks extension is enabled by default
(https://sublime-and-sphinx-guide.readthedocs.io/en/latest/references.html#use-the-external-links-extension),
and it is configured with the 'extlinks' definitions in
the 'documentation/conf.py' file:

  'yocto_home': ('https://yoctoproject.org%s', None),
  'yocto_wiki': ('https://wiki.yoctoproject.org%s', None),
  'yocto_dl': ('https://downloads.yoctoproject.org%s', None),
  'yocto_lists': ('https://lists.yoctoproject.org%s', None),
  'yocto_bugs': ('https://bugzilla.yoctoproject.org%s', None),
  'yocto_ab': ('https://autobuilder.yoctoproject.org%s', None),
  'yocto_docs': ('https://docs.yoctoproject.org%s', None),
  'yocto_git': ('https://git.yoctoproject.org%s', None),
  'oe_home': ('https://www.openembedded.org%s', None),
  'oe_lists': ('https://lists.openembedded.org%s', None),
  'oe_git': ('https://git.openembedded.org%s', None),
  'oe_wiki': ('https://www.openembedded.org/wiki%s', None),
  'oe_layerindex': ('https://layers.openembedded.org%s', None),
  'oe_layer': ('https://layers.openembedded.org/layerindex/branch/master/layer%s', None),

It creates convenient shortcuts which can be used throughout the
documentation rst files, as:

  Please check this :yocto_wiki:`wiki page </Weekly_Status>`

Intersphinx links
=================

The sphinx.ext.intersphinx extension is enabled by default
(https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html),
so that we can cross reference content from other Sphinx based
documentation projects, such as the BitBake manual.

References to the BitBake manual can directly be done:
 - With a specific description instead of the section name:
  :ref:`Azure Storage fetcher (az://) <bitbake-user-manual/bitbake-user-manual-fetching:fetchers>`
 - With the section name:
  :ref:`bitbake-user-manual/bitbake-user-manual-intro:usage and syntax` option

If you want to refer to an entire document (or chapter) in the BitBake manual,
you have to use the ":doc:" macro with the "bitbake:" prefix:
 - :doc:`BitBake User Manual <bitbake:index>`
 - :doc:`bitbake:bitbake-user-manual/bitbake-user-manual-metadata`" chapter

Note that a reference to a variable (:term:`VARIABLE`) automatically points to
the BitBake manual if the variable is not described in the Reference Manual's Variable Glossary.
However, if you need to bypass this, you can explicitely refer to a description in the
BitBake manual as follows:

  :term:`bitbake:BB_NUMBER_PARSE_THREADS`

This would be the same if we had identical document filenames in
both the Yocto Project and BitBake manuals:

  :ref:`bitbake:directory/file:section title`

Submitting documentation changes
================================

Please see the top level README file in this repository for details of where
to send patches.