summaryrefslogtreecommitdiffstats
path: root/Documentation/networking
AgeCommit message (Collapse)Author
2020-10-29icmp: randomize the global rate limiterEric Dumazet
[ Upstream commit b38e7819cae946e2edf869e604af1e65a5d241c5 ] Keyu Man reported that the ICMP rate limiter could be used by attackers to get useful signal. Details will be provided in an upcoming academic publication. Our solution is to add some noise, so that the attackers no longer can get help from the predictable token bucket limiter. Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05netfilter: nf_flowtable: fix documentationMatteo Croce
commit 78e06cf430934fc3768c342cbebdd1013dcd6fa7 upstream. In the flowtable documentation there is a missing semicolon, the command as is would give this error: nftables.conf:5:27-33: Error: syntax error, unexpected devices, expecting newline or semicolon hook ingress priority 0 devices = { br0, pppoe-data }; ^^^^^^^ nftables.conf:4:12-13: Error: invalid hook (null) flowtable ft { ^^ Fixes: 19b351f16fd9 ("netfilter: add flowtable documentation") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-17tcp: add tcp_min_snd_mss sysctlEric Dumazet
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream. Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages. This forces the stack to send packets with a very high network/cpu overhead. Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload. In some cases, it can be useful to increase the minimal value to a saner value. We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons. Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88. We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS. CVE-2019-11479 -- tcp mss hardcoded to 48 Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Jonathan Looney <jtl@netflix.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Bruce Curtis <brucec@netflix.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ipv4: set the tcp_min_rtt_wlen range from 0 to one dayZhangXiaoxu
[ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ] There is a UBSAN report as below: UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56 signed integer overflow: 2147483647 * 1000 cannot be represented in type 'int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1 Call Trace: <IRQ> dump_stack+0x8c/0xba ubsan_epilogue+0x11/0x60 handle_overflow+0x12d/0x170 ? ttwu_do_wakeup+0x21/0x320 __ubsan_handle_mul_overflow+0x12/0x20 tcp_ack_update_rtt+0x76c/0x780 tcp_clean_rtx_queue+0x499/0x14d0 tcp_ack+0x69e/0x1240 ? __wake_up_sync_key+0x2c/0x50 ? update_group_capacity+0x50/0x680 tcp_rcv_established+0x4e2/0xe10 tcp_v4_do_rcv+0x22b/0x420 tcp_v4_rcv+0xfe8/0x1190 ip_protocol_deliver_rcu+0x36/0x180 ip_local_deliver+0x15b/0x1a0 ip_rcv+0xac/0xd0 __netif_receive_skb_one_core+0x7f/0xb0 __netif_receive_skb+0x33/0xc0 netif_receive_skb_internal+0x84/0x1c0 napi_gro_receive+0x2a0/0x300 receive_buf+0x3d4/0x2350 ? detach_buf_split+0x159/0x390 virtnet_poll+0x198/0x840 ? reweight_entity+0x243/0x4b0 net_rx_action+0x25c/0x770 __do_softirq+0x19b/0x66d irq_exit+0x1eb/0x230 do_IRQ+0x7a/0x150 common_interrupt+0xf/0xf </IRQ> It can be reproduced by: echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not intMaciej Żenczykowski
(fix documentation and sysctl access to treat it as such) Tested: # zcat /proc/config.gz | egrep ^CONFIG_HZ CONFIG_HZ_1000=y CONFIG_HZ=1000 # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval 4294968 tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval 4294967 # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval -1 tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23Merge tag 'armsoc-drivers' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Olof Johansson: "Some of the larger changes this merge window: - Removal of drivers for Exynos5440, a Samsung SoC that never saw widespread use. - Uniphier support for USB3 and SPI reset handling - Syste control and SRAM drivers and bindings for Allwinner platforms - Qualcomm AOSS (Always-on subsystem) reset controller drivers - Raspberry Pi hwmon driver for voltage - Mediatek pwrap (pmic) support for MT6797 SoC" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (52 commits) drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests soc: fsl: cleanup Kconfig menu soc: fsl: dpio: Convert DPIO documentation to .rst staging: fsl-mc: Remove remaining files staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl staging: fsl-dpaa2: eth: move generic FD defines to DPIO soc: fsl: qe: gpio: Add qe_gpio_set_multiple usb: host: exynos: Remove support for Exynos5440 clk: samsung: Remove support for Exynos5440 soc: sunxi: Add the A13, A23 and H3 system control compatibles reset: uniphier: add reset control support for SPI cpufreq: exynos: Remove support for Exynos5440 ata: ahci-platform: Remove support for Exynos5440 soc: imx6qp: Use GENPD_FLAG_ALWAYS_ON for PU errata soc: mediatek: pwrap: add mt6351 driver for mt6797 SoCs soc: mediatek: pwrap: add pwrap driver for mt6797 SoCs soc: mediatek: pwrap: fix cipher init setting error dt-bindings: pwrap: mediatek: add pwrap support for MT6797 reset: uniphier: add USB3 core reset control dt-bindings: reset: uniphier: add USB3 core reset support ...
2018-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Infinite loop in IPVS when net namespace is released, from Tan Hu. 2) Do not show negative timeouts in ip_vs_conn by using the new jiffies_delta_to_msecs(), patches from Matteo Croce. 3) Set F_IFACE flag for linklocal addresses in ip6t_rpfilter, from Florian Westphal. 4) Fix overflow in set size allocation, from Taehee Yoo. 5) Use netlink_dump_start() from ctnetlink to fix memleak from the error path, again from Florian. 6) Register nfnetlink_subsys in last place, otherwise netns init path may lose race and see net->nft uninitialized data. This also reverts previous attempt to fix this by increase netns refcount, patches from Florian. 7) Remove conntrack entries on layer 4 protocol tracker module removal, from Florian. 8) Use GFP_KERNEL_ACCOUNT for xtables blob allocation, from Michal Hocko. 9) Get tproxy documentation in sync with existing codebase, from Mate Eckl. 10) Honor preset layer 3 protocol via ctx->family in the new nft_ct timeout infrastructure, from Harsha Sharma. 11) Let uapi nfnetlink_osf.h compile standalone with no errors, from Dmitry V. Levin. 12) Missing braces compilation warning in nft_tproxy, patch from Mate Eclk. 13) Disregard bogus check to bail out on non-anonymous sets from the dynamic set update extension. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100MbIvan Khoronzhuk
If set cbs parameters calculated for 1000Mb, but use on 100Mb port w/o h/w offload (for cpsw offload it doesn't matter), it works incorrectly. According to the example and testing board, second port is 100Mb interface. Correct them on recalculated for 100Mb interface. It allows to use the same command for CBS software implementation for board in example. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16netfilter: doc: Add nf_tables part in tproxy.txtMáté Eckl
Recently, transparent proxy support has been added to nf_tables so that this document should be updated with the new information. - Nft commands are added as alternatives to iptables ones. - The link for a patched iptables is removed as it is already part of the mainline iptables implementation (and the link is dead). - tcprdr is added as an example implementation of a transparent proxy Cc: "David S. Miller" <davem@davemloft.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Florian Westphal <fw@strlen.de> Cc: KOVACS Krisztian <hidden@sch.bme.hu> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: linux-doc@vger.kernel.org Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-13ipv6: Add icmp_echo_ignore_all support for ICMPv6Virgile Jarry
Preventing the kernel from responding to ICMP Echo Requests messages can be useful in several ways. The sysctl parameter 'icmp_echo_ignore_all' can be used to prevent the kernel from responding to IPv4 ICMP echo requests. For IPv6 pings, such a sysctl kernel parameter did not exist. Add the ability to prevent the kernel from responding to IPv6 ICMP echo requests through the use of the following sysctl parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all. Update the documentation to reflect this change. Signed-off-by: Virgile Jarry <virgile@acceis.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: ipv4: Control SKB reprioritization after forwardingPetr Machata
After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore introduce a sysctl that controls this behavior. Keep the default value at 1 to maintain backward-compatible behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Documentation: dpaa2: Use correct heading adornmentIoana Ciornei
Add overline heading adornment to document title in order to comply with kernel doc requirements. Fixes: 60b9131 staging: fsl-mc: Convert documentation to rst format Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27can: ucan: add driver for Theobroma Systems UCAN devicesJakob Unterwurzacher
The UCAN driver supports the microcontroller-based USB/CAN adapters from Theobroma Systems. There are two form-factors that run essentially the same firmware: * Seal: standalone USB stick ( https://www.theobroma-systems.com/seal ) * Mule: integrated on the PCB of various System-on-Modules from Theobroma Systems like the A31-µQ7 and the RK3399-Q7 ( https://www.theobroma-systems.com/rk3399-q7 ) The USB wire protocol has been designed to be as generic and hardware-indendent as possible in the hope of being useful for implementation on other microcontrollers. Signed-off-by: Martin Elshuber <martin.elshuber@theobroma-systems.com> Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com> Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-07-26docs: net: Convert netdev-FAQ to restructured textTobin C. Harding
Preferred kernel docs format is now restructured text. Convert netdev-FAQ.txt to restructured text. - Add SPDX license identifier. - Change file heading 'Information you need to know about netdev' to 'netdev FAQ' to better suit displayed index (in HTML). - Change question/answer layout to suit rst. Copy format in Documentation/bpf/bpf_devel_QA.rst - Fix indentation of code snippets - If multiple consecutive URLs appear put them in a list (to maintain whitespace). - Use uniform spelling of 'bug fix' throughout document (not bugfix or bug-fix). - Add double back ticks to 'net' and 'net-next' when referring to the trees. - Use rst references for Documentation/ links. - Add rst label 'netdev-FAQ' for referencing by other docs files. - Remove stale entry from Documentation/networking/00-INDEX Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26Merge tag 'soc-fsl-for-4.19' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into next/drivers Various updates to soc/fsl for 4.19 Moves DPAA2 DPIO driver from staging to fsl/soc Adds multiple-pin support to QE gpio driver * tag 'soc-fsl-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux: soc: fsl: cleanup Kconfig menu soc: fsl: dpio: Convert DPIO documentation to .rst staging: fsl-mc: Remove remaining files staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl staging: fsl-dpaa2: eth: move generic FD defines to DPIO soc: fsl: qe: gpio: Add qe_gpio_set_multiple Signed-off-by: Olof Johansson <olof@lixom.net>
2018-07-24soc: fsl: dpio: Convert DPIO documentation to .rstRoy Pledge
Convert the Datapath I/O documentation to .rst format and move to the Documation/networking/dpaa2 directory Signed-off-by: Roy Pledge <roy.pledge@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2018-07-23Documentation: networking: cpsw: add MQPRIO & CBS offload examplesIvan Khoronzhuk
This document describes MQPRIO and CBS Qdisc offload configuration for cpsw driver based on examples. It potentially can be used in audio video bridging (AVB) and time sensitive networking (TSN). Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller
All conflicts were trivial overlapping changes, so reasonably easy to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18docs: networking: Convert bridge.txt to rstTobin C. Harding
The kernel documentation is now restructured text. Convert the Ethernet Bridge documentation and include it in the toplevel kernel documentation. - Fix heading adornments. - Add license identifier. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18docs: networking: Convert alias.txt to rstTobin C. Harding
The kernel documentation is now restructured text. Convert the IP aliasing documentation and include it in the toplevel kernel documentation. - Fix heading adornments. - Correctly indent code snippets. - Limit line length to 72 characters inline with kernel documentation standards. - Add license identifier. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16bonding: Fix a typo in bonding.txtMasanari Iida
This patch fixes a spelling typo in bonding.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16docs: networking: Fix failover build warningsTobin C. Harding
Currently building the net_failover docs causes a bunch of warnings to be emitted. These warnings are all related to indentation and correctly highlight missing '::' (for code sections). It looks, from other rst files in Documentation, that the first column should be indented 2 spaces. Add '::' before code snippets and indent all snippets uniformly starting with 2 spaces. Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16docs: networking: Add failover docs to indexTobin C. Harding
Currently we have rst format docs for the failover and net_failover modules however these docs are not linked to within the index. Add `failover` and `net_failover` to the networking documentation index. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12networking: e1000.rst: Get rid of Sphinx warningsMauro Carvalho Chehab
Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation. Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent. Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent. While here, fix highlights and mark a table as such. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-12networking: e100.rst: Get rid of Sphinx warningsMauro Carvalho Chehab
Documentation/networking/e100.rst:57: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:68: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:75: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:84: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string without end-string. While here, fix some highlights. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-11Documentation: ip-sysctl.txt: document addr_gen_modeSabrina Dubroca
addr_gen_mode was introduced in without documentation, add it now. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02Documentation: Add explanation for XPS using Rx-queue(s) mapAmritha Nambiar
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28skbuff: preserve sock reference when scrubbing the skb.Flavio Leitner
The sock reference is lost when scrubbing the packet and that breaks TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing performance impacts of about 50% in a single TCP stream when crossing network namespaces. XPS breaks because the queue mapping stored in the socket is not available, so another random queue might be selected when the stack needs to transmit something like a TCP ACK, or TCP Retransmissions. That causes packet re-ordering and/or performance issues. TSQ breaks because it orphans the packet while it is still in the host, so packets are queued contributing to the buffer bloat problem. Preserving the sock reference fixes both issues. The socket is orphaned anyways in the receiving path before any relevant action and on TX side the netfilter checks if the reference is local before use it. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-24strparser: Corrected typo in documentation.Vakul Garg
Replaced strp_pause() with strp_unpause() to correct a seemingly copy paste documentation mistake. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23Documentation: e1000: Fix docs build errorTobin C. Harding
Recent patch updated e1000 docs to rst format. Docs build (`make htmldocs`) is currently failing due to this file with error: (SEVERE/4) Unexpected section title. This is because a section of the file is indented 2 spaces. Build error can be cleared by aligning the text with column 0. While we are changing these lines we can make sure line length does not exceed 72, that newlines following headings are uniform, and that full stops are followed by two spaces. Align text with column 0, limit line length to 72, ensure two spaces follow all full stops, ensure uniform use of newlines after heading. Fixes commit (228046e76189 Documentation: e1000: Update kernel documentation) CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23Documentation: e100: Fix docs build errorTobin C. Harding
Recent patch updated e100 docs to rst format. Docs build (`make htmldocs`) is currently failing due to this file with error: (SEVERE/4) Unexpected section title. This is because a section of the file is indented 2 spaces. Build error can be cleared by aligning the text with column 0. While we are changing these lines we can make sure line length does not exceed 72, that newlines following headings are uniform, and that full stops are followed by two spaces. Align text with column 0, limit line length to 72, ensure two spaces follow all full stops, ensure uniform use of newlines after heading. Fixes commit (85d63445f411 Documentation: e100: Update the Intel 10/100 driver doc) CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23Documentation: e1000: Use correct heading adornmentTobin C. Harding
Recently documentation file was converted to rst. The document title has the incorrect heading adornment. From kernel docs: * Please stick to this order of heading adornments: 1. ``=`` with overline for document title:: ============== Document title ============== Add overline heading adornment to document title. Fixes commit (228046e76189 Documentation: e1000: Update kernel documentation) CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23Documentation: e100: Use correct heading adornmentTobin C. Harding
Recently documentation file was converted to rst. The document title has the incorrect heading adornment. From kernel docs: * Please stick to this order of heading adornments: 1. ``=`` with overline for document title:: ============== Document title ============== Add overline heading adornment to document title. Fixes commit (85d63445f411 Documentation: e100: Update the Intel 10/100 driver doc) CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15docs: can.rst: fix a footnote referenceMauro Carvalho Chehab
As stated at: http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes A footnote should contain either a number, a reference or an auto number, e. g.: [1], [#f1] or [#]. While using [*] accidentaly works for html, it fails for other document outputs. In particular, it causes an error with LaTeX output, causing all books after networking to not be built. So, replace it by a valid syntax. Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-05netdev-FAQ: clarify DaveM's position for stable backportsCong Wang
Per discussion with David at netconf 2018, let's clarify DaveM's position of handling stable backports in netdev-FAQ. This is important for people relying on upstream -stable releases. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-06-05 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a new BPF hook for sendmsg similar to existing hooks for bind and connect: "This allows to override source IP (including the case when it's set via cmsg(3)) and destination IP:port for unconnected UDP (slow path). TCP and connected UDP (fast path) are not affected. This makes UDP support complete, that is, connected UDP is handled by connect hooks, unconnected by sendmsg ones.", from Andrey. 2) Rework of the AF_XDP API to allow extending it in future for type writer model if necessary. In this mode a memory window is passed to hardware and multiple frames might be filled into that window instead of just one that is the case in the current fixed frame-size model. With the new changes made this can be supported without having to add a new descriptor format. Also, core bits for the zero-copy support for AF_XDP have been merged as agreed upon, where i40e bits will be routed via Jeff later on. Various improvements to documentation and sample programs included as well, all from Björn and Magnus. 3) Given BPF's flexibility, a new program type has been added to implement infrared decoders. Quote: "The kernel IR decoders support the most widely used IR protocols, but there are many protocols which are not supported. [...] There is a 'long tail' of unsupported IR protocols, for which lircd is need to decode the IR. IR encoding is done in such a way that some simple circuit can decode it; therefore, BPF is ideal. [...] user-space can define a decoder in BPF, attach it to the rc device through the lirc chardev.", from Sean. 4) Several improvements and fixes to BPF core, among others, dumping map and prog IDs into fdinfo which is a straight forward way to correlate BPF objects used by applications, removing an indirect call and therefore retpoline in all map lookup/update/delete calls by invoking the callback directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper for tc BPF programs to have an efficient way of looking up cgroup v2 id for policy or other use cases. Fixes to make sure we zero tunnel/xfrm state that hasn't been filled, to allow context access wrt pt_regs in 32 bit archs for tracing, and last but not least various test cases for fixes that landed in bpf earlier, from Daniel. 5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper. 6) Add a new bpf_get_current_cgroup_id() helper that can be used in tracing to retrieve the cgroup id from the current process in order to allow for e.g. aggregation of container-level events, from Yonghong. 7) Two follow-up fixes for BTF to reject invalid input values and related to that also two test cases for BPF kselftests, from Martin. 8) Various API improvements to the bpf_fib_lookup() helper, that is, dropping MPLS bits which are not fully hashed out yet, rejecting invalid helper flags, returning error for unsupported address families as well as renaming flowlabel to flowinfo, from David. 9) Various fixes and improvements to sockmap BPF kselftests in particular in proper error detection and data verification, from Prashant. 10) Two arm32 BPF JIT improvements. One is to fix imm range check with regards to whether immediate fits into 24 bits, and a naming cleanup to get functions related to rsh handling consistent to those handling lsh, from Wang. 11) Two compile warning fixes in BPF, one for BTF and a false positive to silent gcc in stack_map_get_build_id_offset(), from Arnd. 12) Add missing seg6.h header into tools include infrastructure in order to fix compilation of BPF kselftests, from Mathieu. 13) Several formatting cleanups in the BPF UAPI helper description that also fix an error during rst2man compilation, from Quentin. 14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is not built into the kernel, from Yue. 15) Remove a useless double assignment in dev_map_enqueue(), from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04Merge branch '10GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-06-04 This series contains a smorgasbord of updates to documentation, e1000e, igb, ixgbe, ixgbevf and i40e. Benjamin Poirier fixes a potential kernel crash due to NULL pointer dereference in e1000e. Jeff updates the kernel documentation for e100 and e1000 to correct default values and URLs which were incorrect in the documentation. Also took the time to update these to the new reStructured text format for kernel documentation. Joanna Yurdal fixes a missing PTP transmit timestamp by ensuring that TSICR gets cleared when ICR is cleared. Sergey updates igb to reset all the transmit queues at one time so that we only have to wait once for all the queues to be reset. Alex fixes ixgbevf so that malicious driver detection (MDD) can co-exist with XDP. Emil and Tony extend the RTNL lock to ensure we get the most up-to-date values for the bits and avoid a possible race condition when going down. YueHaibing from Huawei introduces a helper function in ixgbe for operation reads to simplify the code a bit more. Daniel Borkmann adds support for XDP meta data when using build SKB for i40e. Shannon Nelson provides twp fixes for the IPSec code in ixgbe, first is to make sure we do not try to offload the decryption of any incoming packet that is destined for the management engine. The other fix is to resolve a cast problem introduced by a sparse cleanup patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04docs: networking: fix minor typos in various documentation filesOlivier Gayot
This patch fixes some typos/misspelling errors in the Documentation/networking files. Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimizationMaciej Żenczykowski
This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean to an integer. It now takes the values 0, 1 and 2, where 0 and 1 behave as before, while 2 enables timewait socket reuse only for sockets that we can prove are loopback connections: ie. bound to 'lo' interface or where one of source or destination IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1. This enables quicker reuse of ephemeral ports for loopback connections - where tcp_tw_reuse is 100% safe from a protocol perspective (this assumes no artificially induced packet loss on 'lo'). This also makes estblishing many loopback connections *much* faster (allocating ports out of the first half of the ephemeral port range is significantly faster, then allocating from the second half) Without this change in a 32K ephemeral port space my sample program (it just establishes and closes [::1]:ephemeral -> [::1]:server_port connections in a tight loop) fails after 32765 connections in 24 seconds. With it enabled 50000 connections only take 4.7 seconds. This is particularly problematic for IPv6 where we only have one local address and cannot play tricks with varying source IP from 127.0.0.0/8 pool. Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Wei Wang <weiwan@google.com> Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04Documentation: e1000: Update kernel documentationJeff Kirsher
Updated the e1000.txt kernel documentation with the latest information. Also convert the text file to reStructuredText (RST) format, since the Linux kernel documentation now uses this format for documentation. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-06-04Documentation: e100: Update the Intel 10/100 driver docJeff Kirsher
Over the years, several of the links have changed or are no longer valid so update them. In addition, the default values were incorrect for a couple of parameters. Converted the text file to the reStructuredText (RST) format, since the Linux kernel documentation now uses this format for documentation. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2018-06-04xsk: new descriptor addressing schemeBjörn Töpel
Currently, AF_XDP only supports a fixed frame-size memory scheme where each frame is referenced via an index (idx). A user passes the frame index to the kernel, and the kernel acts upon the data. Some NICs, however, do not have a fixed frame-size model, instead they have a model where a memory window is passed to the hardware and multiple frames are filled into that window (referred to as the "type-writer" model). By changing the descriptor format from the current frame index addressing scheme, AF_XDP can in the future be extended to support these kinds of NICs. In the index-based model, an idx refers to a frame of size frame_size. Addressing a frame in the UMEM is done by offseting the UMEM starting address by a global offset, idx * frame_size + offset. Communicating via the fill- and completion-rings are done by means of idx. In this commit, the idx is removed in favor of an address (addr), which is a relative address ranging over the UMEM. To convert an idx-based address to the new addr is simply: addr = idx * frame_size + offset. We also stop referring to the UMEM "frame" as a frame. Instead it is simply called a chunk. To transfer ownership of a chunk to the kernel, the addr of the chunk is passed in the fill-ring. Note, that the kernel will mask addr to make it chunk aligned, so there is no need for userspace to do that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or 3000 to the fill-ring will refer to the same chunk. On the completion-ring, the addr will match that of the Tx descriptor, passed to the kernel. Changing the descriptor format to use chunks/addr will allow for future changes to move to a type-writer based model, where multiple frames can reside in one chunk. In this model passing one single chunk into the fill-ring, would potentially result in multiple Rx descriptors. This commit changes the uapi of AF_XDP sockets, and updates the documentation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-28virtio_net: Extend virtio to use VF datapath when availableSridhar Samudrala
This patch enables virtio_net to switch over to a VF datapath when STANDBY feature is enabled and a VF netdev is present with the same MAC address. It allows live migration of a VM with a direct attached VF without the need to setup a bond/team between a VF and virtio net device in the guest. It uses the API that is exported by the net_failover driver to create and and destroy a master failover netdev. When STANDBY feature is enabled, an additional netdev(failover netdev) is created that acts as a master device and tracks the state of the 2 lower netdevs. The original virtio_net netdev is marked as 'standby' netdev and a passthru device with the same MAC is registered as 'primary' netdev. The hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back to the guest to switch over to VF datapath. This patch is based on the discussion initiated by Jesse on this thread. https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28net: Introduce net_failover driverSridhar Samudrala
The net_failover driver provides an automated failover mechanism via APIs to create and destroy a failover master netdev and manages a primary and standby slave netdevs that get registered via the generic failover infrastructure. The failover netdev acts a master device and controls 2 slave devices. The original paravirtual interface gets registered as 'standby' slave netdev and a passthru/vf device with the same MAC gets registered as 'primary' slave netdev. Both 'standby' and 'failover' netdevs are associated with the same 'pci' device. The user accesses the network interface via 'failover' netdev. The 'failover' netdev chooses 'primary' netdev as default for transmits when it is available with link up and running. This can be used by paravirtual drivers to enable an alternate low latency datapath. It also enables hypervisor controlled live migration of a VM with direct attached VF by failing over to the paravirtual datapath when the VF is unplugged. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-28net: Introduce generic failover moduleSridhar Samudrala
The failover module provides a generic interface for paravirtual drivers to register a netdev and a set of ops with a failover instance. The ops are used as event handlers that get called to handle netdev register/ unregister/link change/name change events on slave pci ethernet devices with the same mac address as the failover netdev. This enables paravirtual drivers to use a VF as an accelerated low latency datapath. It also allows migration of VMs with direct attached VFs by failing over to the paravirtual datapath when the VF is unplugged. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of easy overlapping changes in the confict resolutions here. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24ppp: remove the PPPIOCDETACH ioctlEric Biggers
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file before f_count has reached 0, which is fundamentally a bad idea. It does check 'f_count < 2', which excludes concurrent operations on the file since they would only be possible with a shared fd table, in which case each fdget() would take a file reference. However, it fails to account for the fact that even with 'f_count == 1' the file can still be linked into epoll instances. As reported by syzbot, this can trivially be used to cause a use-after-free. Yet, the only known user of PPPIOCDETACH is pppd versions older than ppp-2.4.2, which was released almost 15 years ago (November 2003). Also, PPPIOCDETACH apparently stopped working reliably at around the same time, when the f_count check was added to the kernel, e.g. see https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2' check makes PPPIOCDETACH only work in single-threaded applications; it always fails if called from a multithreaded application. All pppd versions released in the last 15 years just close() the file descriptor instead. Therefore, instead of hacking around this bug by exporting epoll internals to modules, and probably missing other related bugs, just remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave a stub in place that prints a one-time warning and returns EINVAL. Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18tcp: add tcp_comp_sack_nr sysctlEric Dumazet
This per netns sysctl allows for TCP SACK compression fine-tuning. This limits number of SACK that can be compressed. Using 0 disables SACK compression. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18tcp: add tcp_comp_sack_delay_ns sysctlEric Dumazet
This per netns sysctl allows for TCP SACK compression fine-tuning. Its default value is 1,000,000, or 1 ms to meet TSO autosizing period. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>