aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/af_xdp.rst26
-rw-r--r--Documentation/networking/bonding.txt16
-rw-r--r--Documentation/networking/caif/caif.rst (renamed from Documentation/networking/caif/README)88
-rw-r--r--Documentation/networking/conf.py10
-rw-r--r--Documentation/networking/device_drivers/amazon/ena.txt5
-rw-r--r--Documentation/networking/device_drivers/aquantia/atlantic.txt439
-rw-r--r--Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst4
-rw-r--r--Documentation/networking/device_drivers/google/gve.rst123
-rw-r--r--Documentation/networking/device_drivers/index.rst6
-rw-r--r--Documentation/networking/device_drivers/intel/e100.rst14
-rw-r--r--Documentation/networking/device_drivers/intel/e1000.rst12
-rw-r--r--Documentation/networking/device_drivers/intel/e1000e.rst14
-rw-r--r--Documentation/networking/device_drivers/intel/fm10k.rst10
-rw-r--r--Documentation/networking/device_drivers/intel/i40e.rst8
-rw-r--r--Documentation/networking/device_drivers/intel/iavf.rst123
-rw-r--r--Documentation/networking/device_drivers/intel/ice.rst6
-rw-r--r--Documentation/networking/device_drivers/intel/igb.rst12
-rw-r--r--Documentation/networking/device_drivers/intel/igbvf.rst6
-rw-r--r--Documentation/networking/device_drivers/intel/ixgbe.rst10
-rw-r--r--Documentation/networking/device_drivers/intel/ixgbevf.rst6
-rw-r--r--Documentation/networking/device_drivers/mellanox/mlx5.rst300
-rw-r--r--Documentation/networking/device_drivers/netronome/nfp.rst133
-rw-r--r--Documentation/networking/device_drivers/pensando/ionic.rst45
-rw-r--r--Documentation/networking/devlink-info-versions.rst16
-rw-r--r--Documentation/networking/devlink-params-nfp.txt5
-rw-r--r--Documentation/networking/devlink-params.txt16
-rw-r--r--Documentation/networking/devlink-trap-netdevsim.rst20
-rw-r--r--Documentation/networking/devlink-trap.rst209
-rw-r--r--Documentation/networking/dsa/b53.rst183
-rw-r--r--Documentation/networking/dsa/configuration.rst292
-rw-r--r--Documentation/networking/dsa/dsa.rst4
-rw-r--r--Documentation/networking/dsa/index.rst2
-rw-r--r--Documentation/networking/dsa/sja1105.rst96
-rw-r--r--Documentation/networking/index.rst5
-rw-r--r--Documentation/networking/ip-sysctl.txt63
-rw-r--r--Documentation/networking/j1939.rst422
-rw-r--r--Documentation/networking/mac80211_hwsim/mac80211_hwsim.rst (renamed from Documentation/networking/mac80211_hwsim/README)28
-rw-r--r--Documentation/networking/mpls-sysctl.txt2
-rw-r--r--Documentation/networking/net_dim.txt36
-rw-r--r--Documentation/networking/phy.rst45
-rw-r--r--Documentation/networking/sfp-phylink.rst8
-rw-r--r--Documentation/networking/timestamping.txt2
-rw-r--r--Documentation/networking/tls-offload.rst100
-rw-r--r--Documentation/networking/tuntap.txt4
44 files changed, 2763 insertions, 211 deletions
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 50bccbf68308..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
Frames passed to the kernel are used for the ingress path (RX rings).
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr. E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
UMEM Completion Ring
@@ -220,7 +222,21 @@ Usage
In order to use AF_XDP sockets there are two parts needed. The
user-space application and the XDP program. For a complete setup and
usage example, please refer to the sample application. The user-space
-side is xdpsock_user.c and the XDP side xdpsock_kern.c.
+side is xdpsock_user.c and the XDP side is part of libbpf.
+
+The XDP code sample included in tools/lib/bpf/xsk.c is the following::
+
+ SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
+ {
+ int index = ctx->rx_queue_index;
+
+ // A set entry here means that the correspnding queue_id
+ // has an active AF_XDP socket bound to it.
+ if (bpf_map_lookup_elem(&xsks_map, &index))
+ return bpf_redirect_map(&xsks_map, index, 0);
+
+ return XDP_PASS;
+ }
Naive ring dequeue and enqueue could look like this::
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index d3e5dd26db12..e3abfbd32f71 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -706,9 +706,9 @@ num_unsol_na
unsolicited IPv6 Neighbor Advertisements) to be issued after a
failover event. As soon as the link is up on the new slave
(possibly immediately) a peer notification is sent on the
- bonding device and each VLAN sub-device. This is repeated at
- each link monitor interval (arp_interval or miimon, whichever
- is active) if the number is greater than 1.
+ bonding device and each VLAN sub-device. This is repeated at
+ the rate specified by peer_notif_delay if the number is
+ greater than 1.
The valid range is 0 - 255; the default value is 1. These options
affect only the active-backup mode. These options were added for
@@ -727,6 +727,16 @@ packets_per_slave
The valid range is 0 - 65535; the default value is 1. This option
has effect only in balance-rr mode.
+peer_notif_delay
+
+ Specify the delay, in milliseconds, between each peer
+ notification (gratuitous ARP and unsolicited IPv6 Neighbor
+ Advertisement) when they are issued after a failover event.
+ This delay should be a multiple of the link monitor interval
+ (arp_interval or miimon, whichever is active). The default
+ value is 0 which means to match the value of the link monitor
+ interval.
+
primary
A string (eth0, eth2, etc) specifying which slave is the
diff --git a/Documentation/networking/caif/README b/Documentation/networking/caif/caif.rst
index 757ccfaa1385..07afc8063d4d 100644
--- a/Documentation/networking/caif/README
+++ b/Documentation/networking/caif/caif.rst
@@ -1,18 +1,31 @@
-Copyright (C) ST-Ericsson AB 2010
-Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
-License terms: GNU General Public License (GPL) version 2
----------------------------------------------------------
+:orphan:
-=== Start ===
-If you have compiled CAIF for modules do:
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
-$modprobe crc_ccitt
-$modprobe caif
-$modprobe caif_socket
-$modprobe chnl_net
+================
+Using Linux CAIF
+================
-=== Preparing the setup with a STE modem ===
+
+:Copyright: |copy| ST-Ericsson AB 2010
+
+:Author: Sjur Brendeland/ sjur.brandeland@stericsson.com
+
+Start
+=====
+
+If you have compiled CAIF for modules do::
+
+ $modprobe crc_ccitt
+ $modprobe caif
+ $modprobe caif_socket
+ $modprobe chnl_net
+
+
+Preparing the setup with a STE modem
+====================================
If you are working on integration of CAIF you should make sure
that the kernel is built with module support.
@@ -32,24 +45,30 @@ module parameter "ser_use_stx".
Normally Frame Checksum is always used on UART, but this is also provided as a
module parameter "ser_use_fcs".
-$ modprobe caif_serial ser_ttyname=/dev/ttyS0 ser_use_stx=yes
-$ ifconfig caif_ttyS0 up
+::
+
+ $ modprobe caif_serial ser_ttyname=/dev/ttyS0 ser_use_stx=yes
+ $ ifconfig caif_ttyS0 up
-PLEASE NOTE: There is a limitation in Android shell.
+PLEASE NOTE:
+ There is a limitation in Android shell.
It only accepts one argument to insmod/modprobe!
-=== Trouble shooting ===
+Trouble shooting
+================
There are debugfs parameters provided for serial communication.
/sys/kernel/debug/caif_serial/<tty-name>/
* ser_state: Prints the bit-mask status where
+
- 0x02 means SENDING, this is a transient state.
- 0x10 means FLOW_OFF_SENT, i.e. the previous frame has not been sent
- and is blocking further send operation. Flow OFF has been propagated
- to all CAIF Channels using this TTY.
+ and is blocking further send operation. Flow OFF has been propagated
+ to all CAIF Channels using this TTY.
* tty_status: Prints the bit-mask tty status information
+
- 0x01 - tty->warned is on.
- 0x02 - tty->low_latency is on.
- 0x04 - tty->packed is on.
@@ -58,13 +77,17 @@ There are debugfs parameters provided for serial communication.
- 0x20 - tty->stopped is on.
* last_tx_msg: Binary blob Prints the last transmitted frame.
- This can be printed with
+
+ This can be printed with::
+
$od --format=x1 /sys/kernel/debug/caif_serial/<tty>/last_rx_msg.
- The first two tx messages sent look like this. Note: The initial
- byte 02 is start of frame extension (STX) used for re-syncing
- upon errors.
- - Enumeration:
+ The first two tx messages sent look like this. Note: The initial
+ byte 02 is start of frame extension (STX) used for re-syncing
+ upon errors.
+
+ - Enumeration::
+
0000000 02 05 00 00 03 01 d2 02
| | | | | |
STX(1) | | | |
@@ -73,7 +96,9 @@ There are debugfs parameters provided for serial communication.
Command:Enumeration(1)
Link-ID(1)
Checksum(2)
- - Channel Setup:
+
+ - Channel Setup::
+
0000000 02 07 00 00 00 21 a1 00 48 df
| | | | | | | |
STX(1) | | | | | |
@@ -86,13 +111,18 @@ There are debugfs parameters provided for serial communication.
Checksum(2)
* last_rx_msg: Prints the last transmitted frame.
- The RX messages for LinkSetup look almost identical but they have the
- bit 0x20 set in the command bit, and Channel Setup has added one byte
- before Checksum containing Channel ID.
- NOTE: Several CAIF Messages might be concatenated. The maximum debug
+
+ The RX messages for LinkSetup look almost identical but they have the
+ bit 0x20 set in the command bit, and Channel Setup has added one byte
+ before Checksum containing Channel ID.
+
+ NOTE:
+ Several CAIF Messages might be concatenated. The maximum debug
buffer size is 128 bytes.
-== Error Scenarios:
+Error Scenarios
+===============
+
- last_tx_msg contains channel setup message and last_rx_msg is empty ->
The host seems to be able to send over the UART, at least the CAIF ldisc get
notified that sending is completed.
@@ -103,7 +133,9 @@ There are debugfs parameters provided for serial communication.
- if /sys/kernel/debug/caif_serial/<tty>/tty_status is non-zero there
might be problems transmitting over UART.
+
E.g. host and modem wiring is not correct you will typically see
tty_status = 0x10 (hw_stopped) and ser_state = 0x10 (FLOW_OFF_SENT).
+
You will probably see the enumeration message in last_tx_message
and empty last_rx_message.
diff --git a/Documentation/networking/conf.py b/Documentation/networking/conf.py
deleted file mode 100644
index 40f69e67a883..000000000000
--- a/Documentation/networking/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Networking Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'networking.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/networking/device_drivers/amazon/ena.txt b/Documentation/networking/device_drivers/amazon/ena.txt
index 2b4b6f57e549..1bb55c7b604c 100644
--- a/Documentation/networking/device_drivers/amazon/ena.txt
+++ b/Documentation/networking/device_drivers/amazon/ena.txt
@@ -73,7 +73,7 @@ operation.
AQ is used for submitting management commands, and the
results/responses are reported asynchronously through ACQ.
-ENA introduces a very small set of management commands with room for
+ENA introduces a small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.
@@ -202,11 +202,14 @@ delay value to each level.
The user can enable/disable adaptive moderation, modify the interrupt
delay table and restore its default values through sysfs.
+RX copybreak:
+=============
The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK
and can be configured by the ETHTOOL_STUNABLE command of the
SIOCETHTOOL ioctl.
SKB:
+====
The driver-allocated SKB for frames received from Rx handling using
NAPI context. The allocation method depends on the size of the packet.
If the frame length is larger than rx_copybreak, napi_get_frags()
diff --git a/Documentation/networking/device_drivers/aquantia/atlantic.txt b/Documentation/networking/device_drivers/aquantia/atlantic.txt
new file mode 100644
index 000000000000..d235cbaeccc6
--- /dev/null
+++ b/Documentation/networking/device_drivers/aquantia/atlantic.txt
@@ -0,0 +1,439 @@
+aQuantia AQtion Driver for the aQuantia Multi-Gigabit PCI Express Family of
+Ethernet Adapters
+=============================================================================
+
+Contents
+========
+
+- Identifying Your Adapter
+- Configuration
+- Supported ethtool options
+- Command Line Parameters
+- Config file parameters
+- Support
+- License
+
+Identifying Your Adapter
+========================
+
+The driver in this release is compatible with AQC-100, AQC-107, AQC-108 based ethernet adapters.
+
+
+SFP+ Devices (for AQC-100 based adapters)
+----------------------------------
+
+This release tested with passive Direct Attach Cables (DAC) and SFP+/LC Optical Transceiver.
+
+Configuration
+=========================
+ Viewing Link Messages
+ ---------------------
+ Link messages will not be displayed to the console if the distribution is
+ restricting system messages. In order to see network driver link messages on
+ your console, set dmesg to eight by entering the following:
+
+ dmesg -n 8
+
+ NOTE: This setting is not saved across reboots.
+
+ Jumbo Frames
+ ------------
+ The driver supports Jumbo Frames for all adapters. Jumbo Frames support is
+ enabled by changing the MTU to a value larger than the default of 1500.
+ The maximum value for the MTU is 16000. Use the `ip` command to
+ increase the MTU size. For example:
+
+ ip link set mtu 16000 dev enp1s0
+
+ ethtool
+ -------
+ The driver utilizes the ethtool interface for driver configuration and
+ diagnostics, as well as displaying statistical information. The latest
+ ethtool version is required for this functionality.
+
+ NAPI
+ ----
+ NAPI (Rx polling mode) is supported in the atlantic driver.
+
+Supported ethtool options
+============================
+ Viewing adapter settings
+ ---------------------
+ ethtool <ethX>
+
+ Output example:
+
+ Settings for enp1s0:
+ Supported ports: [ TP ]
+ Supported link modes: 100baseT/Full
+ 1000baseT/Full
+ 10000baseT/Full
+ 2500baseT/Full
+ 5000baseT/Full
+ Supported pause frame use: Symmetric
+ Supports auto-negotiation: Yes
+ Supported FEC modes: Not reported
+ Advertised link modes: 100baseT/Full
+ 1000baseT/Full
+ 10000baseT/Full
+ 2500baseT/Full
+ 5000baseT/Full
+ Advertised pause frame use: Symmetric
+ Advertised auto-negotiation: Yes
+ Advertised FEC modes: Not reported
+ Speed: 10000Mb/s
+ Duplex: Full
+ Port: Twisted Pair
+ PHYAD: 0
+ Transceiver: internal
+ Auto-negotiation: on
+ MDI-X: Unknown
+ Supports Wake-on: g
+ Wake-on: d
+ Link detected: yes
+
+ ---
+ Note: AQrate speeds (2.5/5 Gb/s) will be displayed only with linux kernels > 4.10.
+ But you can still use these speeds:
+ ethtool -s eth0 autoneg off speed 2500
+
+ Viewing adapter information
+ ---------------------
+ ethtool -i <ethX>
+
+ Output example:
+
+ driver: atlantic
+ version: 5.2.0-050200rc5-generic-kern
+ firmware-version: 3.1.78
+ expansion-rom-version:
+ bus-info: 0000:01:00.0
+ supports-statistics: yes
+ supports-test: no
+ supports-eeprom-access: no
+ supports-register-dump: yes
+ supports-priv-flags: no
+
+
+ Viewing Ethernet adapter statistics:
+ ---------------------
+ ethtool -S <ethX>
+
+ Output example:
+ NIC statistics:
+ InPackets: 13238607
+ InUCast: 13293852
+ InMCast: 52
+ InBCast: 3
+ InErrors: 0
+ OutPackets: 23703019
+ OutUCast: 23704941
+ OutMCast: 67
+ OutBCast: 11
+ InUCastOctects: 213182760
+ OutUCastOctects: 22698443
+ InMCastOctects: 6600
+ OutMCastOctects: 8776
+ InBCastOctects: 192
+ OutBCastOctects: 704
+ InOctects: 2131839552
+ OutOctects: 226938073
+ InPacketsDma: 95532300
+ OutPacketsDma: 59503397
+ InOctetsDma: 1137102462
+ OutOctetsDma: 2394339518
+ InDroppedDma: 0
+ Queue[0] InPackets: 23567131
+ Queue[0] OutPackets: 20070028
+ Queue[0] InJumboPackets: 0
+ Queue[0] InLroPackets: 0
+ Queue[0] InErrors: 0
+ Queue[1] InPackets: 45428967
+ Queue[1] OutPackets: 11306178
+ Queue[1] InJumboPackets: 0
+ Queue[1] InLroPackets: 0
+ Queue[1] InErrors: 0
+ Queue[2] InPackets: 3187011
+ Queue[2] OutPackets: 13080381
+ Queue[2] InJumboPackets: 0
+ Queue[2] InLroPackets: 0
+ Queue[2] InErrors: 0
+ Queue[3] InPackets: 23349136
+ Queue[3] OutPackets: 15046810
+ Queue[3] InJumboPackets: 0
+ Queue[3] InLroPackets: 0
+ Queue[3] InErrors: 0
+
+ Interrupt coalescing support
+ ---------------------------------
+ ITR mode, TX/RX coalescing timings could be viewed with:
+
+ ethtool -c <ethX>
+
+ and changed with:
+
+ ethtool -C <ethX> tx-usecs <usecs> rx-usecs <usecs>
+
+ To disable coalescing:
+
+ ethtool -C <ethX> tx-usecs 0 rx-usecs 0 tx-max-frames 1 tx-max-frames 1
+
+ Wake on LAN support
+ ---------------------------------
+
+ WOL support by magic packet:
+
+ ethtool -s <ethX> wol g
+
+ To disable WOL:
+
+ ethtool -s <ethX> wol d
+
+ Set and check the driver message level
+ ---------------------------------
+
+ Set message level
+
+ ethtool -s <ethX> msglvl <level>
+
+ Level values:
+
+ 0x0001 - general driver status.
+ 0x0002 - hardware probing.
+ 0x0004 - link state.
+ 0x0008 - periodic status check.
+ 0x0010 - interface being brought down.
+ 0x0020 - interface being brought up.
+ 0x0040 - receive error.
+ 0x0080 - transmit error.
+ 0x0200 - interrupt handling.
+ 0x0400 - transmit completion.
+ 0x0800 - receive completion.
+ 0x1000 - packet contents.
+ 0x2000 - hardware status.
+ 0x4000 - Wake-on-LAN status.
+
+ By default, the level of debugging messages is set 0x0001(general driver status).
+
+ Check message level
+
+ ethtool <ethX> | grep "Current message level"
+
+ If you want to disable the output of messages
+
+ ethtool -s <ethX> msglvl 0
+
+ RX flow rules (ntuple filters)
+ ---------------------------------
+ There are separate rules supported, that applies in that order:
+ 1. 16 VLAN ID rules
+ 2. 16 L2 EtherType rules
+ 3. 8 L3/L4 5-Tuple rules
+
+
+ The driver utilizes the ethtool interface for configuring ntuple filters,
+ via "ethtool -N <device> <filter>".
+
+ To enable or disable the RX flow rules:
+
+ ethtool -K ethX ntuple <on|off>
+
+ When disabling ntuple filters, all the user programed filters are
+ flushed from the driver cache and hardware. All needed filters must
+ be re-added when ntuple is re-enabled.
+
+ Because of the fixed order of the rules, the location of filters is also fixed:
+ - Locations 0 - 15 for VLAN ID filters
+ - Locations 16 - 31 for L2 EtherType filters
+ - Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)
+
+ The L3/L4 5-tuple (protocol, source and destination IP address, source and
+ destination TCP/UDP/SCTP port) is compared against 8 filters. For IPv4, up to
+ 8 source and destination addresses can be matched. For IPv6, up to 2 pairs of
+ addresses can be supported. Source and destination ports are only compared for
+ TCP/UDP/SCTP packets.
+
+ To add a filter that directs packet to queue 5, use <-N|-U|--config-nfc|--config-ntuple> switch:
+
+ ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.1 dst-ip 10.0.0.2 src-port 2000 dst-port 2001 action 5 <loc 32>
+
+ - action is the queue number.
+ - loc is the rule number.
+
+ For "flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6" you must set the loc
+ number within 32 - 39.
+ For "flow-type ip4|udp4|tcp4|sctp4|ip6|udp6|tcp6|sctp6" you can set 8 rules
+ for traffic IPv4 or you can set 2 rules for traffic IPv6. Loc number traffic
+ IPv6 is 32 and 36.
+ At the moment you can not use IPv4 and IPv6 filters at the same time.
+
+ Example filter for IPv6 filter traffic:
+
+ sudo ethtool -N <ethX> flow-type tcp6 src-ip 2001:db8:0:f101::1 dst-ip 2001:db8:0:f101::2 action 1 loc 32
+ sudo ethtool -N <ethX> flow-type ip6 src-ip 2001:db8:0:f101::2 dst-ip 2001:db8:0:f101::5 action -1 loc 36
+
+ Example filter for IPv4 filter traffic:
+
+ sudo ethtool -N <ethX> flow-type udp4 src-ip 10.0.0.4 dst-ip 10.0.0.7 src-port 2000 dst-port 2001 loc 32
+ sudo ethtool -N <ethX> flow-type tcp4 src-ip 10.0.0.3 dst-ip 10.0.0.9 src-port 2000 dst-port 2001 loc 33
+ sudo ethtool -N <ethX> flow-type ip4 src-ip 10.0.0.6 dst-ip 10.0.0.4 loc 34
+
+ If you set action -1, then all traffic corresponding to the filter will be discarded.
+ The maximum value action is 31.
+
+
+ The VLAN filter (VLAN id) is compared against 16 filters.
+ VLAN id must be accompanied by mask 0xF000. That is to distinguish VLAN filter
+ from L2 Ethertype filter with UserPriority since both User Priority and VLAN ID
+ are passed in the same 'vlan' parameter.
+
+ To add a filter that directs packets from VLAN 2001 to queue 5:
+ ethtool -N <ethX> flow-type ip4 vlan 2001 m 0xF000 action 1 loc 0
+
+
+ L2 EtherType filters allows filter packet by EtherType field or both EtherType
+ and User Priority (PCP) field of 802.1Q.
+ UserPriority (vlan) parameter must be accompanied by mask 0x1FFF. That is to
+ distinguish VLAN filter from L2 Ethertype filter with UserPriority since both
+ User Priority and VLAN ID are passed in the same 'vlan' parameter.
+
+ To add a filter that directs IP4 packess of priority 3 to queue 3:
+ ethtool -N <ethX> flow-type ether proto 0x800 vlan 0x600 m 0x1FFF action 3 loc 16
+
+
+ To see the list of filters currently present:
+
+ ethtool <-u|-n|--show-nfc|--show-ntuple> <ethX>
+
+ Rules may be deleted from the table itself. This is done using:
+
+ sudo ethtool <-N|-U|--config-nfc|--config-ntuple> <ethX> delete <loc>
+
+ - loc is the rule number to be deleted.
+
+ Rx filters is an interface to load the filter table that funnels all flow
+ into queue 0 unless an alternative queue is specified using "action". In that
+ case, any flow that matches the filter criteria will be directed to the
+ appropriate queue. RX filters is supported on all kernels 2.6.30 and later.
+
+ RSS for UDP
+ ---------------------------------
+ Currently, NIC does not support RSS for fragmented IP packets, which leads to
+ incorrect working of RSS for fragmented UDP traffic. To disable RSS for UDP the
+ RX Flow L3/L4 rule may be used.
+
+ Example:
+ ethtool -N eth0 flow-type udp4 action 0 loc 32
+
+Command Line Parameters
+=======================
+The following command line parameters are available on atlantic driver:
+
+aq_itr -Interrupt throttling mode
+----------------------------------------
+Accepted values: 0, 1, 0xFFFF
+Default value: 0xFFFF
+0 - Disable interrupt throttling.
+1 - Enable interrupt throttling and use specified tx and rx rates.
+0xFFFF - Auto throttling mode. Driver will choose the best RX and TX
+ interrupt throtting settings based on link speed.
+
+aq_itr_tx - TX interrupt throttle rate
+----------------------------------------
+Accepted values: 0 - 0x1FF
+Default value: 0
+TX side throttling in microseconds. Adapter will setup maximum interrupt delay
+to this value. Minimum interrupt delay will be a half of this value
+
+aq_itr_rx - RX interrupt throttle rate
+----------------------------------------
+Accepted values: 0 - 0x1FF
+Default value: 0
+RX side throttling in microseconds. Adapter will setup maximum interrupt delay
+to this value. Minimum interrupt delay will be a half of this value
+
+Note: ITR settings could be changed in runtime by ethtool -c means (see below)
+
+Config file parameters
+=======================
+For some fine tuning and performance optimizations,
+some parameters can be changed in the {source_dir}/aq_cfg.h file.
+
+AQ_CFG_RX_PAGEORDER
+----------------------------------------
+Default value: 0
+RX page order override. Thats a power of 2 number of RX pages allocated for
+each descriptor. Received descriptor size is still limited by AQ_CFG_RX_FRAME_MAX.
+Increasing pageorder makes page reuse better (actual on iommu enabled systems).
+
+AQ_CFG_RX_REFILL_THRES
+----------------------------------------
+Default value: 32
+RX refill threshold. RX path will not refill freed descriptors until the
+specified number of free descriptors is observed. Larger values may help
+better page reuse but may lead to packet drops as well.
+
+AQ_CFG_VECS_DEF
+------------------------------------------------------------
+Number of queues
+Valid Range: 0 - 8 (up to AQ_CFG_VECS_MAX)
+Default value: 8
+Notice this value will be capped by the number of cores available on the system.
+
+AQ_CFG_IS_RSS_DEF
+------------------------------------------------------------
+Enable/disable Receive Side Scaling
+
+This feature allows the adapter to distribute receive processing
+across multiple CPU-cores and to prevent from overloading a single CPU core.
+
+Valid values
+0 - disabled
+1 - enabled
+
+Default value: 1
+
+AQ_CFG_NUM_RSS_QUEUES_DEF
+------------------------------------------------------------
+Number of queues for Receive Side Scaling
+Valid Range: 0 - 8 (up to AQ_CFG_VECS_DEF)
+
+Default value: AQ_CFG_VECS_DEF
+
+AQ_CFG_IS_LRO_DEF
+------------------------------------------------------------
+Enable/disable Large Receive Offload
+
+This offload enables the adapter to coalesce multiple TCP segments and indicate
+them as a single coalesced unit to the OS networking subsystem.
+The system consumes less energy but it also introduces more latency in packets processing.
+
+Valid values
+0 - disabled
+1 - enabled
+
+Default value: 1
+
+AQ_CFG_TX_CLEAN_BUDGET
+----------------------------------------
+Maximum descriptors to cleanup on TX at once.
+Default value: 256
+
+After the aq_cfg.h file changed the driver must be rebuilt to take effect.
+
+Support
+=======
+
+If an issue is identified with the released source code on the supported
+kernel with a supported adapter, email the specific information related
+to the issue to support@aquantia.com
+
+License
+=======
+
+aQuantia Corporation Network Driver
+Copyright(c) 2014 - 2019 aQuantia Corporation.
+
+This program is free software; you can redistribute it and/or modify it
+under the terms and conditions of the GNU General Public License,
+version 2, as published by the Free Software Foundation.
diff --git a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
index 5045df990a4c..17dbee1ac53e 100644
--- a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
+++ b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
@@ -39,8 +39,7 @@ The Linux DPIO driver consists of 3 primary components--
DPIO service-- provides APIs to other Linux drivers for services
- QBman portal interface-- sends portal commands, gets responses
-::
+ QBman portal interface-- sends portal commands, gets responses::
fsl-mc other
bus drivers
@@ -60,6 +59,7 @@ The Linux DPIO driver consists of 3 primary components--
The diagram below shows how the DPIO driver components fit with the other
DPAA2 Linux driver components::
+
+------------+
| OS Network |
| Stack |
diff --git a/Documentation/networking/device_drivers/google/gve.rst b/Documentation/networking/device_drivers/google/gve.rst
new file mode 100644
index 000000000000..793693cef6e3
--- /dev/null
+++ b/Documentation/networking/device_drivers/google/gve.rst
@@ -0,0 +1,123 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==============================================================
+Linux kernel driver for Compute Engine Virtual Ethernet (gve):
+==============================================================
+
+Supported Hardware
+===================
+The GVE driver binds to a single PCI device id used by the virtual
+Ethernet device found in some Compute Engine VMs.
+
++--------------+----------+---------+
+|Field | Value | Comments|
++==============+==========+=========+
+|Vendor ID | `0x1AE0` | Google |
++--------------+----------+---------+
+|Device ID | `0x0042` | |
++--------------+----------+---------+
+|Sub-vendor ID | `0x1AE0` | Google |
++--------------+----------+---------+
+|Sub-device ID | `0x0058` | |
++--------------+----------+---------+
+|Revision ID | `0x0` | |
++--------------+----------+---------+
+|Device Class | `0x200` | Ethernet|
++--------------+----------+---------+
+
+PCI Bars
+========
+The gVNIC PCI device exposes three 32-bit memory BARS:
+- Bar0 - Device configuration and status registers.
+- Bar1 - MSI-X vector table
+- Bar2 - IRQ, RX and TX doorbells
+
+Device Interactions
+===================
+The driver interacts with the device in the following ways:
+ - Registers
+ - A block of MMIO registers
+ - See gve_register.h for more detail
+ - Admin Queue
+ - See description below
+ - Reset
+ - At any time the device can be reset
+ - Interrupts
+ - See supported interrupts below
+ - Transmit and Receive Queues
+ - See description below
+
+Registers
+---------
+All registers are MMIO and big endian.
+
+The registers are used for initializing and configuring the device as well as
+querying device status in response to management interrupts.
+
+Admin Queue (AQ)
+----------------
+The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ
+commands, used by the driver to issue commands to the device and set up
+resources.The driver and the device maintain a count of how many commands
+have been submitted and executed. To issue AQ commands, the driver must do
+the following (with proper locking):
+
+1) Copy new commands into next available slots in the AQ array
+2) Increment its counter by he number of new commands
+3) Write the counter into the GVE_ADMIN_QUEUE_DOORBELL register
+4) Poll the ADMIN_QUEUE_EVENT_COUNTER register until it equals
+ the value written to the doorbell, or until a timeout.
+
+The device will update the status field in each AQ command reported as
+executed through the ADMIN_QUEUE_EVENT_COUNTER register.
+
+Device Resets
+-------------
+A device reset is triggered by writing 0x0 to the AQ PFN register.
+This causes the device to release all resources allocated by the
+driver, including the AQ itself.
+
+Interrupts
+----------
+The following interrupts are supported by the driver:
+
+Management Interrupt
+~~~~~~~~~~~~~~~~~~~~
+The management interrupt is used by the device to tell the driver to
+look at the GVE_DEVICE_STATUS register.
+
+The handler for the management irq simply queues the service task in
+the workqueue to check the register and acks the irq.
+
+Notification Block Interrupts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The notification block interrupts are used to tell the driver to poll
+the queues associated with that interrupt.
+
+The handler for these irqs schedule the napi for that block to run
+and poll the queues.
+
+Traffic Queues
+--------------
+gVNIC's queues are composed of a descriptor ring and a buffer and are
+assigned to a notification block.
+
+The descriptor rings are power-of-two-sized ring buffers consisting of
+fixed-size descriptors. They advance their head pointer using a __be32
+doorbell located in Bar2. The tail pointers are advanced by consuming
+descriptors in-order and updating a __be32 counter. Both the doorbell
+and the counter overflow to zero.
+
+Each queue's buffers must be registered in advance with the device as a
+queue page list, and packet data can only be put in those pages.
+
+Transmit
+~~~~~~~~
+gve maps the buffers for transmit rings into a FIFO and copies the packets
+into the FIFO before sending them to the NIC.
+
+Receive
+~~~~~~~
+The buffers for receive rings are put into a data ring that is the same
+length as the descriptor ring and the head and tail pointers advance over
+the rings together.
diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst
index 75fa537763a4..c1f7f75e5fd9 100644
--- a/Documentation/networking/device_drivers/index.rst
+++ b/Documentation/networking/device_drivers/index.rst
@@ -21,8 +21,12 @@ Contents:
intel/i40e
intel/iavf
intel/ice
+ google/gve
+ mellanox/mlx5
+ netronome/nfp
+ pensando/ionic
-.. only:: subproject
+.. only:: subproject and html
Indices
=======
diff --git a/Documentation/networking/device_drivers/intel/e100.rst b/Documentation/networking/device_drivers/intel/e100.rst
index 2b9f4887beda..caf023cc88de 100644
--- a/Documentation/networking/device_drivers/intel/e100.rst
+++ b/Documentation/networking/device_drivers/intel/e100.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-==============================================================
-Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters
-==============================================================
+=============================================================
+Linux Base Driver for the Intel(R) PRO/100 Family of Adapters
+=============================================================
June 1, 2018
@@ -21,7 +21,7 @@ Contents
In This Release
===============
-This file describes the Linux* Base Driver for the Intel(R) PRO/100 Family of
+This file describes the Linux Base Driver for the Intel(R) PRO/100 Family of
Adapters. This driver includes support for Itanium(R)2-based systems.
For questions related to hardware requirements, refer to the documentation
@@ -138,9 +138,9 @@ version 1.6 or later is required for this functionality.
The latest release of ethtool can be found from
https://www.kernel.org/pub/software/network/ethtool/
-Enabling Wake on LAN* (WoL)
----------------------------
-WoL is provided through the ethtool* utility. For instructions on
+Enabling Wake on LAN (WoL)
+--------------------------
+WoL is provided through the ethtool utility. For instructions on
enabling WoL with ethtool, refer to the ethtool man page. WoL will be
enabled on the system during the next shut down or reboot. For this
driver version, in order to enable WoL, the e100 driver must be loaded
diff --git a/Documentation/networking/device_drivers/intel/e1000.rst b/Documentation/networking/device_drivers/intel/e1000.rst
index 956560b6e745..4aaae0f7d6ba 100644
--- a/Documentation/networking/device_drivers/intel/e1000.rst
+++ b/Documentation/networking/device_drivers/intel/e1000.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-===========================================================
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
+==========================================================
+Linux Base Driver for Intel(R) Ethernet Network Connection
+==========================================================
Intel Gigabit Linux driver.
Copyright(c) 1999 - 2013 Intel Corporation.
@@ -438,10 +438,10 @@ ethtool
The latest release of ethtool can be found from
https://www.kernel.org/pub/software/network/ethtool/
-Enabling Wake on LAN* (WoL)
----------------------------
+Enabling Wake on LAN (WoL)
+--------------------------
- WoL is configured through the ethtool* utility.
+ WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot.
For this driver version, in order to enable WoL, the e1000 driver must be
diff --git a/Documentation/networking/device_drivers/intel/e1000e.rst b/Documentation/networking/device_drivers/intel/e1000e.rst
index 01999f05509c..f49cd370e7bf 100644
--- a/Documentation/networking/device_drivers/intel/e1000e.rst
+++ b/Documentation/networking/device_drivers/intel/e1000e.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-======================================================
-Linux* Driver for Intel(R) Ethernet Network Connection
-======================================================
+=====================================================
+Linux Driver for Intel(R) Ethernet Network Connection
+=====================================================
Intel Gigabit Linux driver.
Copyright(c) 2008-2018 Intel Corporation.
@@ -338,7 +338,7 @@ and higher cannot be forced. Use the autonegotiation advertising setting to
manually set devices for 1 Gbps and higher.
Speed, duplex, and autonegotiation advertising are configured through the
-ethtool* utility.
+ethtool utility.
Caution: Only experienced network administrators should force speed and duplex
or change autonegotiation advertising manually. The settings at the switch must
@@ -351,9 +351,9 @@ will not attempt to auto-negotiate with its link partner since those adapters
operate only in full duplex and only at their native speed.
-Enabling Wake on LAN* (WoL)
----------------------------
-WoL is configured through the ethtool* utility.
+Enabling Wake on LAN (WoL)
+--------------------------
+WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot. For
this driver version, in order to enable WoL, the e1000e driver must be loaded
diff --git a/Documentation/networking/device_drivers/intel/fm10k.rst b/Documentation/networking/device_drivers/intel/fm10k.rst
index ac3269e34f55..4d279e64e221 100644
--- a/Documentation/networking/device_drivers/intel/fm10k.rst
+++ b/Documentation/networking/device_drivers/intel/fm10k.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-==============================================================
-Linux* Base Driver for Intel(R) Ethernet Multi-host Controller
-==============================================================
+=============================================================
+Linux Base Driver for Intel(R) Ethernet Multi-host Controller
+=============================================================
August 20, 2018
Copyright(c) 2015-2018 Intel Corporation.
@@ -120,8 +120,8 @@ rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r
Known Issues/Troubleshooting
============================
-Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS under Linux KVM
----------------------------------------------------------------------------------------
+Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS under Linux KVM
+-------------------------------------------------------------------------------------
KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This
includes traditional PCIe devices, as well as SR-IOV-capable devices based on
the Intel Ethernet Controller XL710.
diff --git a/Documentation/networking/device_drivers/intel/i40e.rst b/Documentation/networking/device_drivers/intel/i40e.rst
index 848fd388fa6e..8a9b18573688 100644
--- a/Documentation/networking/device_drivers/intel/i40e.rst
+++ b/Documentation/networking/device_drivers/intel/i40e.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-==================================================================
-Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series
-==================================================================
+=================================================================
+Linux Base Driver for the Intel(R) Ethernet Controller 700 Series
+=================================================================
Intel 40 Gigabit Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
@@ -384,7 +384,7 @@ NOTE: You cannot set the speed for devices based on the Intel(R) Ethernet
Network Adapter XXV710 based devices.
Speed, duplex, and autonegotiation advertising are configured through the
-ethtool* utility.
+ethtool utility.
Caution: Only experienced network administrators should force speed and duplex
or change autonegotiation advertising manually. The settings at the switch must
diff --git a/Documentation/networking/device_drivers/intel/iavf.rst b/Documentation/networking/device_drivers/intel/iavf.rst
index 2d0c3baa1752..84ac7e75f363 100644
--- a/Documentation/networking/device_drivers/intel/iavf.rst
+++ b/Documentation/networking/device_drivers/intel/iavf.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-==================================================================
-Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function
-==================================================================
+=================================================================
+Linux Base Driver for Intel(R) Ethernet Adaptive Virtual Function
+=================================================================
Intel Ethernet Adaptive Virtual Function Linux driver.
Copyright(c) 2013-2018 Intel Corporation.
@@ -10,12 +10,16 @@ Copyright(c) 2013-2018 Intel Corporation.
Contents
========
+- Overview
- Identifying Your Adapter
- Additional Configurations
- Known Issues/Troubleshooting
- Support
-This file describes the iavf Linux* Base Driver. This driver was formerly
+Overview
+========
+
+This file describes the iavf Linux Base Driver. This driver was formerly
called i40evf.
The iavf driver supports the below mentioned virtual function devices and
@@ -27,6 +31,7 @@ The guest OS loading the iavf driver must support MSI-X interrupts.
Identifying Your Adapter
========================
+
The driver in this kernel is compatible with devices based on the following:
* Intel(R) XL710 X710 Virtual Function
* Intel(R) X722 Virtual Function
@@ -50,9 +55,10 @@ Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set dmesg to eight by entering the following::
- dmesg -n 8
+ # dmesg -n 8
-NOTE: This setting is not saved across reboots.
+NOTE:
+ This setting is not saved across reboots.
ethtool
-------
@@ -72,11 +78,11 @@ then requests from that VF to set VLAN tag stripping will be ignored.
To enable/disable VLAN tag stripping for a VF, issue the following command
from inside the VM in which you are running the VF::
- ethtool -K <if_name> rxvlan on/off
+ # ethtool -K <if_name> rxvlan on/off
or alternatively::
- ethtool --offload <if_name> rxvlan on/off
+ # ethtool --offload <if_name> rxvlan on/off
Adaptive Virtual Function
-------------------------
@@ -91,21 +97,21 @@ additional features depending on what features are available in the PF with
which the AVF is associated. The following are base mode features:
- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
- for Tx/Rx.
-- i40e descriptors and ring format.
-- Descriptor write-back completion.
-- 1 control queue, with i40e descriptors, CSRs and ring format.
-- 5 MSI-X interrupt vectors and corresponding i40e CSRs.
-- 1 Interrupt Throttle Rate (ITR) index.
-- 1 Virtual Station Interface (VSI) per VF.
+ for Tx/Rx
+- i40e descriptors and ring format
+- Descriptor write-back completion
+- 1 control queue, with i40e descriptors, CSRs and ring format
+- 5 MSI-X interrupt vectors and corresponding i40e CSRs
+- 1 Interrupt Throttle Rate (ITR) index
+- 1 Virtual Station Interface (VSI) per VF
- 1 Traffic Class (TC), TC0
- Receive Side Scaling (RSS) with 64 entry indirection table and key,
- configured through the PF.
-- 1 unicast MAC address reserved per VF.
-- 16 MAC address filters for each VF.
-- Stateless offloads - non-tunneled checksums.
-- AVF device ID.
-- HW mailbox is used for VF to PF communications (including on Windows).
+ configured through the PF
+- 1 unicast MAC address reserved per VF
+- 16 MAC address filters for each VF
+- Stateless offloads - non-tunneled checksums
+- AVF device ID
+- HW mailbox is used for VF to PF communications (including on Windows)
IEEE 802.1ad (QinQ) Support
---------------------------
@@ -117,8 +123,8 @@ VLAN ID, among other uses.
The following are examples of how to configure 802.1ad (QinQ)::
- ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
- ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
+ # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
+ # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
Where "24" and "371" are example VLAN IDs.
@@ -133,6 +139,19 @@ specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application. Follow the steps below
to set ADq.
+Requirements:
+
+- The sch_mqprio, act_mirred and cls_flower modules must be loaded
+- The latest version of iproute2
+- If another driver (for example, DPDK) has set cloud filters, you cannot
+ enable ADQ
+- Depending on the underlying PF device, ADQ cannot be enabled when the
+ following features are enabled:
+
+ + Data Center Bridging (DCB)
+ + Multiple Functions per Port (MFP)
+ + Sideband Filters
+
1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
The shaper bw_rlimit parameter is optional.
@@ -141,9 +160,9 @@ to 1Gbit for tc0 and 3Gbit for tc1.
::
- # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
- queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
- max_rate 1Gbit 3Gbit
+ tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
+ queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
+ max_rate 1Gbit 3Gbit
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
@@ -162,6 +181,10 @@ Totals must be equal or less than port speed.
For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
+NOTE:
+ Setting up channels via ethtool (ethtool -L) is not supported when the
+ TCs are configured using mqprio.
+
2. Enable HW TC offload on interface::
# ethtool -K <interface> hw-tc-offload on
@@ -171,16 +194,16 @@ monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
# tc qdisc add dev <interface> ingress
NOTES:
- - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
- - ADq is not compatible with cloud filters.
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
+ - ADq is not compatible with cloud filters
- Setting up channels via ethtool (ethtool -L) is not supported when the TCs
- are configured using mqprio.
+ are configured using mqprio
- You must have iproute2 latest version
- - NVM version 6.01 or later is required.
+ - NVM version 6.01 or later is required
- ADq cannot be enabled when any the following features are enabled: Data
- Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters.
+ Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
- If another driver (for example, DPDK) has set cloud filters, you cannot
- enable ADq.
+ enable ADq
- Tunnel filters are not supported in ADq. If encapsulated packets do arrive
in non-tunnel mode, filtering will be done on the inner headers. For example,
for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
@@ -198,6 +221,16 @@ NOTES:
Known Issues/Troubleshooting
============================
+Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
+---------------------------------------------------------------------------------
+If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
+series based device, the VF slaves may fail when they become the active slave.
+If the MAC address of the VF is set by the PF (Physical Function) of the
+device, when you add a slave, or change the active-backup slave, Linux bonding
+tries to sync the backup slave's MAC address to the same MAC address as the
+active slave. Linux bonding will fail at this point. This issue will not occur
+if the VF's MAC address is not set by the PF.
+
Traffic Is Not Being Passed Between VM and Client
-------------------------------------------------
You may not be able to pass traffic between a client system and a
@@ -215,13 +248,28 @@ Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
Once the VM shuts down, or otherwise releases the VF, the command will complete.
+Using four traffic classes fails
+--------------------------------
+Do not try to reserve more than three traffic classes in the iavf driver. Doing
+so will fail to set any traffic classes and will cause the driver to write
+errors to stdout. Use a maximum of three queues to avoid this issue.
+
+Multiple log error messages on iavf driver removal
+--------------------------------------------------
+If you have several VFs and you remove the iavf driver, several instances of
+the following log errors are written to the log::
+
+ Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
+ Unable to send the message to VF 2 aq_err 12
+ ARQ Overflow Error detected
+
Virtual machine does not get link
---------------------------------
If the virtual machine has more than one virtual port assigned to it, and those
virtual ports are bound to different physical ports, you may not get link on
all of the virtual ports. The following command may work around the issue::
- ethtool -r <PF>
+ # ethtool -r <PF>
Where <PF> is the PF interface in the host, for example: p5p1. You may need to
run the command more than once to get link on all virtual ports.
@@ -251,12 +299,13 @@ traffic.
If you have multiple interfaces in a server, either turn on ARP filtering by
entering::
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+ # echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-NOTE: This setting is not saved across reboots. The configuration change can be
-made permanent by adding the following line to the file /etc/sysctl.conf::
+NOTE:
+ This setting is not saved across reboots. The configuration change can be
+ made permanent by adding the following line to the file /etc/sysctl.conf::
- net.ipv4.conf.all.arp_filter = 1
+ net.ipv4.conf.all.arp_filter = 1
Another alternative is to install the interfaces in separate broadcast domains
(either in different switches or in a switch partitioned to VLANs).
diff --git a/Documentation/networking/device_drivers/intel/ice.rst b/Documentation/networking/device_drivers/intel/ice.rst
index c220aa2711c6..ee43ea57d443 100644
--- a/Documentation/networking/device_drivers/intel/ice.rst
+++ b/Documentation/networking/device_drivers/intel/ice.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-===================================================================
-Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series
-===================================================================
+==================================================================
+Linux Base Driver for the Intel(R) Ethernet Connection E800 Series
+==================================================================
Intel ice Linux driver.
Copyright(c) 2018 Intel Corporation.
diff --git a/Documentation/networking/device_drivers/intel/igb.rst b/Documentation/networking/device_drivers/intel/igb.rst
index fc8cfaa5dcfa..87e560fe5eaa 100644
--- a/Documentation/networking/device_drivers/intel/igb.rst
+++ b/Documentation/networking/device_drivers/intel/igb.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-===========================================================
-Linux* Base Driver for Intel(R) Ethernet Network Connection
-===========================================================
+==========================================================
+Linux Base Driver for Intel(R) Ethernet Network Connection
+==========================================================
Intel Gigabit Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
@@ -129,9 +129,9 @@ version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/
-Enabling Wake on LAN* (WoL)
----------------------------
-WoL is configured through the ethtool* utility.
+Enabling Wake on LAN (WoL)
+--------------------------
+WoL is configured through the ethtool utility.
WoL will be enabled on the system during the next shut down or reboot. For
this driver version, in order to enable WoL, the igb driver must be loaded
diff --git a/Documentation/networking/device_drivers/intel/igbvf.rst b/Documentation/networking/device_drivers/intel/igbvf.rst
index 9cddabe8108e..557fc020ef31 100644
--- a/Documentation/networking/device_drivers/intel/igbvf.rst
+++ b/Documentation/networking/device_drivers/intel/igbvf.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-============================================================
-Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet
-============================================================
+===========================================================
+Linux Base Virtual Function Driver for Intel(R) 1G Ethernet
+===========================================================
Intel Gigabit Virtual Function Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
diff --git a/Documentation/networking/device_drivers/intel/ixgbe.rst b/Documentation/networking/device_drivers/intel/ixgbe.rst
index c7d25483fedb..f1d5233e5e51 100644
--- a/Documentation/networking/device_drivers/intel/ixgbe.rst
+++ b/Documentation/networking/device_drivers/intel/ixgbe.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-=============================================================================
-Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
-=============================================================================
+===========================================================================
+Linux Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
+===========================================================================
Intel 10 Gigabit Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
@@ -519,8 +519,8 @@ The offload is also supported for ixgbe's VFs, but the VF must be set as
Known Issues/Troubleshooting
============================
-Enabling SR-IOV in a 64-bit Microsoft* Windows Server* 2012/R2 guest OS
------------------------------------------------------------------------
+Enabling SR-IOV in a 64-bit Microsoft Windows Server 2012/R2 guest OS
+---------------------------------------------------------------------
Linux KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM.
This includes traditional PCIe devices, as well as SR-IOV-capable devices based
on the Intel Ethernet Controller XL710.
diff --git a/Documentation/networking/device_drivers/intel/ixgbevf.rst b/Documentation/networking/device_drivers/intel/ixgbevf.rst
index 5d4977360157..76bbde736f21 100644
--- a/Documentation/networking/device_drivers/intel/ixgbevf.rst
+++ b/Documentation/networking/device_drivers/intel/ixgbevf.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0+
-=============================================================
-Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet
-=============================================================
+============================================================
+Linux Base Virtual Function Driver for Intel(R) 10G Ethernet
+============================================================
Intel 10 Gigabit Virtual Function Linux driver.
Copyright(c) 1999-2018 Intel Corporation.
diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst
new file mode 100644
index 000000000000..d071c6b49e1f
--- /dev/null
+++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst
@@ -0,0 +1,300 @@
+.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+
+=================================================
+Mellanox ConnectX(R) mlx5 core VPI Network Driver
+=================================================
+
+Copyright (c) 2019, Mellanox Technologies LTD.
+
+Contents
+========
+
+- `Enabling the driver and kconfig options`_
+- `Devlink info`_
+- `Devlink parameters`_
+- `Devlink health reporters`_
+- `mlx5 tracepoints`_
+
+Enabling the driver and kconfig options
+================================================
+
+| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
+| at build time via kernel Kconfig flags.
+| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
+| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
+| For the list of advanced features please see below.
+
+**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
+
+| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
+| This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).
+
+
+**CONFIG_MLX5_CORE_EN=(y/n)**
+
+| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
+| mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be
+| built-in into mlx5_core.ko.
+
+
+**CONFIG_MLX5_EN_ARFS=(y/n)**
+
+| Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
+| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4
+
+
+**CONFIG_MLX5_EN_RXNFC=(y/n)**
+
+| Enables ethtool receive network flow classification, which allows user defined
+| flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API.
+
+
+**CONFIG_MLX5_CORE_EN_DCB=(y/n)**:
+
+| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.
+
+
+**CONFIG_MLX5_MPFS=(y/n)**
+
+| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
+| MPFs is required for when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing
+| user configured unicast MAC addresses to the requesting PF.
+
+
+**CONFIG_MLX5_ESWITCH=(y/n)**
+
+| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
+| and switching for the enabled VFs and PF in two available modes:
+| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
+| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.
+
+
+**CONFIG_MLX5_CORE_IPOIB=(y/n)**
+
+| IPoIB offloads & acceleration support.
+| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
+| IPoIB ulp netdevice.
+
+
+**CONFIG_MLX5_FPGA=(y/n)**
+
+| Build support for the Innova family of network cards by Mellanox Technologies.
+| Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board.
+| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
+| building sandbox-specific client drivers.
+
+
+**CONFIG_MLX5_EN_IPSEC=(y/n)**
+
+| Enables `IPSec XFRM cryptography-offload accelaration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
+
+**CONFIG_MLX5_EN_TLS=(y/n)**
+
+| TLS cryptography-offload accelaration.
+
+
+**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
+
+| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
+
+
+**External options** ( Choose if the corresponding mlx5 feature is required )
+
+- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled
+- CONFIG_VXLAN: When chosen, mlx5 vxaln support will be enabled.
+- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
+
+Devlink info
+============
+
+The devlink info reports the running and stored firmware versions on device.
+It also prints the device PSID which represents the HCA board type ID.
+
+User command example::
+
+ $ devlink dev info pci/0000:00:06.0
+ pci/0000:00:06.0:
+ driver mlx5_core
+ versions:
+ fixed:
+ fw.psid MT_0000000009
+ running:
+ fw.version 16.26.0100
+ stored:
+ fw.version 16.26.0100
+
+Devlink parameters
+==================
+
+flow_steering_mode: Device flow steering mode
+---------------------------------------------
+The flow steering mode parameter controls the flow steering mode of the driver.
+Two modes are supported:
+1. 'dmfs' - Device managed flow steering.
+2. 'smfs - Software/Driver managed flow steering.
+
+In DMFS mode, the HW steering entities are created and managed through the
+Firmware.
+In SMFS mode, the HW steering entities are created and managed though by
+the driver directly into Hardware without firmware intervention.
+
+SMFS mode is faster and provides better rule inserstion rate compared to default DMFS mode.
+
+User command examples:
+
+- Set SMFS flow steering mode::
+
+ $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
+
+- Read device flow steering mode::
+
+ $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
+ pci/0000:06:00.0:
+ name flow_steering_mode type driver-specific
+ values:
+ cmode runtime value smfs
+
+
+Devlink health reporters
+========================
+
+tx reporter
+-----------
+The tx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- TX timeout
+ Report on kernel tx timeout detection.
+ Recover by searching lost interrupts.
+- TX error completion
+ Report on error tx completion.
+ Recover by flushing the TX queue and reset it.
+
+TX reporter also support on demand diagnose callback, on which it provides
+real time information of its send queues status.
+
+User commands examples:
+
+- Diagnose send queues status::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter tx
+
+NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+
+- Show number of tx errors indicated, number of recover flows ended successfully,
+ is autorecover enabled and graceful period from last recover::
+
+ $ devlink health show pci/0000:82:00.0 reporter tx
+
+rx reporter
+-----------
+The rx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- RX queues initialization (population) timeout
+ RX queues descriptors population on ring initialization is done in
+ napi context via triggering an irq, in case of a failure to get
+ the minimum amount of descriptors, a timeout would occur and it
+ could be recoverable by polling the EQ (Event Queue).
+- RX completions with errors (reported by HW on interrupt context)
+ Report on rx completion error.
+ Recover (if needed) by flushing the related queue and reset it.
+
+RX reporter also supports on demand diagnose callback, on which it
+provides real time information of its receive queues status.
+
+- Diagnose rx queues status, and corresponding completion queue::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter rx
+
+NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+
+- Show number of rx errors indicated, number of recover flows ended successfully,
+ is autorecover enabled and graceful period from last recover::
+
+ $ devlink health show pci/0000:82:00.0 reporter rx
+
+fw reporter
+-----------
+The fw reporter implements diagnose and dump callbacks.
+It follows symptoms of fw error such as fw syndrome by triggering
+fw core dump and storing it into the dump buffer.
+The fw reporter diagnose command can be triggered any time by the user to check
+current fw status.
+
+User commands examples:
+
+- Check fw heath status::
+
+ $ devlink health diagnose pci/0000:82:00.0 reporter fw
+
+- Read FW core dump if already stored or trigger new one::
+
+ $ devlink health dump show pci/0000:82:00.0 reporter fw
+
+NOTE: This command can run only on the PF which has fw tracer ownership,
+running it on other PF or any VF will return "Operation not permitted".
+
+fw fatal reporter
+-----------------
+The fw fatal reporter implements dump and recover callbacks.
+It follows fatal errors indications by CR-space dump and recover flow.
+The CR-space dump uses vsc interface which is valid even if the FW command
+interface is not functional, which is the case in most FW fatal errors.
+The recover function runs recover flow which reloads the driver and triggers fw
+reset if needed.
+
+User commands examples:
+
+- Run fw recover flow manually::
+
+ $ devlink health recover pci/0000:82:00.0 reporter fw_fatal
+
+- Read FW CR-space dump if already strored or trigger new one::
+
+ $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
+
+NOTE: This command can run only on PF.
+
+mlx5 tracepoints
+================
+
+mlx5 driver provides internal trace points for tracking and debugging using
+kernel tracepoints interfaces (refer to Documentation/trace/ftrase.rst).
+
+For the list of support mlx5 events check /sys/kernel/debug/tracing/events/mlx5/
+
+tc and eswitch offloads tracepoints:
+
+- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::
+
+ $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT
+
+- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::
+
+ $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL
+
+- mlx5e_stats_flower: trace flower stats request::
+
+ $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217
+
+- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::
+
+ $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1
+
+- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::
+
+ $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace
+ ...
+ kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
diff --git a/Documentation/networking/device_drivers/netronome/nfp.rst b/Documentation/networking/device_drivers/netronome/nfp.rst
new file mode 100644
index 000000000000..6c08ac8b5147
--- /dev/null
+++ b/Documentation/networking/device_drivers/netronome/nfp.rst
@@ -0,0 +1,133 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+=============================================
+Netronome Flow Processor (NFP) Kernel Drivers
+=============================================
+
+Copyright (c) 2019, Netronome Systems, Inc.
+
+Contents
+========
+
+- `Overview`_
+- `Acquiring Firmware`_
+
+Overview
+========
+
+This driver supports Netronome's line of Flow Processor devices,
+including the NFP4000, NFP5000, and NFP6000 models, which are also
+incorporated in the company's family of Agilio SmartNICs. The SR-IOV
+physical and virtual functions for these devices are supported by
+the driver.
+
+Acquiring Firmware
+==================
+
+The NFP4000 and NFP6000 devices require application specific firmware
+to function. Application firmware can be located either on the host file system
+or in the device flash (if supported by management firmware).
+
+Firmware files on the host filesystem contain card type (`AMDA-*` string), media
+config etc. They should be placed in `/lib/firmware/netronome` directory to
+load firmware from the host file system.
+
+Firmware for basic NIC operation is available in the upstream
+`linux-firmware.git` repository.
+
+Firmware in NVRAM
+-----------------
+
+Recent versions of management firmware supports loading application
+firmware from flash when the host driver gets probed. The firmware loading
+policy configuration may be used to configure this feature appropriately.
+
+Devlink or ethtool can be used to update the application firmware on the device
+flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
+command. Users need to take care to write the correct firmware image for the
+card and media configuration to flash.
+
+Available storage space in flash depends on the card being used.
+
+Dealing with multiple projects
+------------------------------
+
+NFP hardware is fully programmable therefore there can be different
+firmware images targeting different applications.
+
+When using application firmware from host, we recommend placing
+actual firmware files in application-named subdirectories in
+`/lib/firmware/netronome` and linking the desired files, e.g.::
+
+ $ tree /lib/firmware/netronome/
+ /lib/firmware/netronome/
+ ├── bpf
+ │   ├── nic_AMDA0081-0001_1x40.nffw
+ │   └── nic_AMDA0081-0001_4x10.nffw
+ ├── flower
+ │   ├── nic_AMDA0081-0001_1x40.nffw
+ │   └── nic_AMDA0081-0001_4x10.nffw
+ ├── nic
+ │   ├── nic_AMDA0081-0001_1x40.nffw
+ │   └── nic_AMDA0081-0001_4x10.nffw
+ ├── nic_AMDA0081-0001_1x40.nffw -> bpf/nic_AMDA0081-0001_1x40.nffw
+ └── nic_AMDA0081-0001_4x10.nffw -> bpf/nic_AMDA0081-0001_4x10.nffw
+
+ 3 directories, 8 files
+
+You may need to use hard instead of symbolic links on distributions
+which use old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
+
+After changing firmware files you may need to regenerate the initramfs
+image. Initramfs contains drivers and firmware files your system may
+need to boot. Refer to the documentation of your distribution to find
+out how to update initramfs. Good indication of stale initramfs
+is system loading wrong driver or firmware on boot, but when driver is
+later reloaded manually everything works correctly.
+
+Selecting firmware per device
+-----------------------------
+
+Most commonly all cards on the system use the same type of firmware.
+If you want to load specific firmware image for a specific card, you
+can use either the PCI bus address or serial number. Driver will print
+which files it's looking for when it recognizes a NFP device::
+
+ nfp: Looking for firmware file in order of priority:
+ nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
+ nfp: netronome/pci-0000:02:00.0.nffw: not found
+ nfp: netronome/nic_AMDA0081-0001_1x40.nffw: found, loading...
+
+In this case if file (or link) called *serial-00-12-34-aa-bb-5d-10-ff.nffw*
+or *pci-0000:02:00.0.nffw* is present in `/lib/firmware/netronome` this
+firmware file will take precedence over `nic_AMDA*` files.
+
+Note that `serial-*` and `pci-*` files are **not** automatically included
+in initramfs, you will have to refer to documentation of appropriate tools
+to find out how to include them.
+
+Firmware loading policy
+-----------------------
+
+Firmware loading policy is controlled via three HWinfo parameters
+stored as key value pairs in the device flash:
+
+app_fw_from_flash
+ Defines which firmware should take precedence, 'Disk' (0), 'Flash' (1) or
+ the 'Preferred' (2) firmware. When 'Preferred' is selected, the management
+ firmware makes the decision over which firmware will be loaded by comparing
+ versions of the flash firmware and the host supplied firmware.
+ This variable is configurable using the 'fw_load_policy'
+ devlink parameter.
+
+abi_drv_reset
+ Defines if the driver should reset the firmware when
+ the driver is probed, either 'Disk' (0) if firmware was found on disk,
+ 'Always' (1) reset or 'Never' (2) reset. Note that the device is always
+ reset on driver unload if firmware was loaded when the driver was probed.
+ This variable is configurable using the 'reset_dev_on_drv_probe'
+ devlink parameter.
+
+abi_drv_load_ifc
+ Defines a list of PF devices allowed to load FW on the device.
+ This variable is not currently user configurable.
diff --git a/Documentation/networking/device_drivers/pensando/ionic.rst b/Documentation/networking/device_drivers/pensando/ionic.rst
new file mode 100644
index 000000000000..c17d680cf334
--- /dev/null
+++ b/Documentation/networking/device_drivers/pensando/ionic.rst
@@ -0,0 +1,45 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+========================================================
+Linux Driver for the Pensando(R) Ethernet adapter family
+========================================================
+
+Pensando Linux Ethernet driver.
+Copyright(c) 2019 Pensando Systems, Inc
+
+Contents
+========
+
+- Identifying the Adapter
+- Support
+
+Identifying the Adapter
+=======================
+
+To find if one or more Pensando PCI Ethernet devices are installed on the
+host, check for the PCI devices::
+
+ $ lspci -d 1dd8:
+ b5:00.0 Ethernet controller: Device 1dd8:1002
+ b6:00.0 Ethernet controller: Device 1dd8:1002
+
+If such devices are listed as above, then the ionic.ko driver should find
+and configure them for use. There should be log entries in the kernel
+messages such as these::
+
+ $ dmesg | grep ionic
+ ionic Pensando Ethernet NIC Driver, ver 0.15.0-k
+ ionic 0000:b5:00.0 enp181s0: renamed from eth0
+ ionic 0000:b6:00.0 enp182s0: renamed from eth0
+
+Support
+=======
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by Pensando personnel::
+
+ netdev@vger.kernel.org
+
+For more specific support needs, please use the Pensando driver support
+email::
+
+ drivers@pensando.io
diff --git a/Documentation/networking/devlink-info-versions.rst b/Documentation/networking/devlink-info-versions.rst
index 4316342b7746..4914f581b1fd 100644
--- a/Documentation/networking/devlink-info-versions.rst
+++ b/Documentation/networking/devlink-info-versions.rst
@@ -14,11 +14,27 @@ board.rev
Board design revision.
+asic.id
+=======
+
+ASIC design identifier.
+
+asic.rev
+========
+
+ASIC design revision.
+
board.manufacture
=================
An identifier of the company or the facility which produced the part.
+fw
+==
+
+Overall firmware version, often representing the collection of
+fw.mgmt, fw.app, etc.
+
fw.mgmt
=======
diff --git a/Documentation/networking/devlink-params-nfp.txt b/Documentation/networking/devlink-params-nfp.txt
new file mode 100644
index 000000000000..43e4d4034865
--- /dev/null
+++ b/Documentation/networking/devlink-params-nfp.txt
@@ -0,0 +1,5 @@
+fw_load_policy [DEVICE, GENERIC]
+ Configuration mode: permanent
+
+reset_dev_on_drv_probe [DEVICE, GENERIC]
+ Configuration mode: permanent
diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink-params.txt
index 2d26434ddcf8..ddba3e9b55b1 100644
--- a/Documentation/networking/devlink-params.txt
+++ b/Documentation/networking/devlink-params.txt
@@ -48,4 +48,20 @@ fw_load_policy [DEVICE, GENERIC]
Load firmware version preferred by the driver.
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1)
Load firmware currently stored in flash.
+ * DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK (2)
+ Load firmware currently available on host's disk.
+ Type: u8
+
+reset_dev_on_drv_probe [DEVICE, GENERIC]
+ Controls the device's reset policy on driver probe.
+ Valid values:
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_UNKNOWN (0)
+ Unknown or invalid value.
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_ALWAYS (1)
+ Always reset device on driver probe.
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_NEVER (2)
+ Never reset device on driver probe.
+ * DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_DISK (3)
+ Reset only if device firmware can be found in the
+ filesystem.
Type: u8
diff --git a/Documentation/networking/devlink-trap-netdevsim.rst b/Documentation/networking/devlink-trap-netdevsim.rst
new file mode 100644
index 000000000000..b721c9415473
--- /dev/null
+++ b/Documentation/networking/devlink-trap-netdevsim.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Devlink Trap netdevsim
+======================
+
+Driver-specific Traps
+=====================
+
+.. list-table:: List of Driver-specific Traps Registered by ``netdevsim``
+ :widths: 5 5 90
+
+ * - Name
+ - Type
+ - Description
+ * - ``fid_miss``
+ - ``exception``
+ - When a packet enters the device it is classified to a filtering
+ indentifier (FID) based on the ingress port and VLAN. This trap is used
+ to trap packets for which a FID could not be found
diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
new file mode 100644
index 000000000000..8e90a85f3bd5
--- /dev/null
+++ b/Documentation/networking/devlink-trap.rst
@@ -0,0 +1,209 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Devlink Trap
+============
+
+Background
+==========
+
+Devices capable of offloading the kernel's datapath and perform functions such
+as bridging and routing must also be able to send specific packets to the
+kernel (i.e., the CPU) for processing.
+
+For example, a device acting as a multicast-aware bridge must be able to send
+IGMP membership reports to the kernel for processing by the bridge module.
+Without processing such packets, the bridge module could never populate its
+MDB.
+
+As another example, consider a device acting as router which has received an IP
+packet with a TTL of 1. Upon routing the packet the device must send it to the
+kernel so that it will route it as well and generate an ICMP Time Exceeded
+error datagram. Without letting the kernel route such packets itself, utilities
+such as ``traceroute`` could never work.
+
+The fundamental ability of sending certain packets to the kernel for processing
+is called "packet trapping".
+
+Overview
+========
+
+The ``devlink-trap`` mechanism allows capable device drivers to register their
+supported packet traps with ``devlink`` and report trapped packets to
+``devlink`` for further analysis.
+
+Upon receiving trapped packets, ``devlink`` will perform a per-trap packets and
+bytes accounting and potentially report the packet to user space via a netlink
+event along with all the provided metadata (e.g., trap reason, timestamp, input
+port). This is especially useful for drop traps (see :ref:`Trap-Types`)
+as it allows users to obtain further visibility into packet drops that would
+otherwise be invisible.
+
+The following diagram provides a general overview of ``devlink-trap``::
+
+ Netlink event: Packet w/ metadata
+ Or a summary of recent drops
+ ^
+ |
+ Userspace |
+ +---------------------------------------------------+
+ Kernel |
+ |
+ +-------+--------+
+ | |
+ | drop_monitor |
+ | |
+ +-------^--------+
+ |
+ |
+ |
+ +----+----+
+ | | Kernel's Rx path
+ | devlink | (non-drop traps)
+ | |
+ +----^----+ ^
+ | |
+ +-----------+
+ |
+ +-------+-------+
+ | |
+ | Device driver |
+ | |
+ +-------^-------+
+ Kernel |
+ +---------------------------------------------------+
+ Hardware |
+ | Trapped packet
+ |
+ +--+---+
+ | |
+ | ASIC |
+ | |
+ +------+
+
+.. _Trap-Types:
+
+Trap Types
+==========
+
+The ``devlink-trap`` mechanism supports the following packet trap types:
+
+ * ``drop``: Trapped packets were dropped by the underlying device. Packets
+ are only processed by ``devlink`` and not injected to the kernel's Rx path.
+ The trap action (see :ref:`Trap-Actions`) can be changed.
+ * ``exception``: Trapped packets were not forwarded as intended by the
+ underlying device due to an exception (e.g., TTL error, missing neighbour
+ entry) and trapped to the control plane for resolution. Packets are
+ processed by ``devlink`` and injected to the kernel's Rx path. Changing the
+ action of such traps is not allowed, as it can easily break the control
+ plane.
+
+.. _Trap-Actions:
+
+Trap Actions
+============
+
+The ``devlink-trap`` mechanism supports the following packet trap actions:
+
+ * ``trap``: The sole copy of the packet is sent to the CPU.
+ * ``drop``: The packet is dropped by the underlying device and a copy is not
+ sent to the CPU.
+
+Generic Packet Traps
+====================
+
+Generic packet traps are used to describe traps that trap well-defined packets
+or packets that are trapped due to well-defined conditions (e.g., TTL error).
+Such traps can be shared by multiple device drivers and their description must
+be added to the following table:
+
+.. list-table:: List of Generic Packet Traps
+ :widths: 5 5 90
+
+ * - Name
+ - Type
+ - Description
+ * - ``source_mac_is_multicast``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop because of a
+ multicast source MAC
+ * - ``vlan_tag_mismatch``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop in case of VLAN
+ tag mismatch: The ingress bridge port is not configured with a PVID and
+ the packet is untagged or prio-tagged
+ * - ``ingress_vlan_filter``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop in case they are
+ tagged with a VLAN that is not configured on the ingress bridge port
+ * - ``ingress_spanning_tree_filter``
+ - ``drop``
+ - Traps incoming packets that the device decided to drop in case the STP
+ state of the ingress bridge port is not "forwarding"
+ * - ``port_list_is_empty``
+ - ``drop``
+ - Traps packets that the device decided to drop in case they need to be
+ flooded (e.g., unknown unicast, unregistered multicast) and there are
+ no ports the packets should be flooded to
+ * - ``port_loopback_filter``
+ - ``drop``
+ - Traps packets that the device decided to drop in case after layer 2
+ forwarding the only port from which they should be transmitted through
+ is the port from which they were received
+ * - ``blackhole_route``
+ - ``drop``
+ - Traps packets that the device decided to drop in case they hit a
+ blackhole route
+ * - ``ttl_value_is_too_small``
+ - ``exception``
+ - Traps unicast packets that should be forwarded by the device whose TTL
+ was decremented to 0 or less
+ * - ``tail_drop``
+ - ``drop``
+ - Traps packets that the device decided to drop because they could not be
+ enqueued to a transmission queue which is full
+
+Driver-specific Packet Traps
+============================
+
+Device drivers can register driver-specific packet traps, but these must be
+clearly documented. Such traps can correspond to device-specific exceptions and
+help debug packet drops caused by these exceptions. The following list includes
+links to the description of driver-specific traps registered by various device
+drivers:
+
+ * :doc:`/devlink-trap-netdevsim`
+
+Generic Packet Trap Groups
+==========================
+
+Generic packet trap groups are used to aggregate logically related packet
+traps. These groups allow the user to batch operations such as setting the trap
+action of all member traps. In addition, ``devlink-trap`` can report aggregated
+per-group packets and bytes statistics, in case per-trap statistics are too
+narrow. The description of these groups must be added to the following table:
+
+.. list-table:: List of Generic Packet Trap Groups
+ :widths: 10 90
+
+ * - Name
+ - Description
+ * - ``l2_drops``
+ - Contains packet traps for packets that were dropped by the device during
+ layer 2 forwarding (i.e., bridge)
+ * - ``l3_drops``
+ - Contains packet traps for packets that were dropped by the device or hit
+ an exception (e.g., TTL error) during layer 3 forwarding
+ * - ``buffer_drops``
+ - Contains packet traps for packets that were dropped by the device due to
+ an enqueue decision
+
+Testing
+=======
+
+See ``tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh`` for a
+test covering the core infrastructure. Test cases should be added for any new
+functionality.
+
+Device drivers should focus their tests on device-specific functionality, such
+as the triggering of supported packet traps.
diff --git a/Documentation/networking/dsa/b53.rst b/Documentation/networking/dsa/b53.rst
new file mode 100644
index 000000000000..b41637cdb82b
--- /dev/null
+++ b/Documentation/networking/dsa/b53.rst
@@ -0,0 +1,183 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+Broadcom RoboSwitch Ethernet switch driver
+==========================================
+
+The Broadcom RoboSwitch Ethernet switch family is used in quite a range of
+xDSL router, cable modems and other multimedia devices.
+
+The actual implementation supports the devices BCM5325E, BCM5365, BCM539x,
+BCM53115 and BCM53125 as well as BCM63XX.
+
+Implementation details
+======================
+
+The driver is located in ``drivers/net/dsa/b53/`` and is implemented as a
+DSA driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the
+subsystem and what it provides.
+
+The switch is, if possible, configured to enable a Broadcom specific 4-bytes
+switch tag which gets inserted by the switch for every packet forwarded to the
+CPU interface, conversely, the CPU network interface should insert a similar
+tag for packets entering the CPU port. The tag format is described in
+``net/dsa/tag_brcm.c``.
+
+The configuration of the device depends on whether or not tagging is
+supported.
+
+The interface names and example network configuration are used according the
+configuration described in the :ref:`dsa-config-showcases`.
+
+Configuration with tagging support
+----------------------------------
+
+The tagging based configuration is desired. It is not specific to the b53
+DSA driver and will work like all DSA drivers which supports tagging.
+
+See :ref:`dsa-tagged-configuration`.
+
+Configuration without tagging support
+-------------------------------------
+
+Older models (5325, 5365) support a different tag format that is not supported
+yet. 539x and 531x5 require managed mode and some special handling, which is
+also not yet supported. The tagging support is disabled in these cases and the
+switch need a different configuration.
+
+The configuration slightly differ from the :ref:`dsa-vlan-configuration`.
+
+The b53 tags the CPU port in all VLANs, since otherwise any PVID untagged
+VLAN programming would basically change the CPU port's default PVID and make
+it untagged, undesirable.
+
+In difference to the configuration described in :ref:`dsa-vlan-configuration`
+the default VLAN 1 has to be removed from the slave interface configuration in
+single port and gateway configuration, while there is no need to add an extra
+VLAN configuration in the bridge showcase.
+
+single port
+~~~~~~~~~~~
+The configuration can only be set up via VLAN tagging and bridge setup.
+By default packages are tagged with vid 1:
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+ ip link add link eth0 name eth0.3 type vlan id 3
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+ ip link set eth0.3 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev wan master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 2 pvid untagged
+ bridge vlan del dev lan1 vid 1
+ bridge vlan add dev lan2 vid 3 pvid untagged
+ bridge vlan del dev lan2 vid 1
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.1
+ ip addr add 192.0.2.5/30 dev eth0.2
+ ip addr add 192.0.2.9/30 dev eth0.3
+
+ # bring up the bridge devices
+ ip link set br0 up
+
+
+bridge
+~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridge
+ ip link set dev wan master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set eth0.1 master br0
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+gateway
+~~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev wan master br0
+ ip link set eth0.1 master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev wan vid 2 pvid untagged
+ bridge vlan del dev wan vid 1
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.2
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge devices
+ ip link set br0 up
diff --git a/Documentation/networking/dsa/configuration.rst b/Documentation/networking/dsa/configuration.rst
new file mode 100644
index 000000000000..af029b3ca2ab
--- /dev/null
+++ b/Documentation/networking/dsa/configuration.rst
@@ -0,0 +1,292 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+DSA switch configuration from userspace
+=======================================
+
+The DSA switch configuration is not integrated into the main userspace
+network configuration suites by now and has to be performed manualy.
+
+.. _dsa-config-showcases:
+
+Configuration showcases
+-----------------------
+
+To configure a DSA switch a couple of commands need to be executed. In this
+documentation some common configuration scenarios are handled as showcases:
+
+*single port*
+ Every switch port acts as a different configurable Ethernet port
+
+*bridge*
+ Every switch port is part of one configurable Ethernet bridge
+
+*gateway*
+ Every switch port except one upstream port is part of a configurable
+ Ethernet bridge.
+ The upstream port acts as different configurable Ethernet port.
+
+All configurations are performed with tools from iproute2, which is available
+at https://www.kernel.org/pub/linux/utils/net/iproute2/
+
+Through DSA every port of a switch is handled like a normal linux Ethernet
+interface. The CPU port is the switch port connected to an Ethernet MAC chip.
+The corresponding linux Ethernet interface is called the master interface.
+All other corresponding linux interfaces are called slave interfaces.
+
+The slave interfaces depend on the master interface. They can only brought up,
+when the master interface is up.
+
+In this documentation the following Ethernet interfaces are used:
+
+*eth0*
+ the master interface
+
+*lan1*
+ a slave interface
+
+*lan2*
+ another slave interface
+
+*lan3*
+ a third slave interface
+
+*wan*
+ A slave interface dedicated for upstream traffic
+
+Further Ethernet interfaces can be configured similar.
+The configured IPs and networks are:
+
+*single port*
+ * lan1: 192.0.2.1/30 (192.0.2.0 - 192.0.2.3)
+ * lan2: 192.0.2.5/30 (192.0.2.4 - 192.0.2.7)
+ * lan3: 192.0.2.9/30 (192.0.2.8 - 192.0.2.11)
+
+*bridge*
+ * br0: 192.0.2.129/25 (192.0.2.128 - 192.0.2.255)
+
+*gateway*
+ * br0: 192.0.2.129/25 (192.0.2.128 - 192.0.2.255)
+ * wan: 192.0.2.1/30 (192.0.2.0 - 192.0.2.3)
+
+.. _dsa-tagged-configuration:
+
+Configuration with tagging support
+----------------------------------
+
+The tagging based configuration is desired and supported by the majority of
+DSA switches. These switches are capable to tag incoming and outgoing traffic
+without using a VLAN based configuration.
+
+single port
+~~~~~~~~~~~
+
+.. code-block:: sh
+
+ # configure each interface
+ ip addr add 192.0.2.1/30 dev lan1
+ ip addr add 192.0.2.5/30 dev lan2
+ ip addr add 192.0.2.9/30 dev lan3
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan2 up
+ ip link set lan3 up
+
+bridge
+~~~~~~
+
+.. code-block:: sh
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan2 up
+ ip link set lan3 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # add ports to bridge
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set dev lan3 master br0
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+gateway
+~~~~~~~
+
+.. code-block:: sh
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # configure the upstream port
+ ip addr add 192.0.2.1/30 dev wan
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # add ports to bridge
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+.. _dsa-vlan-configuration:
+
+Configuration without tagging support
+-------------------------------------
+
+A minority of switches are not capable to use a taging protocol
+(DSA_TAG_PROTO_NONE). These switches can be configured by a VLAN based
+configuration.
+
+single port
+~~~~~~~~~~~
+The configuration can only be set up via VLAN tagging and bridge setup.
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+ ip link add link eth0 name eth0.3 type vlan id 3
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+ ip link set eth0.3 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan1 up
+ ip link set lan3 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set dev lan3 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 1 pvid untagged
+ bridge vlan add dev lan2 vid 2 pvid untagged
+ bridge vlan add dev lan3 vid 3 pvid untagged
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.1
+ ip addr add 192.0.2.5/30 dev eth0.2
+ ip addr add 192.0.2.9/30 dev eth0.3
+
+ # bring up the bridge devices
+ ip link set br0 up
+
+
+bridge
+~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+
+ # bring up the slave interfaces
+ ip link set lan1 up
+ ip link set lan2 up
+ ip link set lan3 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridge
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+ ip link set dev lan3 master br0
+ ip link set eth0.1 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 1 pvid untagged
+ bridge vlan add dev lan2 vid 1 pvid untagged
+ bridge vlan add dev lan3 vid 1 pvid untagged
+
+ # configure the bridge
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge
+ ip link set dev br0 up
+
+gateway
+~~~~~~~
+
+.. code-block:: sh
+
+ # tag traffic on CPU port
+ ip link add link eth0 name eth0.1 type vlan id 1
+ ip link add link eth0 name eth0.2 type vlan id 2
+
+ # The master interface needs to be brought up before the slave ports.
+ ip link set eth0 up
+ ip link set eth0.1 up
+ ip link set eth0.2 up
+
+ # bring up the slave interfaces
+ ip link set wan up
+ ip link set lan1 up
+ ip link set lan2 up
+
+ # create bridge
+ ip link add name br0 type bridge
+
+ # activate VLAN filtering
+ ip link set dev br0 type bridge vlan_filtering 1
+
+ # add ports to bridges
+ ip link set dev wan master br0
+ ip link set eth0.1 master br0
+ ip link set dev lan1 master br0
+ ip link set dev lan2 master br0
+
+ # tag traffic on ports
+ bridge vlan add dev lan1 vid 1 pvid untagged
+ bridge vlan add dev lan2 vid 1 pvid untagged
+ bridge vlan add dev wan vid 2 pvid untagged
+
+ # configure the VLANs
+ ip addr add 192.0.2.1/30 dev eth0.2
+ ip addr add 192.0.2.129/25 dev br0
+
+ # bring up the bridge devices
+ ip link set br0 up
diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index ca87068b9ab9..563d56c6a25c 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -531,7 +531,7 @@ Bridge VLAN filtering
a software implementation.
.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
- of DSA, would be the its port-based VLAN, used by the associated bridge device.
+ of DSA, would be its port-based VLAN, used by the associated bridge device.
- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
Forwarding Database entry, the switch hardware should be programmed to delete
@@ -554,7 +554,7 @@ Bridge VLAN filtering
associated with this VLAN ID.
.. note:: VLAN ID 0 corresponds to the port private database, which, in the context
- of DSA, would be the its port-based VLAN, used by the associated bridge device.
+ of DSA, would be its port-based VLAN, used by the associated bridge device.
- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
multicast database entry, the switch hardware should be programmed to delete
diff --git a/Documentation/networking/dsa/index.rst b/Documentation/networking/dsa/index.rst
index 0e5b7a9be406..ee631e2d646f 100644
--- a/Documentation/networking/dsa/index.rst
+++ b/Documentation/networking/dsa/index.rst
@@ -6,6 +6,8 @@ Distributed Switch Architecture
:maxdepth: 1
dsa
+ b53
bcm_sf2
lan9303
sja1105
+ configuration
diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst
index ea7bac438cfd..eef20d0bcf7c 100644
--- a/Documentation/networking/dsa/sja1105.rst
+++ b/Documentation/networking/dsa/sja1105.rst
@@ -86,13 +86,13 @@ functionality.
The following traffic modes are supported over the switch netdevices:
+--------------------+------------+------------------+------------------+
-| | Standalone | Bridged with | Bridged with |
-| | ports | vlan_filtering 0 | vlan_filtering 1 |
+| | Standalone | Bridged with | Bridged with |
+| | ports | vlan_filtering 0 | vlan_filtering 1 |
+====================+============+==================+==================+
| Regular traffic | Yes | Yes | No (use master) |
+--------------------+------------+------------------+------------------+
| Management traffic | Yes | Yes | Yes |
-| (BPDU, PTP) | | | |
+| (BPDU, PTP) | | | |
+--------------------+------------+------------------+------------------+
Switching features
@@ -146,6 +146,96 @@ enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in
this mode, the switch ports beneath br0 are not capable of regular traffic, and
are only used as a conduit for switchdev operations.
+Offloads
+========
+
+Time-aware scheduling
+---------------------
+
+The switch supports a variation of the enhancements for scheduled traffic
+specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
+ensure deterministic latency for priority traffic that is sent in-band with its
+gate-open event in the network schedule.
+
+This capability can be managed through the tc-taprio offload ('flags 2'). The
+difference compared to the software implementation of taprio is that the latter
+would only be able to shape traffic originated from the CPU, but not
+autonomously forwarded flows.
+
+The device has 8 traffic classes, and maps incoming frames to one of them based
+on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
+As described in the previous sections, depending on the value of
+``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
+either be the typical 0x8100 or a custom value used internally by the driver
+for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
+or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
+EtherType. In these modes, injecting into a particular TX queue can only be
+done by the DSA net devices, which populate the PCP field of the tagging header
+on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
+offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
+net devices are no longer able to do that. To inject frames into a hardware TX
+queue with VLAN awareness active, it is necessary to create a VLAN
+sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
+towards the switch, with the VLAN PCP bits set appropriately.
+
+Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
+notable exception: the switch always treats it with a fixed priority and
+disregards any VLAN PCP bits even if present. The traffic class for management
+traffic has a value of 7 (highest priority) at the moment, which is not
+configurable in the driver.
+
+Below is an example of configuring a 500 us cyclic schedule on egress port
+``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
+and the gates for all other traffic classes are open for 400 us::
+
+ #!/bin/bash
+
+ set -e -u -o pipefail
+
+ NSEC_PER_SEC="1000000000"
+
+ gatemask() {
+ local tc_list="$1"
+ local mask=0
+
+ for tc in ${tc_list}; do
+ mask=$((${mask} | (1 << ${tc})))
+ done
+
+ printf "%02x" ${mask}
+ }
+
+ if ! systemctl is-active --quiet ptp4l; then
+ echo "Please start the ptp4l service"
+ exit
+ fi
+
+ now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
+ # Phase-align the base time to the start of the next second.
+ sec=$(echo "${now}" | gawk -F. '{ print $1; }')
+ base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
+
+ tc qdisc add dev swp5 parent root handle 100 taprio \
+ num_tc 8 \
+ map 0 1 2 3 5 6 7 \
+ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+ base-time ${base_time} \
+ sched-entry S $(gatemask 7) 100000 \
+ sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
+ flags 2
+
+It is possible to apply the tc-taprio offload on multiple egress ports. There
+are hardware restrictions related to the fact that no gate event may trigger
+simultaneously on two ports. The driver checks the consistency of the schedules
+against this restriction and errors out when appropriate. Schedule analysis is
+needed to avoid this, which is outside the scope of the document.
+
+At the moment, the time-aware scheduler can only be triggered based on a
+standalone clock and not based on PTP time. This means the base-time argument
+from tc-taprio is ignored and the schedule starts right away. It also means it
+is more difficult to phase-align the scheduler with the other devices in the
+network.
+
Device Tree bindings and board design
=====================================
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index a46fca264bee..d4dca42910d0 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -14,7 +14,10 @@ Contents:
device_drivers/index
dsa/index
devlink-info-versions
+ devlink-trap
+ devlink-trap-netdevsim
ieee802154
+ j1939
kapi
z8530book
msg_zerocopy
@@ -31,7 +34,7 @@ Contents:
tls
tls-offload
-.. only:: subproject
+.. only:: subproject and html
Indices
=======
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 22f6b8b1110a..8d4ad1d1ae26 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -80,6 +80,7 @@ fib_multipath_hash_policy - INTEGER
Possible values:
0 - Layer 3
1 - Layer 4
+ 2 - Layer 3 or inner Layer 3 if present
fib_sync_mem - UNSIGNED INTEGER
Amount of dirty memory from fib entries that can be backlogged before
@@ -206,8 +207,8 @@ TCP variables:
somaxconn - INTEGER
Limit of socket listen() backlog, known in userspace as SOMAXCONN.
- Defaults to 128. See also tcp_max_syn_backlog for additional tuning
- for TCP sockets.
+ Defaults to 4096. (Was 128 before linux-5.4)
+ See also tcp_max_syn_backlog for additional tuning for TCP sockets.
tcp_abort_on_overflow - BOOLEAN
If listening service is too slow to accept new connections,
@@ -255,6 +256,12 @@ tcp_base_mss - INTEGER
Path MTU discovery (MTU probing). If MTU probing is enabled,
this is the initial MSS used by the connection.
+tcp_mtu_probe_floor - INTEGER
+ If MTU probing is enabled this caps the minimum MSS used for search_low
+ for the connection.
+
+ Default : 48
+
tcp_min_snd_mss - INTEGER
TCP SYN and SYNACK messages usually advertise an ADVMSS option,
as described in RFC 1122 and RFC 6691.
@@ -401,11 +408,14 @@ tcp_max_orphans - INTEGER
up to ~64K of unswappable memory.
tcp_max_syn_backlog - INTEGER
- Maximal number of remembered connection requests, which have not
- received an acknowledgment from connecting client.
+ Maximal number of remembered connection requests (SYN_RECV),
+ which have not received an acknowledgment from connecting client.
+ This is a per-listener limit.
The minimal value is 128 for low memory machines, and it will
increase in proportion to the memory of machine.
If server suffers from overload, try increasing this number.
+ Remember to also check /proc/sys/net/core/somaxconn
+ A SYN_RECV request socket consumes about 304 bytes of memory.
tcp_max_tw_buckets - INTEGER
Maximal number of timewait sockets held by system simultaneously.
@@ -656,6 +666,26 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER
0 to disable the blackhole detection.
By default, it is set to 1hr.
+tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs
+ The list consists of a primary key and an optional backup key. The
+ primary key is used for both creating and validating cookies, while the
+ optional backup key is only used for validating cookies. The purpose of
+ the backup key is to maximize TFO validation when keys are rotated.
+
+ A randomly chosen primary key may be configured by the kernel if
+ the tcp_fastopen sysctl is set to 0x400 (see above), or if the
+ TCP_FASTOPEN setsockopt() optname is set and a key has not been
+ previously configured via sysctl. If keys are configured via
+ setsockopt() by using the TCP_FASTOPEN_KEY optname, then those
+ per-socket keys will be used instead of any keys that are specified via
+ sysctl.
+
+ A key is specified as 4 8-digit hexadecimal integers which are separated
+ by a '-' as: xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx. Leading zeros may be
+ omitted. A primary and a backup key may be specified by separating them
+ by a comma. If only one key is specified, it becomes the primary key and
+ any previously configured backup keys are removed.
+
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 127. Default value
@@ -1425,14 +1455,26 @@ flowlabel_state_ranges - BOOLEAN
FALSE: disabled
Default: true
-flowlabel_reflect - BOOLEAN
- Automatically reflect the flow label. Needed for Path MTU
+flowlabel_reflect - INTEGER
+ Control flow label reflection. Needed for Path MTU
Discovery to work with Equal Cost Multipath Routing in anycast
environments. See RFC 7690 and:
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
- TRUE: enabled
- FALSE: disabled
- Default: FALSE
+
+ This is a bitmask.
+ 1: enabled for established flows
+
+ Note that this prevents automatic flowlabel changes, as done
+ in "tcp: change IPv6 flow-label upon receiving spurious retransmission"
+ and "tcp: Change txhash on every SYN and RTO retransmit"
+
+ 2: enabled for TCP RESET packets (no active listener)
+ If set, a RST packet sent in response to a SYN packet on a closed
+ port will reflect the incoming flow label.
+
+ 4: enabled for ICMPv6 echo reply messages.
+
+ Default: 0
fib_multipath_hash_policy - INTEGER
Controls which hash policy to use for multipath routes.
@@ -1440,6 +1482,7 @@ fib_multipath_hash_policy - INTEGER
Possible values:
0 - Layer 3 (source and destination addresses plus flow label)
1 - Layer 4 (standard 5-tuple)
+ 2 - Layer 3 or inner Layer 3 if present
anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6
@@ -2253,7 +2296,7 @@ addr_scope_policy - INTEGER
/proc/sys/net/core/*
- Please see: Documentation/sysctl/net.txt for descriptions of these entries.
+ Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
/proc/sys/net/unix/*
diff --git a/Documentation/networking/j1939.rst b/Documentation/networking/j1939.rst
new file mode 100644
index 000000000000..dc60b13fcd09
--- /dev/null
+++ b/Documentation/networking/j1939.rst
@@ -0,0 +1,422 @@
+.. SPDX-License-Identifier: (GPL-2.0 OR MIT)
+
+===================
+J1939 Documentation
+===================
+
+Overview / What Is J1939
+========================
+
+SAE J1939 defines a higher layer protocol on CAN. It implements a more
+sophisticated addressing scheme and extends the maximum packet size above 8
+bytes. Several derived specifications exist, which differ from the original
+J1939 on the application level, like MilCAN A, NMEA2000 and especially
+ISO-11783 (ISOBUS). This last one specifies the so-called ETP (Extended
+Transport Protocol) which is has been included in this implementation. This
+results in a maximum packet size of ((2 ^ 24) - 1) * 7 bytes == 111 MiB.
+
+Specifications used
+-------------------
+
+* SAE J1939-21 : data link layer
+* SAE J1939-81 : network management
+* ISO 11783-6 : Virtual Terminal (Extended Transport Protocol)
+
+.. _j1939-motivation:
+
+Motivation
+==========
+
+Given the fact there's something like SocketCAN with an API similar to BSD
+sockets, we found some reasons to justify a kernel implementation for the
+addressing and transport methods used by J1939.
+
+* **Addressing:** when a process on an ECU communicates via J1939, it should
+ not necessarily know its source address. Although at least one process per
+ ECU should know the source address. Other processes should be able to reuse
+ that address. This way, address parameters for different processes
+ cooperating for the same ECU, are not duplicated. This way of working is
+ closely related to the UNIX concept where programs do just one thing, and do
+ it well.
+
+* **Dynamic addressing:** Address Claiming in J1939 is time critical.
+ Furthermore data transport should be handled properly during the address
+ negotiation. Putting this functionality in the kernel eliminates it as a
+ requirement for _every_ user space process that communicates via J1939. This
+ results in a consistent J1939 bus with proper addressing.
+
+* **Transport:** both TP & ETP reuse some PGNs to relay big packets over them.
+ Different processes may thus use the same TP & ETP PGNs without actually
+ knowing it. The individual TP & ETP sessions _must_ be serialized
+ (synchronized) between different processes. The kernel solves this problem
+ properly and eliminates the serialization (synchronization) as a requirement
+ for _every_ user space process that communicates via J1939.
+
+J1939 defines some other features (relaying, gateway, fast packet transport,
+...). In-kernel code for these would not contribute to protocol stability.
+Therefore, these parts are left to user space.
+
+The J1939 sockets operate on CAN network devices (see SocketCAN). Any J1939
+user space library operating on CAN raw sockets will still operate properly.
+Since such library does not communicate with the in-kernel implementation, care
+must be taken that these two do not interfere. In practice, this means they
+cannot share ECU addresses. A single ECU (or virtual ECU) address is used by
+the library exclusively, or by the in-kernel system exclusively.
+
+J1939 concepts
+==============
+
+PGN
+---
+
+The PGN (Parameter Group Number) is a number to identify a packet. The PGN
+is composed as follows:
+1 bit : Reserved Bit
+1 bit : Data Page
+8 bits : PF (PDU Format)
+8 bits : PS (PDU Specific)
+
+In J1939-21 distinction is made between PDU1 format (where PF < 240) and PDU2
+format (where PF >= 240). Furthermore, when using PDU2 format, the PS-field
+contains a so-called Group Extension, which is part of the PGN. When using PDU2
+format, the Group Extension is set in the PS-field.
+
+On the other hand, when using PDU1 format, the PS-field contains a so-called
+Destination Address, which is _not_ part of the PGN. When communicating a PGN
+from user space to kernel (or visa versa) and PDU2 format is used, the PS-field
+of the PGN shall be set to zero. The Destination Address shall be set
+elsewhere.
+
+Regarding PGN mapping to 29-bit CAN identifier, the Destination Address shall
+be get/set from/to the appropriate bits of the identifier by the kernel.
+
+
+Addressing
+----------
+
+Both static and dynamic addressing methods can be used.
+
+For static addresses, no extra checks are made by the kernel, and provided
+addresses are considered right. This responsibility is for the OEM or system
+integrator.
+
+For dynamic addressing, so-called Address Claiming, extra support is foreseen
+in the kernel. In J1939 any ECU is known by it's 64-bit NAME. At the moment of
+a successful address claim, the kernel keeps track of both NAME and source
+address being claimed. This serves as a base for filter schemes. By default,
+packets with a destination that is not locally, will be rejected.
+
+Mixed mode packets (from a static to a dynamic address or vice versa) are
+allowed. The BSD sockets define separate API calls for getting/setting the
+local & remote address and are applicable for J1939 sockets.
+
+Filtering
+---------
+
+J1939 defines white list filters per socket that a user can set in order to
+receive a subset of the J1939 traffic. Filtering can be based on:
+
+* SA
+* SOURCE_NAME
+* PGN
+
+When multiple filters are in place for a single socket, and a packet comes in
+that matches several of those filters, the packet is only received once for
+that socket.
+
+How to Use J1939
+================
+
+API Calls
+---------
+
+On CAN, you first need to open a socket for communicating over a CAN network.
+To use J1939, #include <linux/can/j1939.h>. From there, <linux/can.h> will be
+included too. To open a socket, use:
+
+.. code-block:: C
+
+ s = socket(PF_CAN, SOCK_DGRAM, CAN_J1939);
+
+J1939 does use SOCK_DGRAM sockets. In the J1939 specification, connections are
+mentioned in the context of transport protocol sessions. These still deliver
+packets to the other end (using several CAN packets). SOCK_STREAM is not
+supported.
+
+After the successful creation of the socket, you would normally use the bind(2)
+and/or connect(2) system call to bind the socket to a CAN interface. After
+binding and/or connecting the socket, you can read(2) and write(2) from/to the
+socket or use send(2), sendto(2), sendmsg(2) and the recv*() counterpart
+operations on the socket as usual. There are also J1939 specific socket options
+described below.
+
+In order to send data, a bind(2) must have been successful. bind(2) assigns a
+local address to a socket.
+
+Different from CAN is that the payload data is just the data that get send,
+without it's header info. The header info is derived from the sockaddr supplied
+to bind(2), connect(2), sendto(2) and recvfrom(2). A write(2) with size 4 will
+result in a packet with 4 bytes.
+
+The sockaddr structure has extensions for use with J1939 as specified below:
+
+.. code-block:: C
+
+ struct sockaddr_can {
+ sa_family_t can_family;
+ int can_ifindex;
+ union {
+ struct {
+ __u64 name;
+ /* pgn:
+ * 8 bit: PS in PDU2 case, else 0
+ * 8 bit: PF
+ * 1 bit: DP
+ * 1 bit: reserved
+ */
+ __u32 pgn;
+ __u8 addr;
+ } j1939;
+ } can_addr;
+ }
+
+can_family & can_ifindex serve the same purpose as for other SocketCAN sockets.
+
+can_addr.j1939.pgn specifies the PGN (max 0x3ffff). Individual bits are
+specified above.
+
+can_addr.j1939.name contains the 64-bit J1939 NAME.
+
+can_addr.j1939.addr contains the address.
+
+The bind(2) system call assigns the local address, i.e. the source address when
+sending packages. If a PGN during bind(2) is set, it's used as a RX filter.
+I.e. only packets with a matching PGN are received. If an ADDR or NAME is set
+it is used as a receive filter, too. It will match the destination NAME or ADDR
+of the incoming packet. The NAME filter will work only if appropriate Address
+Claiming for this name was done on the CAN bus and registered/cached by the
+kernel.
+
+On the other hand connect(2) assigns the remote address, i.e. the destination
+address. The PGN from connect(2) is used as the default PGN when sending
+packets. If ADDR or NAME is set it will be used as the default destination ADDR
+or NAME. Further a set ADDR or NAME during connect(2) is used as a receive
+filter. It will match the source NAME or ADDR of the incoming packet.
+
+Both write(2) and send(2) will send a packet with local address from bind(2) and
+the remote address from connect(2). Use sendto(2) to overwrite the destination
+address.
+
+If can_addr.j1939.name is set (!= 0) the NAME is looked up by the kernel and
+the corresponding ADDR is used. If can_addr.j1939.name is not set (== 0),
+can_addr.j1939.addr is used.
+
+When creating a socket, reasonable defaults are set. Some options can be
+modified with setsockopt(2) & getsockopt(2).
+
+RX path related options:
+
+- SO_J1939_FILTER - configure array of filters
+- SO_J1939_PROMISC - disable filters set by bind(2) and connect(2)
+
+By default no broadcast packets can be send or received. To enable sending or
+receiving broadcast packets use the socket option SO_BROADCAST:
+
+.. code-block:: C
+
+ int value = 1;
+ setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &value, sizeof(value));
+
+The following diagram illustrates the RX path:
+
+.. code::
+
+ +--------------------+
+ | incoming packet |
+ +--------------------+
+ |
+ V
+ +--------------------+
+ | SO_J1939_PROMISC? |
+ +--------------------+
+ | |
+ no | | yes
+ | |
+ .---------' `---------.
+ | |
+ +---------------------------+ |
+ | bind() + connect() + | |
+ | SOCK_BROADCAST filter | |
+ +---------------------------+ |
+ | |
+ |<---------------------'
+ V
+ +---------------------------+
+ | SO_J1939_FILTER |
+ +---------------------------+
+ |
+ V
+ +---------------------------+
+ | socket recv() |
+ +---------------------------+
+
+TX path related options:
+SO_J1939_SEND_PRIO - change default send priority for the socket
+
+Message Flags during send() and Related System Calls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+send(2), sendto(2) and sendmsg(2) take a 'flags' argument. Currently
+supported flags are:
+
+* MSG_DONTWAIT, i.e. non-blocking operation.
+
+recvmsg(2)
+^^^^^^^^^^
+
+In most cases recvmsg(2) is needed if you want to extract more information than
+recvfrom(2) can provide. For example package priority and timestamp. The
+Destination Address, name and packet priority (if applicable) are attached to
+the msghdr in the recvmsg(2) call. They can be extracted using cmsg(3) macros,
+with cmsg_level == SOL_J1939 && cmsg_type == SCM_J1939_DEST_ADDR,
+SCM_J1939_DEST_NAME or SCM_J1939_PRIO. The returned data is a uint8_t for
+priority and dst_addr, and uint64_t for dst_name.
+
+.. code-block:: C
+
+ uint8_t priority, dst_addr;
+ uint64_t dst_name;
+
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ switch (cmsg->cmsg_level) {
+ case SOL_CAN_J1939:
+ if (cmsg->cmsg_type == SCM_J1939_DEST_ADDR)
+ dst_addr = *CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_J1939_DEST_NAME)
+ memcpy(&dst_name, CMSG_DATA(cmsg), cmsg->cmsg_len - CMSG_LEN(0));
+ else if (cmsg->cmsg_type == SCM_J1939_PRIO)
+ priority = *CMSG_DATA(cmsg);
+ break;
+ }
+ }
+
+Dynamic Addressing
+------------------
+
+Distinction has to be made between using the claimed address and doing an
+address claim. To use an already claimed address, one has to fill in the
+j1939.name member and provide it to bind(2). If the name had claimed an address
+earlier, all further messages being sent will use that address. And the
+j1939.addr member will be ignored.
+
+An exception on this is PGN 0x0ee00. This is the "Address Claim/Cannot Claim
+Address" message and the kernel will use the j1939.addr member for that PGN if
+necessary.
+
+To claim an address following code example can be used:
+
+.. code-block:: C
+
+ struct sockaddr_can baddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .name = name,
+ .addr = J1939_IDLE_ADDR,
+ .pgn = J1939_NO_PGN, /* to disable bind() rx filter for PGN */
+ },
+ .can_ifindex = if_nametoindex("can0"),
+ };
+
+ bind(sock, (struct sockaddr *)&baddr, sizeof(baddr));
+
+ /* for Address Claiming broadcast must be allowed */
+ int value = 1;
+ setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &value, sizeof(value));
+
+ /* configured advanced RX filter with PGN needed for Address Claiming */
+ const struct j1939_filter filt[] = {
+ {
+ .pgn = J1939_PGN_ADDRESS_CLAIMED,
+ .pgn_mask = J1939_PGN_PDU1_MAX,
+ }, {
+ .pgn = J1939_PGN_ADDRESS_REQUEST,
+ .pgn_mask = J1939_PGN_PDU1_MAX,
+ }, {
+ .pgn = J1939_PGN_ADDRESS_COMMANDED,
+ .pgn_mask = J1939_PGN_MAX,
+ },
+ };
+
+ setsockopt(sock, SOL_CAN_J1939, SO_J1939_FILTER, &filt, sizeof(filt));
+
+ uint64_t dat = htole64(name);
+ const struct sockaddr_can saddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .pgn = J1939_PGN_ADDRESS_CLAIMED,
+ .addr = J1939_NO_ADDR,
+ },
+ };
+
+ /* Afterwards do a sendto(2) with data set to the NAME (Little Endian). If the
+ * NAME provided, does not match the j1939.name provided to bind(2), EPROTO
+ * will be returned.
+ */
+ sendto(sock, dat, sizeof(dat), 0, (const struct sockaddr *)&saddr, sizeof(saddr));
+
+If no-one else contests the address claim within 250ms after transmission, the
+kernel marks the NAME-SA assignment as valid. The valid assignment will be kept
+among other valid NAME-SA assignments. From that point, any socket bound to the
+NAME can send packets.
+
+If another ECU claims the address, the kernel will mark the NAME-SA expired.
+No socket bound to the NAME can send packets (other than address claims). To
+claim another address, some socket bound to NAME, must bind(2) again, but with
+only j1939.addr changed to the new SA, and must then send a valid address claim
+packet. This restarts the state machine in the kernel (and any other
+participant on the bus) for this NAME.
+
+can-utils also include the jacd tool, so it can be used as code example or as
+default Address Claiming daemon.
+
+Send Examples
+-------------
+
+Static Addressing
+^^^^^^^^^^^^^^^^^
+
+This example will send a PGN (0x12300) from SA 0x20 to DA 0x30.
+
+Bind:
+
+.. code-block:: C
+
+ struct sockaddr_can baddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .name = J1939_NO_NAME,
+ .addr = 0x20,
+ .pgn = J1939_NO_PGN,
+ },
+ .can_ifindex = if_nametoindex("can0"),
+ };
+
+ bind(sock, (struct sockaddr *)&baddr, sizeof(baddr));
+
+Now, the socket 'sock' is bound to the SA 0x20. Since no connect(2) was called,
+at this point we can use only sendto(2) or sendmsg(2).
+
+Send:
+
+.. code-block:: C
+
+ const struct sockaddr_can saddr = {
+ .can_family = AF_CAN,
+ .can_addr.j1939 = {
+ .name = J1939_NO_NAME;
+ .pgn = 0x30,
+ .addr = 0x12300,
+ },
+ };
+
+ sendto(sock, dat, sizeof(dat), 0, (const struct sockaddr *)&saddr, sizeof(saddr));
diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/mac80211_hwsim.rst
index 3566a725d19c..d2266ce5534e 100644
--- a/Documentation/networking/mac80211_hwsim/README
+++ b/Documentation/networking/mac80211_hwsim/mac80211_hwsim.rst
@@ -1,5 +1,13 @@
+:orphan:
+
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===================================================================
mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211
-Copyright (c) 2008, Jouni Malinen <j@w1.fi>
+===================================================================
+
+:Copyright: |copy| 2008, Jouni Malinen <j@w1.fi>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
@@ -7,6 +15,7 @@ published by the Free Software Foundation.
Introduction
+============
mac80211_hwsim is a Linux kernel module that can be used to simulate
arbitrary number of IEEE 802.11 radios for mac80211. It can be used to
@@ -43,6 +52,7 @@ regardless of channel.
Simple example
+==============
This example shows how to use mac80211_hwsim to simulate two radios:
one to act as an access point and the other as a station that
@@ -50,17 +60,19 @@ associates with the AP. hostapd and wpa_supplicant are used to take
care of WPA2-PSK authentication. In addition, hostapd is also
processing access point side of association.
+::
+
-# Build mac80211_hwsim as part of kernel configuration
+ # Build mac80211_hwsim as part of kernel configuration
-# Load the module
-modprobe mac80211_hwsim
+ # Load the module
+ modprobe mac80211_hwsim
-# Run hostapd (AP) for wlan0
-hostapd hostapd.conf
+ # Run hostapd (AP) for wlan0
+ hostapd hostapd.conf
-# Run wpa_supplicant (station) for wlan1
-wpa_supplicant -Dnl80211 -iwlan1 -c wpa_supplicant.conf
+ # Run wpa_supplicant (station) for wlan1
+ wpa_supplicant -Dnl80211 -iwlan1 -c wpa_supplicant.conf
More test cases are available in hostap.git:
diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 2f24a1912a48..025cc9b96992 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -30,7 +30,7 @@ ip_ttl_propagate - BOOL
0 - disabled / RFC 3443 [Short] Pipe Model
1 - enabled / RFC 3443 Uniform Model (default)
-default_ttl - BOOL
+default_ttl - INTEGER
Default TTL value to use for MPLS packets where it cannot be
propagated from an IP header, either because one isn't present
or ip_ttl_propagate has been disabled.
diff --git a/Documentation/networking/net_dim.txt b/Documentation/networking/net_dim.txt
index 9cb31c5e2dcd..9bdb7d5a3ba3 100644
--- a/Documentation/networking/net_dim.txt
+++ b/Documentation/networking/net_dim.txt
@@ -92,16 +92,16 @@ under some conditions.
Part III: Registering a Network Device to DIM
==============================================
-Net DIM API exposes the main function net_dim(struct net_dim *dim,
-struct net_dim_sample end_sample). This function is the entry point to the Net
+Net DIM API exposes the main function net_dim(struct dim *dim,
+struct dim_sample end_sample). This function is the entry point to the Net
DIM algorithm and has to be called every time the driver would like to check if
it should change interrupt moderation parameters. The driver should provide two
-data structures: struct net_dim and struct net_dim_sample. Struct net_dim
+data structures: struct dim and struct dim_sample. Struct dim
describes the state of DIM for a specific object (RX queue, TX queue,
other queues, etc.). This includes the current selected profile, previous data
samples, the callback function provided by the driver and more.
-Struct net_dim_sample describes a data sample, which will be compared to the
-data sample stored in struct net_dim in order to decide on the algorithm's next
+Struct dim_sample describes a data sample, which will be compared to the
+data sample stored in struct dim in order to decide on the algorithm's next
step. The sample should include bytes, packets and interrupts, measured by
the driver.
@@ -110,9 +110,9 @@ main net_dim() function. The recommended method is to call net_dim() on each
interrupt. Since Net DIM has a built-in moderation and it might decide to skip
iterations under certain conditions, there is no need to moderate the net_dim()
calls as well. As mentioned above, the driver needs to provide an object of type
-struct net_dim to the net_dim() function call. It is advised for each entity
-using Net DIM to hold a struct net_dim as part of its data structure and use it
-as the main Net DIM API object. The struct net_dim_sample should hold the latest
+struct dim to the net_dim() function call. It is advised for each entity
+using Net DIM to hold a struct dim as part of its data structure and use it
+as the main Net DIM API object. The struct dim_sample should hold the latest
bytes, packets and interrupts count. No need to perform any calculations, just
include the raw data.
@@ -132,19 +132,19 @@ usage is not complete but it should make the outline of the usage clear.
my_driver.c:
-#include <linux/net_dim.h>
+#include <linux/dim.h>
/* Callback for net DIM to schedule on a decision to change moderation */
void my_driver_do_dim_work(struct work_struct *work)
{
- /* Get struct net_dim from struct work_struct */
- struct net_dim *dim = container_of(work, struct net_dim,
- work);
+ /* Get struct dim from struct work_struct */
+ struct dim *dim = container_of(work, struct dim,
+ work);
/* Do interrupt moderation related stuff */
...
/* Signal net DIM work is done and it should move to next iteration */
- dim->state = NET_DIM_START_MEASURE;
+ dim->state = DIM_START_MEASURE;
}
/* My driver's interrupt handler */
@@ -152,13 +152,13 @@ int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
{
...
/* A struct to hold current measured data */
- struct net_dim_sample dim_sample;
+ struct dim_sample dim_sample;
...
/* Initiate data sample struct with current data */
- net_dim_sample(my_entity->events,
- my_entity->packets,
- my_entity->bytes,
- &dim_sample);
+ dim_update_sample(my_entity->events,
+ my_entity->packets,
+ my_entity->bytes,
+ &dim_sample);
/* Call net DIM */
net_dim(&my_entity->dim, dim_sample);
...
diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst
index 0dd90d7df5ec..a689966bc4be 100644
--- a/Documentation/networking/phy.rst
+++ b/Documentation/networking/phy.rst
@@ -202,7 +202,8 @@ the PHY/controller, of which the PHY needs to be aware.
*interface* is a u32 which specifies the connection type used
between the controller and the PHY. Examples are GMII, MII,
-RGMII, and SGMII. For a full list, see include/linux/phy.h
+RGMII, and SGMII. See "PHY interface mode" below. For a full
+list, see include/linux/phy.h
Now just make sure that phydev->supported and phydev->advertising have any
values pruned from them which don't make sense for your controller (a 10/100
@@ -225,6 +226,48 @@ When you want to disconnect from the network (even if just briefly), you call
phy_stop(phydev). This function also stops the phylib state machine and
disables PHY interrupts.
+PHY interface modes
+===================
+
+The PHY interface mode supplied in the phy_connect() family of functions
+defines the initial operating mode of the PHY interface. This is not
+guaranteed to remain constant; there are PHYs which dynamically change
+their interface mode without software interaction depending on the
+negotiation results.
+
+Some of the interface modes are described below:
+
+``PHY_INTERFACE_MODE_1000BASEX``
+ This defines the 1000BASE-X single-lane serdes link as defined by the
+ 802.3 standard section 36. The link operates at a fixed bit rate of
+ 1.25Gbaud using a 10B/8B encoding scheme, resulting in an underlying
+ data rate of 1Gbps. Embedded in the data stream is a 16-bit control
+ word which is used to negotiate the duplex and pause modes with the
+ remote end. This does not include "up-clocked" variants such as 2.5Gbps
+ speeds (see below.)
+
+``PHY_INTERFACE_MODE_2500BASEX``
+ This defines a variant of 1000BASE-X which is clocked 2.5 times faster,
+ than the 802.3 standard giving a fixed bit rate of 3.125Gbaud.
+
+``PHY_INTERFACE_MODE_SGMII``
+ This is used for Cisco SGMII, which is a modification of 1000BASE-X
+ as defined by the 802.3 standard. The SGMII link consists of a single
+ serdes lane running at a fixed bit rate of 1.25Gbaud with 10B/8B
+ encoding. The underlying data rate is 1Gbps, with the slower speeds of
+ 100Mbps and 10Mbps being achieved through replication of each data symbol.
+ The 802.3 control word is re-purposed to send the negotiated speed and
+ duplex information from to the MAC, and for the MAC to acknowledge
+ receipt. This does not include "up-clocked" variants such as 2.5Gbps
+ speeds.
+
+ Note: mismatched SGMII vs 1000BASE-X configuration on a link can
+ successfully pass data in some circumstances, but the 16-bit control
+ word will not be correctly interpreted, which may cause mismatches in
+ duplex, pause or other settings. This is dependent on the MAC and/or
+ PHY behaviour.
+
+
Pause frames / flow control
===========================
diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
index 5bd26cb07244..a5e00a159d21 100644
--- a/Documentation/networking/sfp-phylink.rst
+++ b/Documentation/networking/sfp-phylink.rst
@@ -8,7 +8,8 @@ Overview
========
phylink is a mechanism to support hot-pluggable networking modules
-without needing to re-initialise the adapter on hot-plug events.
+directly connected to a MAC without needing to re-initialise the
+adapter on hot-plug events.
phylink supports conventional phylib-based setups, fixed link setups
and SFP (Small Formfactor Pluggable) modules at present.
@@ -98,6 +99,7 @@ this documentation.
4. Add::
struct phylink *phylink;
+ struct phylink_config phylink_config;
to the driver's private data structure. We shall refer to the
driver's private data pointer as ``priv`` below, and the driver's
@@ -223,8 +225,10 @@ this documentation.
.. code-block:: c
struct phylink *phylink;
+ priv->phylink_config.dev = &dev.dev;
+ priv->phylink_config.type = PHYLINK_NETDEV;
- phylink = phylink_create(dev, node, phy_mode, &phylink_ops);
+ phylink = phylink_create(&priv->phylink_config, node, phy_mode, &phylink_ops);
if (IS_ERR(phylink)) {
err = PTR_ERR(phylink);
fail probe;
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index bbdaf8990031..8dd6333c3270 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -368,7 +368,7 @@ ts[1] used to hold hardware timestamps converted to system time.
Instead, expose the hardware clock device on the NIC directly as
a HW PTP clock source, to allow time conversion in userspace and
optionally synchronize system time with a userspace PTP stack such
-as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt.
+as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.
Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
index 178f4104f5cf..f914e81fd3a6 100644
--- a/Documentation/networking/tls-offload.rst
+++ b/Documentation/networking/tls-offload.rst
@@ -206,7 +206,11 @@ TX
Segments transmitted from an offloaded socket can get out of sync
in similar ways to the receive side-retransmissions - local drops
-are possible, though network reorders are not.
+are possible, though network reorders are not. There are currently
+two mechanisms for dealing with out of order segments.
+
+Crypto state rebuilding
+~~~~~~~~~~~~~~~~~~~~~~~
Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
@@ -225,6 +229,35 @@ was just a retransmission. The former is simpler, and does not require
retransmission detection therefore it is the recommended method until
such time it is proven inefficient.
+Next record sync
+~~~~~~~~~~~~~~~~
+
+Whenever an out of order segment is detected the driver requests
+that the ``ktls`` software fallback code encrypt it. If the segment's
+sequence number is lower than expected the driver assumes retransmission
+and doesn't change device state. If the segment is in the future, it
+may imply a local drop, the driver asks the stack to sync the device
+to the next record state and falls back to software.
+
+Resync request is indicated with:
+
+.. code-block:: c
+
+ void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq)
+
+Until resync is complete driver should not access its expected TCP
+sequence number (as it will be updated from a different context).
+Following helper should be used to test if resync is complete:
+
+.. code-block:: c
+
+ bool tls_offload_tx_resync_pending(struct sock *sk)
+
+Next time ``ktls`` pushes a record it will first send its TCP sequence number
+and TLS record number to the driver. Stack will also make sure that
+the new record will start on a segment boundary (like it does when
+the connection is initially added).
+
RX
--
@@ -268,6 +301,9 @@ Device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.
+Stream scan resynchronization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
@@ -298,6 +334,22 @@ Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream and record may get processed
by the kernel before the confirmation request.
+Stack-driven resynchronization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The driver may also request the stack to perform resynchronization
+whenever it sees the records are no longer getting decrypted.
+If the connection is configured in this mode the stack automatically
+schedules resynchronization after it has received two completely encrypted
+records.
+
+The stack waits for the socket to drain and informs the device about
+the next expected record number and its TCP sequence number. If the
+records continue to be received fully encrypted stack retries the
+synchronization with an exponential back off (first after 2 encrypted
+records, then after 4 records, after 8, after 16... up until every
+128 records).
+
Error handling
==============
@@ -372,14 +424,28 @@ Statistics
Following minimum set of TLS-related statistics should be reported
by the driver:
- * ``rx_tls_decrypted`` - number of successfully decrypted TLS segments
- * ``tx_tls_encrypted`` - number of in-order TLS segments passed to device
- for encryption
+ * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
+ which were part of a TLS stream.
+ * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
+ which were successfully decrypted.
+ * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
+ for encryption of their TLS payload.
+ * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
+ passed to the device for encryption.
+ * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
+ encryption.
* ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
- but did not arrive in the expected order
- * ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because
- they arrived out of order and associated record could not be found
- (see also :ref:`pre_tls_data`)
+ but did not arrive in the expected order.
+ * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
+ a TLS stream and arrived out-of-order, but skipped the HW offload routine
+ and went to the regular transmit flow as they were retransmissions of the
+ connection handshake.
+ * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
+ a TLS stream dropped, because they arrived out of order and associated
+ record could not be found.
+ * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
+ stream dropped, because they contain both data that has been encrypted by
+ software and data that expects hardware crypto offload.
Notable corner cases, exceptions and additional requirements
============================================================
@@ -444,21 +510,3 @@ Drivers should ignore the changes to TLS the device feature flags.
These flags will be acted upon accordingly by the core ``ktls`` code.
TLS device feature flags only control adding of new TLS connection
offloads, old connections will remain active after flags are cleared.
-
-.. _pre_tls_data:
-
-Transmission of pre-TLS data
-----------------------------
-
-User can enqueue some already encrypted and framed records before enabling
-``ktls`` on the socket. Those records have to get sent as they are. This is
-perfectly easy to handle in the software case - such data will be waiting
-in the TCP layer, TLS ULP won't see it. In the offloaded case when pre-queued
-segment reaches transmission point it appears to be out of order (before the
-expected TCP sequence number) and the stack does not have a record information
-associated.
-
-All segments without record information cannot, however, be assumed to be
-pre-queued data, because a race condition exists between TCP stack queuing
-a retransmission, the driver seeing the retransmission and TCP ACK arriving
-for the retransmitted data.
diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt
index 949d5dcdd9a3..0104830d5075 100644
--- a/Documentation/networking/tuntap.txt
+++ b/Documentation/networking/tuntap.txt
@@ -204,8 +204,8 @@ Ethernet device, which instead of receiving packets from a physical
media, receives them from user space program and instead of sending
packets via physical media sends them to the user space program.
-Let's say that you configured IPX on the tap0, then whenever
-the kernel sends an IPX packet to tap0, it is passed to the application
+Let's say that you configured IPv6 on the tap0, then whenever
+the kernel sends an IPv6 packet to tap0, it is passed to the application
(VTun for example). The application encrypts, compresses and sends it to
the other side over TCP or UDP. The application on the other side decompresses
and decrypts the data received and writes the packet to the TAP device,