summaryrefslogtreecommitdiffstats
path: root/drivers/net/bonding/bond_main.c
AgeCommit message (Collapse)Author
2015-08-12bonding: Gratuitous ARP gets dropped when first slave addedVenkat Venkatsubra
When the first slave is added (such as during bootup) the first gratuitous ARP gets dropped. We don't see this drop during a failover. The packet gets dropped in qdisc (noop_enqueue). The fix is to delay the sending of gratuitous ARPs till the bond dev's carrier is present. It can also be worked around by setting num_grat_arp to more than 1. Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bonding: correct the MAC address for "follow" fail_over_mac policydingtianhong
The "follow" fail_over_mac policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address, but the same MAC address still may happened by this steps for this policy: 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves bond0 has the same mac address with eth0, it is MAC1. 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves eth1 is backup, eth1 has MAC2. 3) ifconfig eth0 down eth1 became active slave, bond will swap MAC for eth0 and eth1, so eth1 has MAC1, and eth0 has MAC2. 4) ifconfig eth1 down there is no active slave, and eth1 still has MAC1, eth2 has MAC2. 5) ifconfig eth0 up the eth0 became active slave again, the bond set eth0 to MAC1. Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same MAC address, it will break this policy for ACTIVE_BACKUP mode. This patch will fix this problem by finding the old active slave and swap them MAC address before change active slave. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bonding: correctly handle bonding type change on enslave failureNikolay Aleksandrov
If the bond is enslaving a device with different type it will be setup by it, but if after being setup the enslave fails the bond doesn't switch back its type and also keeps pointers to foreign structures that can be long gone. Thus revert back any type changes if the enslave failed and the bond had to change its type. Example: Before patch: $ echo lo > bond0/bonding/slaves -bash: echo: write error: Cannot assign requested address $ ip l sh bond0 20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00 $ echo +eth1 > bond0/bonding/slaves $ ip l sh bond0 20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff (notice the MASTER flag is gone) After patch: $ echo lo > bond0/bonding/slaves -bash: echo: write error: Cannot assign requested address $ ip l sh bond0 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff $ echo +eth1 > bond0/bonding/slaves $ ip l sh bond0 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bonding: fix destruction of bond with devices different from arphrd_etherNikolay Aleksandrov
When the bonding is being unloaded and the netdevice notifier is unregistered it executes NETDEV_UNREGISTER for each device which should remove the bond's proc entry but if the device enslaved is not of ARPHRD_ETHER type and is in front of the bonding, it may execute bond_release_and_destroy() first which would release the last slave and destroy the bond device leaving the proc entry and thus we will get the following error (with dynamic debug on for bond_netdev_event to see the events order): [ 908.963051] eql: event: 9 [ 908.963052] eql: IFF_SLAVE [ 908.963054] eql: event: 2 [ 908.963056] eql: IFF_SLAVE [ 908.963058] eql: event: 6 [ 908.963059] eql: IFF_SLAVE [ 908.963110] bond0: Releasing active interface eql [ 908.976168] bond0: Destroying bond bond0 [ 908.976266] bond0 (unregistering): Released all slaves [ 908.984097] ------------[ cut here ]------------ [ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160() [ 908.984110] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0' [ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore usb_common [last unloaded: bonding] [ 908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G W O 4.2.0-rc2+ #8 [ 908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 908.984172] 0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff8800358dfda8 [ 908.984175] ffffffff8106c521 ffff88003595af78 ffff88003595af40 ffff88003e3a4280 [ 908.984178] ffffffffa058d040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0 [ 908.984181] Call Trace: [ 908.984188] [<ffffffff81525b34>] ? dump_stack+0x40/0x50 [ 908.984193] [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0 [ 908.984196] [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50 [ 908.984199] [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160 [ 908.984205] [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding] [ 908.984208] [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding] [ 908.984217] [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70 [ 908.984225] [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0 [ 908.984228] [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30 [ 908.984232] [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding] [ 908.984236] [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250 [ 908.984241] [<ffffffff81086f99>] ? task_work_run+0x89/0xc0 [ 908.984244] [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75 [ 908.984247] ---[ end trace 7c006ed4abbef24b ]--- Thus remove the proc entry manually if bond_release_and_destroy() is used. Because of the checks in bond_remove_proc_entry() it's not a problem for a bond device to change namespaces (the bug fixed by the Fixes commit) but since commit f9399814927ad ("bonding: Don't allow bond devices to change network namespaces.") that can't happen anyway. Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from the netdev events") Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08bonding: "primary_reselect" with "failure" is not working properlyMazhar Rana
When "primary_reselect" is set to "failure", primary interface should not become active until current active slave is down. But if we set first member of bond device as a "primary" interface and "primary_reselect" is set to "failure" then whenever primary interface's link get back(up) it become active slave even if current active slave is still up. With this patch, "bond_find_best_slave" will not traverse members if primary interface is not candidate for failover/reselection and current active slave is still up. Signed-off-by: Mazhar Rana <mazhar.rana@cyberoam.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04net: Add full IPv6 addresses to flow_keysTom Herbert
This patch adds full IPv6 addresses into flow_keys and uses them as input to the flow hash function. The implementation supports either IPv4 or IPv6 addresses in a union, and selector is used to determine how may words to input to jhash2. We also add flow_get_u32_dst and flow_get_u32_src functions which are used to get a u32 representation of the source and destination addresses. For IPv6, ipv6_addr_hash is called. These functions retain getting the legacy values of src and dst in flow_keys. With this patch, Ethertype and IP protocol are now included in the flow hash input. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.Samudrala, Sridhar
- introduce port fdb obj and generic switchdev_port_fdb_add/del/dump() - use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops. - add support for fdb obj in switchdev_port_obj_add/del/dump() - switch rocker to implement fdb ops via switchdev_ops v3: updated to sync with named union changes. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13flow_dissect: use programable dissector in skb_flow_dissect and friendsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13net: change name of flow_dissector header to match the .c file nameJiri Pirko
add couple of empty lines on the way. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12switchdev: remove NETIF_F_HW_SWITCH_OFFLOAD feature flagScott Feldman
Roopa said remove the feature flag for this series and she'll work on bringing it back if needed at a later date. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12switchdev: cut over to new switchdev_port_bridge_getlinkScott Feldman
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12switchdev: cut over to new switchdev_port_bridge_dellinkScott Feldman
Rocker, bonding and team and switch over to the new switchdev_port_bridge_dellink to avoid duplicating code in each driver. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12switchdev: cut over to new switchdev_port_bridge_setlinkScott Feldman
Rocker, bonding, and team can now use the switchdev bridge setlink to parse raw netlink; no need to duplicate this code in each driver. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/Jiri Pirko
Turned out that "switchdev" sticks. So just unify all related terms to use this prefix. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11bonding: Implement user key part of port_key in an AD system.Mahesh Bandewar
The port key has three components - user-key, speed-part, and duplex-part. The LSBit is for the duplex-part, next 5 bits are for the speed while the remaining 10 bits are the user defined key bits. Get these 10 bits from the user-space (through the SysFs interface) and use it to form the admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If it is not provided then use zero for the user-key-bits (default). It can set using following example code - # modprobe bonding mode=4 # usr_port_key=$(( RANDOM & 0x3FF )) # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * fixed up context from change in ad_actor_sys_prio patch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11bonding: Allow userspace to set actors' macaddr in an AD-system.Mahesh Bandewar
In an AD system, the communication between actor and partner is the business between these two entities. In the current setup anyone on the same L2 can "guess" the LACPDU contents and then possibly send the spoofed LACPDUs and trick the partner causing connectivity issues for the AD system. This patch allows to use a random mac-address obscuring it's identity making it harder for someone in the L2 is do the same thing. This patch allows user-space to choose the mac-address for the AD-system. This mac-address can not be NULL or a Multicast. If the mac-address is set from user-space; kernel will honor it and will not overwrite it. In the absence (value from user space); the logic will default to using the masters' mac as the mac-address for the AD-system. It can be set using example code below - # modprobe bonding mode=4 # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \ $(( (RANDOM & 0xFE) | 0x02 )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF ))) # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: fixed up style issues reported by checkpatch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11bonding: Allow userspace to set actors' system_priority in AD systemMahesh Bandewar
This patch allows user to randomize the system-priority in an ad-system. The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user does not specify this value, the system defaults to 0xFFFF, which is what it was before this patch. Following example code could set the value - # modprobe bonding mode=4 # sys_prio=$(( 1 + RANDOM + RANDOM )) # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * changed how the default value is set in bond_check_params(), this makes the default consistent between what gets set for a new bond and what the default is claimed to be in the bonding options.] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_tablePai
This patch fixes a Kernel Panic in bonding driver debugfs file: rlb_hash_table. $> modprobe bonding mode=6 $> cat /sys/kernel/debug/bonding/bond0/rlb_hash_table This will crash the kernel. The struct alb_bond_info is initialized only when the bonding interface is initialized (ip link set bond0 up) and not at the time it is allocated. If we try to read the table before that, it'll result in a kernel panic. The patch applies against both net and net-next Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-26net/bonding: Make DRV macros privateMatan Barak
The bonding modules currently defines four macros with general names that pollute the global namespace: DRV_VERSION DRV_RELDATE DRV_NAME DRV_DESCRIPTION Fixing that by defining a private bonding_priv.h header files which includes those defines. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/asix_common.c drivers/net/usb/sr9800.c drivers/net/usb/usbnet.c include/linux/usb/usbnet.h net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c The TCP conflicts were overlapping changes. In 'net' we added a READ_ONCE() to the socket cached RX route read, whilst in 'net-next' Eric Dumazet touched the surrounding code dealing with how mini sockets are handled. With USB, it's a case of the same bug fix first going into net-next and then I cherry picked it back into net. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31bonding: Bonding Overriding Configuration logic restored.Anton Nayshtut
Before commit 3900f29021f0bc7fe9815aa32f1a993b7dfdd402 ("bonding: slight optimizztion for bond_slave_override()") the override logic was to send packets with non-zero queue_id through the slave with corresponding queue_id, under two conditions only - if the slave can transmit and it's up. The above mentioned commit changed this logic by introducing an additional condition - whether the bond is active (indirectly, using the slave_can_tx and later - bond_is_active_slave), that prevents the user from implementing more complex policies according to the Documentation/networking/bonding.txt. Signed-off-by: Anton Nayshtut <anton@swortex.com> Signed-off-by: Alexey Bogoslavsky <alexey@swortex.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29bonding: Don't segment multiple tagged packets on bonding deviceToshiaki Makita
Bonding devices don't need to segment multiple tagged packets since their slaves can segment them. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06bonding: implement bond_poll_controller()Mahesh Bandewar
This patches implements the poll_controller support for all bonding driver. If the slaves have poll_controller net_op defined, this implementation calls them. This is mode agnostic implementation and iterates through all slaves (based on mode) and calls respective handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20bonding: simple code refactorMahesh Bandewar
Remove duplicate code. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09net/bonding: Fix potential bad memory access during bonding eventsMoni Shoua
When queuing work to send the NETDEV_BONDING_INFO netdev event, it's possible that when the work is executed, the pointer to the slave becomes invalid. This can happen if between queuing the event and the execution of the work, the net-device was un-ensvaled and re-enslaved. Fix that by queuing a work with the data of the slave instead of the slave structure. Fixes: 69e6113343cf ('net/bonding: Notify state change on slaves') Reported-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04net/bonding: Notify state change on slavesMoni Shoua
Use notifier chain to dispatch an event upon a change in slave state. Event is dispatched with slave specific info. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04net/bonding: Move slave state changes to a helper functionMoni Shoua
Move slave state changes to a helper function, this is a pre-step for adding functionality of dispatching an event when this helper is called. This commit doesn't add new functionality. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01bonding: handle NETIF_F_HW_SWITCH_OFFLOAD flag and add ↵Roopa Prabhu
ndo_bridge_setlink/dellink handlers We want bond to pick up the offload flag if any of its slaves have it. NETIF_F_HW_SWITCH_OFFLOAD flag is added to the mask, so that netdev_increment_features does not ignore it. This also adds ndo_bridge_setlink and ndo_bridge_dellink handlers. These currently point to the default handlers provided by the switchdev api. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27bonding: cleanup and remove dead codeJonathan Toppins
fix sparse warning about non-static function drivers/net/bonding/bond_main.c:3737:5: warning: symbol 'bond_3ad_xor_xmit' was not declared. Should it be static? Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27bonding: fix bond_open() don't always set slave active flagWilson Kok
Mode 802.3ad, fix incorrect bond slave active state when slave is not in active aggregator. During bond_open(), the bonding driver always sets the slave active flag to true if the bond is not in active-backup, alb, or tlb modes. Bonding should let the aggregator selection logic set the active flag when in 802.3ad mode. Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27bonding: update bond carrier state when min_links option changesJonathan Toppins
Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25bonding: handle more gso typesEric Dumazet
In commit 5a7baa78851b ("bonding: Advertize vxlan offload features when supported"), Or Gerlitz added support conditional vxlan offload. In this patch I also add support for all kind of tunnels, but we allow a bonding device to not require segmentation, as it is always better to make this segmentation at the very last stage, if a particular slave device requires it. Tested: Setup a GRE tunnel, on a physical NIC not having tx-gre-segmentation. Results on bnx2x are even better, as we no longer have to segment in software. ethtool -K bond0 tx-gre-segmentation off super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15 7538.32 ethtool -K bond0 tx-gre-segmentation on super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15 10200.5 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27bonding: change error message to debug message in __bond_release_one()Wengang Wang
In __bond_release_one(), when the interface is not a slave or not a slave of "this" master, it log error message. The message actually should be a debug message matching what bond_enslave() does. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ieee802154/fakehard.c A bug fix went into 'net' for ieee802154/fakehard.c, which is removed in 'net-next'. Add build fix into the merge from Stephen Rothwell in openvswitch, the logging macros take a new initial 'log' argument, a new call was added in 'net' so when we merge that in here we have to explicitly add the new 'log' arg to it else the build fails. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21vlan: rename __vlan_put_tag to vlan_insert_tag_set_protoJiri Pirko
Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21vlan: kill vlan_put_tag helperJiri Pirko
Since both tx and rx paths work with skb->vlan_tci, there's no need for this function anymore. Switch users directly to __vlan_hwaccel_put_tag. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19bonding: fix curr_active_slave/carrier with loadbalance arp monitoringNikolay Aleksandrov
Since commit 6fde8f037e60 ("bonding: fix locking in bond_loadbalance_arp_mon()") we can have a stale bond carrier state and stale curr_active_slave when using arp monitoring in loadbalance modes. The reason is that in bond_loadbalance_arp_mon() we can't have do_failover == true but slave_state_changed == false, whenever do_failover is true then slave_state_changed is also true. Then the following piece from bond_loadbalance_arp_mon(): if (slave_state_changed) { bond_slave_state_change(bond); if (BOND_MODE(bond) == BOND_MODE_XOR) bond_update_slave_arr(bond, NULL); } else if (do_failover) { block_netpoll_tx(); bond_select_active_slave(bond); unblock_netpoll_tx(); } will execute only the first branch, always and regardless of do_failover. Since these two events aren't related in such way, we need to decouple and consider them separately. For example this issue could lead to the following result: Bonding Mode: load balancing (round-robin) *MII Status: down* MII Polling Interval (ms): 0 Up Delay (ms): 0 Down Delay (ms): 0 ARP Polling Interval (ms): 100 ARP IP target/s (n.n.n.n form): 192.168.9.2 Slave Interface: ens12 *MII Status: up* Speed: 10000 Mbps Duplex: full Link Failure Count: 2 Permanent HW addr: 00:0f:53:01:42:2c Slave queue ID: 0 Slave Interface: eth1 *MII Status: up* Speed: Unknown Duplex: Unknown Link Failure Count: 70 Permanent HW addr: 52:54:00:2f:0f:8e Slave queue ID: 0 Since some interfaces are up, then the status of the bond should also be up, but it will never change unless something invokes bond_set_carrier() (i.e. enslave, bond_select_active_slave etc). Now, if I force the calling of bond_select_active_slave via for example changing primary_reselect (it can change in any mode), then the MII status goes to "up" because it calls bond_select_active_slave() which should've been done from bond_loadbalance_arp_mon() itself. CC: Veaceslav Falico <vfalico@gmail.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Ding Tianhong <dingtianhong@huawei.com> Fixes: 6fde8f037e60 ("bonding: fix locking in bond_loadbalance_arp_mon()") Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13net: generic dev_disable_lro() stacked device handlingMichal Kubeček
Large receive offloading is known to cause problems if received packets are passed to other host. Therefore the kernel disables it by calling dev_disable_lro() whenever a network device is enslaved in a bridge or forwarding is enabled for it (or globally). For virtual devices we need to disable LRO on the underlying physical device (which is actually receiving the packets). Current dev_disable_lro() code handles this propagation for a vlan (including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan. It doesn't handle other stacked devices and their combinations, in particular propagation from a bond to its slaves which often causes problems in virtualization setups. As we now have generic data structures describing the upper-lower device relationship, dev_disable_lro() can be generalized to disable LRO also for all lower devices (if any) once it is disabled for the device itself. For bonding and teaming devices, it is necessary to disable LRO not only on current slaves at the moment when dev_disable_lro() is called but also on any slave (port) added later. v2: use lower device links for all devices (including vlan and macvlan) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10net: Move bonding headers under include/netDavid S. Miller
This ways drivers like cxgb4 don't need to do ugly relative includes. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31bonding: add bond_tx_drop() helperEric Dumazet
Because bonding stats are usually sum of slave stats, it was not easy to account for tx drops at bonding layer. We can use dev->tx_dropped for this, as this counter is later added to the device stats (in dev_get_stats()) This extends the idea we had in commit ee6377147409a ("bonding: Simplify the xmit function for modes that use xmit_hash") for bond_3ad_xor_xmit() to other bonding modes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07net: better IFF_XMIT_DST_RELEASE supportEric Dumazet
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06bonding: Simplify the xmit function for modes that use xmit_hashMahesh Bandewar
Earlier change to use usable slave array for TLB mode had an additional performance advantage. So extending the same logic to all other modes that use xmit-hash for slave selection (viz 802.3AD, and XOR modes). Also consolidating this with the earlier TLB change. The main idea is to build the usable slaves array in the control path and use that array for slave selection during xmit operation. Measured performance in a setup with a bond of 4x1G NICs with 200 instances of netperf for the modes involved (3ad, xor, tlb) cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5 Mode TPS-Before TPS-After 802.3ad : 468,694 493,101 TLB (lb=0): 392,583 392,965 XOR : 475,696 484,517 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30bonding: make global bonding stats more reliableAndy Gospodarek
As the code stands today, bonding stats are based simply on the stats from the member interfaces. If a member was to be removed from a bond, the stats would instantly drop. This would be confusing to an admin would would suddonly see interface stats drop while traffic is still flowing. In addition to preventing the stats drops mentioned above, new members will now be added to the bond and only traffic received after the member was added to the bond will be counted as part of bonding stats. Bonding counters will also be updated when any slaves are dropped to make sure the reported stats are reliable. v2: Changes suggested by Nik to properly allocate/free stats memory. v3: Properly destroy workqueue and fix netlink configuration path. v4: Moved cached stats into bonding and slave structs as there does not seem to be a complexity/performance benefit to using alloc'd memory vs in-struct memory. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22bonding: remove the unnecessary notes for bond_xmit_broadcast()dingtianhong
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22bonding: slight optimization for bond_xmit_roundrobin()dingtianhong
When the slave is the curr_active_slave, no need to check whether the slave is active or not, it is always active. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15bonding: consolidate ASSERT_RTNL()s and remove the unnecessaryNikolay Aleksandrov
Consolidate the calls to ASSERT_RTNL() before bond_select_active_slave() inside bond_select_active_slave() itself and remove the ASSERT_RTNL() from bond_hw_addr_swap() as it's not exported and its only caller - bond_change_active_slave() already has an ASSERT_RTNL(). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15bonding: trivial: style and comment fixesNikolay Aleksandrov
First adjust a couple of locking comments that were left inaccurate, then adjust comments to use the netdev styling and remove extra new lines where necessary and add a couple of new lines between declarations and code. These are all trivial styling changes, no functional change. Also removed a couple of outdated or obvious comments. This patch is by no means a complete fix of all netdev style violations but it gets the bonding closer. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13bonding: fix div by zero while enslaving and transmittingNikolay Aleksandrov
The problem is that the slave is first linked and slave_cnt is incremented afterwards leading to a div by zero in the modes that use it as a modulus. What happens is that in bond_start_xmit() bond_has_slaves() is used to evaluate further transmission and it becomes true after the slave is linked in, but when slave_cnt is used in the xmit path it is still 0, so fetch it once and transmit based on that. Since it is used only in round-robin and XOR modes, the fix is only for them. Thanks to Eric Dumazet for pointing out the fault in my first try to fix this. Call trace (took it out of net-next kernel, but it's the same with net): [46934.330038] divide error: 0000 [#1] SMP [46934.330041] Modules linked in: bonding(O) 9p fscache snd_hda_codec_generic crct10dif_pclmul [46934.330041] bond0: Enslaving eth1 as an active interface with an up link [46934.330051] ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic pata_acpi floppy [last unloaded: bonding] [46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G O 3.17.0-rc4+ #27 [46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti: ffff88005b728000 [46934.330059] RIP: 0010:[<ffffffffa0198c33>] [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450 [bonding] [46934.330060] RSP: 0018:ffff88005b72b7f8 EFLAGS: 00010246 [46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX: 000000000000002a [46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI: ffff88004b077940 [46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09: ffff88004a83e000 [46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12: ffff88004b3f0500 [46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15: ffff88004b077940 [46934.330063] FS: 00007fbd91a4c740(0000) GS:ffff88005f080000(0000) knlGS:0000000000000000 [46934.330064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4: 00000000000406e0 [46934.330069] Stack: [46934.330071] ffffffff811e6169 00000000e772fa05 ffff88004b077000 ffff88004b3f0500 [46934.330072] ffffffff81d17d18 000000000000002a 0000000000000000 ffff88005b72b8a0 [46934.330073] ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4 ffff88005b302000 [46934.330073] Call Trace: [46934.330077] [<ffffffff811e6169>] ? __kmalloc_node_track_caller+0x119/0x300 [46934.330084] [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410 [46934.330086] [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90 [46934.330088] [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590 [46934.330089] [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20 [46934.330090] [<ffffffff8168f022>] arp_xmit+0x22/0x60 [46934.330091] [<ffffffff8168f090>] arp_send.part.16+0x30/0x40 [46934.330092] [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0 [46934.330094] [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0 [46934.330096] [<ffffffff8162875a>] neigh_probe+0x4a/0x70 [46934.330097] [<ffffffff8162979c>] __neigh_event_send+0xac/0x230 [46934.330098] [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220 [46934.330100] [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0 [46934.330101] [<ffffffff81660478>] ip_finish_output+0x1f8/0x860 [46934.330102] [<ffffffff81661f08>] ip_output+0x58/0x90 [46934.330103] [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0 [46934.330104] [<ffffffff81661640>] ip_local_out_sk+0x30/0x40 [46934.330105] [<ffffffff81662a66>] ip_send_skb+0x16/0x50 [46934.330106] [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40 [46934.330107] [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30 [46934.330110] [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60 [46934.330111] [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0 [46934.330113] [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0 [46934.330114] [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0 [46934.330115] bond0: Adding slave eth2 [46934.330116] [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0 [46934.330118] [<ffffffff81603248>] ? move_addr_to_kernel.part.20+0x28/0x80 [46934.330121] [<ffffffff811b4477>] ? might_fault+0x47/0x50 [46934.330122] [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0 [46934.330125] [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530 [46934.330127] [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50 [46934.330129] [<ffffffff81242b38>] ? fsnotify+0x238/0x310 [46934.330130] [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90 [46934.330131] [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20 [46934.330134] [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b [46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9 1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c 89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00 [46934.330146] RIP [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450 [bonding] [46934.330146] RSP <ffff88005b72b7f8> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> Fixes: 278b208375 ("bonding: initial RCU conversion") Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13bonding: adjust locking commentsNikolay Aleksandrov
Now that locks have been removed, remove some unnecessary comments and adjust others to reflect reality. Also add a comment to "mode_lock" to describe its current users and give a brief summary why they need it. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>