aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2020-08-05mlxsw: core: Free EMAD transactions using kfree_rcu()Ido Schimmel
[ Upstream commit 3c8ce24b037648a5a15b85888b259a74b05ff97d ] The lifetime of EMAD transactions (i.e., 'struct mlxsw_reg_trans') is managed using RCU. They are freed using kfree_rcu() once the transaction ends. However, in case the transaction failed it is freed immediately after being removed from the active transactions list. This is problematic because it is still possible for a different CPU to dereference the transaction from an RCU read-side critical section while traversing the active transaction list in mlxsw_emad_rx_listener_func(). In which case, a use-after-free is triggered [1]. Fix this by freeing the transaction after a grace period by calling kfree_rcu(). [1] BUG: KASAN: use-after-free in mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671 Read of size 8 at addr ffff88800b7964e8 by task syz-executor.2/2881 CPU: 0 PID: 2881 Comm: syz-executor.2 Not tainted 5.8.0-rc4+ #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671 mlxsw_core_skb_receive+0x571/0x700 drivers/net/ethernet/mellanox/mlxsw/core.c:2061 mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline] mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651 tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550 __do_softirq+0x223/0x964 kernel/softirq.c:292 asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] do_softirq_own_stack+0x109/0x140 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:387 [inline] __irq_exit_rcu kernel/softirq.c:417 [inline] irq_exit_rcu+0x16f/0x1a0 kernel/softirq.c:429 sysvec_apic_timer_interrupt+0x4e/0xd0 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:85 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x3b/0x40 kernel/locking/spinlock.c:191 Code: e8 2a c3 f4 fc 48 89 ef e8 12 96 f5 fc f6 c7 02 75 11 53 9d e8 d6 db 11 fd 65 ff 0d 1f 21 b3 56 5b 5d c3 e8 a7 d7 11 fd 53 9d <eb> ed 0f 1f 00 55 48 89 fd 65 ff 05 05 21 b3 56 ff 74 24 08 48 8d RSP: 0018:ffff8880446ffd80 EFLAGS: 00000286 RAX: 0000000000000006 RBX: 0000000000000286 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa94ecea9 RBP: ffff888012934408 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: fffffbfff57be301 R12: 1ffff110088dffc1 R13: ffff888037b817c0 R14: ffff88802442415a R15: ffff888024424000 __do_sys_perf_event_open+0x1b5d/0x2bd0 kernel/events/core.c:11874 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x473dbd Code: Bad RIP value. RSP: 002b:00007f21e5e9cc28 EFLAGS: 00000246 ORIG_RAX: 000000000000012a RAX: ffffffffffffffda RBX: 000000000057bf00 RCX: 0000000000473dbd RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000040 RBP: 000000000057bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000246 R12: 000000000057bf0c R13: 00007ffd0493503f R14: 00000000004d0f46 R15: 00007f21e5e9cd80 Allocated by task 871: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:494 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467 kmalloc include/linux/slab.h:555 [inline] kzalloc include/linux/slab.h:669 [inline] mlxsw_core_reg_access_emad+0x70/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1812 mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991 mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130 update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 Freed by task 871: save_stack+0x1b/0x40 mm/kasan/common.c:48 set_track mm/kasan/common.c:56 [inline] kasan_set_free_info mm/kasan/common.c:316 [inline] __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455 slab_free_hook mm/slub.c:1474 [inline] slab_free_freelist_hook mm/slub.c:1507 [inline] slab_free mm/slub.c:3072 [inline] kfree+0xe6/0x320 mm/slub.c:4052 mlxsw_core_reg_access_emad+0xd45/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1819 mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991 mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130 update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173 process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269 worker_thread+0x9e/0x1050 kernel/workqueue.c:2415 kthread+0x355/0x470 kernel/kthread.c:291 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293 The buggy address belongs to the object at ffff88800b796400 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 232 bytes inside of 512-byte region [ffff88800b796400, ffff88800b796600) The buggy address belongs to the page: page:ffffea00002de500 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea00002de500 order:2 compound_mapcount:0 compound_pincount:0 flags: 0x100000000010200(slab|head) raw: 0100000000010200 dead000000000100 dead000000000122 ffff88806c402500 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88800b796380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800b796400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88800b796480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88800b796500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800b796580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: caf7297e7ab5 ("mlxsw: core: Introduce support for asynchronous EMAD register access") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-05mlxsw: core: Increase scope of RCU read-side critical sectionIdo Schimmel
[ Upstream commit 7d8e8f3433dc8d1dc87c1aabe73a154978fb4c4d ] The lifetime of the Rx listener item ('rxl_item') is managed using RCU, but is dereferenced outside of RCU read-side critical section, which can lead to a use-after-free. Fix this by increasing the scope of the RCU read-side critical section. Fixes: 93c1edb27f9e ("mlxsw: Introduce Mellanox switch driver core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-05mlx4: disable device on shutdownJakub Kicinski
[ Upstream commit 3cab8c65525920f00d8f4997b3e9bb73aecb3a8e ] It appears that not disabling a PCI device on .shutdown may lead to a Hardware Error with particular (perhaps buggy) BIOS versions: mlx4_en: eth0: Close port called mlx4_en 0000:04:00.0: removed PHC reboot: Restarting system {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 {1}[Hardware Error]: event severity: fatal {1}[Hardware Error]: Error 0, type: fatal {1}[Hardware Error]: section_type: PCIe error {1}[Hardware Error]: port_type: 4, root port {1}[Hardware Error]: version: 1.16 {1}[Hardware Error]: command: 0x4010, status: 0x0143 {1}[Hardware Error]: device_id: 0000:00:02.2 {1}[Hardware Error]: slot: 0 {1}[Hardware Error]: secondary_bus: 0x04 {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f06 {1}[Hardware Error]: class_code: 000604 {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003 {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000 {1}[Hardware Error]: aer_uncor_severity: 0x00062030 {1}[Hardware Error]: TLP Header: 40000018 040000ff 791f4080 00000000 [hw error repeats] Kernel panic - not syncing: Fatal hardware error! CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017 Fix the mlx4 driver. This is a very similar problem to what had been fixed in: commit 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown") to address https://bugzilla.kernel.org/show_bug.cgi?id=199779. Fixes: 2ba5fbd62b25 ("net/mlx4_core: Handle AER flow properly") Reported-by: Jake Lawrence <lawja@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29mlxsw: destroy workqueue when trap_register in mlxsw_emad_initLiu Jian
[ Upstream commit 5dbaeb87f2b309936be0aeae00cbc9e7f20ab296 ] When mlxsw_core_trap_register fails in mlxsw_emad_init, destroy_workqueue() shouled be called to destroy mlxsw_core->emad_wq. Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()Ido Schimmel
[ Upstream commit d9d5420273997664a1c09151ca86ac993f2f89c1 ] We should not trigger a warning when a memory allocation fails. Remove the WARN_ON(). The warning is constantly triggered by syzkaller when it is injecting faults: [ 2230.758664] FAULT_INJECTION: forcing a failure. [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0 [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 ... [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0 [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ... [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Fixes: 3057224e014c ("mlxsw: spectrum_router: Implement FIB offload in deferred work") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03net/mlx4_core: fix a memory leak bug.Qiushi Wu
commit febfd9d3c7f74063e8e630b15413ca91b567f963 upstream. In function mlx4_opreq_action(), pointer "mailbox" is not released, when mlx4_cmd_box() return and error, causing a memory leak bug. Fix this issue by going to "out" label, mlx4_free_cmd_mailbox() can free this pointer. Fixes: fe6f700d6cbb ("net/mlx4_core: Respond to operation request by firmware") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net/mlx5e: Update netdev txq on completions during closureMoshe Shemesh
[ Upstream commit 5e911e2c06bd8c17df29147a5e2d4b17fafda024 ] On sq closure when we free its descriptors, we should also update netdev txq on completions which would not arrive. Otherwise if we reopen sqs and attach them back, for example on fw fatal recovery flow, we may get tx timeout. Fixes: 29429f3300a3 ("net/mlx5e: Timeout if SQ doesn't flush during close") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net/mlx5: Add command entry handling completionMoshe Shemesh
[ Upstream commit 17d00e839d3b592da9659c1977d45f85b77f986a ] When FW response to commands is very slow and all command entries in use are waiting for completion we can have a race where commands can get timeout before they get out of the queue and handled. Timeout completion on uninitialized command will cause releasing command's buffers before accessing it for initialization and then we will get NULL pointer exception while trying access it. It may also cause releasing buffers of another command since we may have timeout completion before even allocating entry index for this command. Add entry handling completion to avoid this race. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca
commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream. ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.14: - Drop change in lwt_bpf.c - Delete now-unused "ret" in mlx5e_route_lookup_ipv6() - Initialise "out_dev" in mlx5e_create_encap_header_ipv6() to avoid introducing a spurious "may be used uninitialised" warning - Adjust filenames, context, indentation] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net/mlx5: Fix command entry leak in Internal Error StateMoshe Shemesh
[ Upstream commit cece6f432cca9f18900463ed01b97a152a03600a ] Processing commands by cmd_work_handler() while already in Internal Error State will result in entry leak, since the handler process force completion without doorbell. Forced completion doesn't release the entry and event completion will never arrive, so entry should be released. Fixes: 73dd3a4839c1 ("net/mlx5: Avoid using pending command interface slots") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net/mlx5: Fix forced completion access non initialized command entryMoshe Shemesh
[ Upstream commit f3cb3cebe26ed4c8036adbd9448b372129d3c371 ] mlx5_cmd_flush() will trigger forced completions to all valid command entries. Triggered by an asynch event such as fast teardown it can happen at any stage of the command, including command initialization. It will trigger forced completion and that can lead to completion on an uninitialized command entry. Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after command entry is initialized will ensure force completion is treated only if command entry is initialized. Fixes: 73dd3a4839c1 ("net/mlx5: Avoid using pending command interface slots") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()Tariq Toukan
[ Upstream commit 40e473071dbad04316ddc3613c3a3d1c75458299 ] When ENOSPC is set the idx is still valid and gets set to the global MLX4_SINK_COUNTER_INDEX. However gcc's static analysis cannot tell that ENOSPC is impossible from mlx4_cmd_imm() and gives this warning: drivers/net/ethernet/mellanox/mlx4/main.c:2552:28: warning: 'idx' may be used uninitialized in this function [-Wmaybe-uninitialized] 2552 | priv->def_counter[port] = idx; Also, when ENOSPC is returned mlx4_allocate_default_counters should not fail. Fixes: 6de5f7f6a1fa ("net/mlx4_core: Allocate default counter per port") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLEPetr Machata
[ Upstream commit ccfc569347f870830e7c7cf854679a06cf9c45b5 ] The handler for FLOW_ACTION_VLAN_MANGLE ends by returning whatever the lower-level function that it calls returns. If there are more actions lined up after this action, those are never offloaded. Fix by only bailing out when the called function returns an error. Fixes: a150201a70da ("mlxsw: spectrum: Add support for vlan modify TC action") Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28mlxsw: spectrum_dpipe: Add missing error pathIdo Schimmel
[ Upstream commit 3a99cbb6fa7bca1995586ec2dc21b0368aad4937 ] In case devlink_dpipe_entry_ctx_prepare() failed, release RTNL that was previously taken and free the memory allocated by mlxsw_sp_erif_entry_prepare(). Fixes: 2ba5999f009d ("mlxsw: spectrum: Add Support for erif table entries access") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27net/mlx5: Take lock with IRQs disabled to avoid deadlockMoni Shoua
[ Upstream commit 33814e5d127e21f53b52e17b0722c1b57d4f4d29 ] The lock in qp_table might be taken from process context or from interrupt context. This may lead to a deadlock unless it is taken with IRQs disabled. Discovered by lockdep ================================ WARNING: inconsistent lock state 4.20.0-rc6 -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} python/12572 [HC1[1]:SC0[0]:HE0:SE1] takes: 00000000052a4df4 (&(&table->lock)->rlock#2){?.+.}, /0x50 [mlx5_core] {HARDIRQ-ON-W} state was registered at: _raw_spin_lock+0x33/0x70 mlx5_get_rsc+0x1a/0x50 [mlx5_core] mlx5_ib_eqe_pf_action+0x493/0x1be0 [mlx5_ib] process_one_work+0x90c/0x1820 worker_thread+0x87/0xbb0 kthread+0x320/0x3e0 ret_from_fork+0x24/0x30 irq event stamp: 103928 hardirqs last enabled at (103927): [] nk+0x1a/0x1c hardirqs last disabled at (103928): [] unk+0x1a/0x1c softirqs last enabled at (103924): [] tcp_sendmsg+0x31/0x40 softirqs last disabled at (103922): [] 80 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&table->lock)->rlock#2); lock(&(&table->lock)->rlock#2); *** DEADLOCK *** Fixes: 032080ab43ac ("IB/mlx5: Lock QP during page fault handling") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27mlxsw: reg: QEEC: Add minimum shaper fieldsPetr Machata
[ Upstream commit 8b931821aa04823e2e5df0ae93937baabbd23286 ] Add QEEC.mise (minimum shaper enable) and QEEC.min_shaper_rate to enable configuration of minimum shaper. Increase the QEEC length to 0x20 as well: that's the length that the register has had for a long time now, but with the configurations that mlxsw typically exercises, the firmware tolerated 0x1C-sized packets. With mise=true however, FW rejects packets unless they have the full required length. Fixes: b9b7cee40579 ("mlxsw: reg: Add QoS ETS Element Configuration register") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04net/mlxfw: Fix out-of-memory error in mfa2 flash burningVladyslav Tarasiuk
[ Upstream commit a5bcd72e054aabb93ddc51ed8cde36a5bfc50271 ] The burning process requires to perform internal allocations of large chunks of memory. This memory doesn't need to be contiguous and can be safely allocated by vzalloc() instead of kzalloc(). This patch changes such allocation to avoid possible out-of-memory failure. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17net/mlx5e: Fix SFF 8472 eeprom lengthEran Ben Elisha
[ Upstream commit c431f8597863a91eea6024926e0c1b179cfa4852 ] SFF 8472 eeprom length is 512 bytes. Fix module info return value to support 512 bytes read. Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes deadIdo Schimmel
[ Upstream commit 83d5782681cc12b3d485a83cb34c46b2445f510c ] The driver tries to periodically refresh neighbours that are used to reach nexthops. This is done by periodically calling neigh_event_send(). However, if the neighbour becomes dead, there is nothing we can do to return it to a connected state and the above function call is basically a NOP. This results in the nexthop never being written to the device's adjacency table and therefore never used to forward packets. Fix this by dropping our reference from the dead neighbour and associating the nexthop with a new neigbhour which we will try to refresh. Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17mlxsw: spectrum_router: Relax GRE decap matching checkNir Dotan
[ Upstream commit da93d2913fdf43d5cde3c5a53ac9cc29684d5c7c ] GRE decap offload is configured when local routes prefix correspond to the local address of one of the offloaded GRE tunnels. The matching check was found to be too strict, such that for a flat GRE configuration, in which the overlay and underlay traffic share the same non-default VRF, decap flow was not offloaded. Relax the check for decap flow offloading. A match occurs if the local address of the tunnel matches the local route address while both share the same VRF table. Fixes: 4607f6d26950 ("mlxsw: spectrum_router: Support IPv4 underlay decap") Signed-off-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17net/mlx4_core: Fix return codes of unsupported operationsErez Alfasi
[ Upstream commit 95aac2cdafd8c8298c9b2589c52f44db0d824e0e ] Functions __set_port_type and mlx4_check_port_params returned -EINVAL while the proper return code is -EOPNOTSUPP as a result of an unsupported operation. All drivers should generate this and all users should check for it when detecting an unsupported functionality. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17net/mlx5: Release resource on error flowMoni Shoua
[ Upstream commit 698114968a22f6c0c9f42e983ba033cc36bb7217 ] Fix reference counting leakage when the event handler aborts due to an unsupported event for the resource type. Fixes: a14c2d4beee5 ("net/mlx5_core: Warn on unsupported events of QP/RQ/SQ") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05net/mlx5: Continue driver initialization despite debugfs failureLeon Romanovsky
[ Upstream commit 199fa087dc6b503baad06712716fac645a983e8a ] The failure to create debugfs entry is unpleasant event, but not enough to abort drier initialization. Align the mlx5_core code to debugfs design and continue execution whenever debugfs_create_dir() successes or not. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01net/mlxfw: Verify FSM error code translation doesn't exceed array sizeEran Ben Elisha
[ Upstream commit 30e9e0550bf693c94bc15827781fe42dd60be634 ] Array mlxfw_fsm_state_err_str contains value to string translation, when values are provided by mlxfw_dev. If value is larger than MLXFW_FSM_STATE_ERR_MAX, return "unknown error" as expected instead of reading an address than exceed array size. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01net/mlx5e: Fix set vf link state error flowRoi Dayan
[ Upstream commit 751021218f7e66ee9bbaa2be23056e447cd75ec4 ] Before this commit the ndo always returned success. Fix that. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01net/mlx4_en: fix mlx4 ethtool -N insertionLuigi Rizzo
[ Upstream commit 34e59836565e36fade1464e054a3551c1a0364be ] ethtool expects ETHTOOL_GRXCLSRLALL to set ethtool_rxnfc->data with the total number of entries in the rx classifier table. Surprisingly, mlx4 is missing this part (in principle ethtool could still move forward and try the insert). Tested: compiled and run command: phh13:~# ethtool -N eth1 flow-type udp4 queue 4 Added rule with ID 255 Signed-off-by: Luigi Rizzo <lrizzo@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-24mlxsw: spectrum_switchdev: Check notification relevance based on upper deviceIdo Schimmel
[ Upstream commit 5050f6ae253ad1307af3486c26fc4f94287078b7 ] VxLAN FDB updates are sent with the VxLAN device which is not our upper and will therefore be ignored by current code. Solve this by checking whether the upper device (bridge) is our upper. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20mlxsw: spectrum: Init shaper for TCs 8..15Petr Machata
[ Upstream commit a9f36656b519a9a21309793c306941a3cd0eeb8f ] With introduction of MC-aware mode to mlxsw, it became necessary to configure TCs above 7 as well. There is now code in mlxsw to disable ETS for these higher classes, but disablement of max shaper was neglected. By default, max shaper is currently disabled to begin with, so the problem is just cosmetic. However, for symmetry, do like we do for ETS configuration, and call mlxsw_sp_port_ets_maxrate_set() for both TC i and i + 8. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cqNavid Emamdoost
[ Upstream commit c8c2a057fdc7de1cd16f4baa51425b932a42eb39 ] In mlx5_fpga_conn_create_cq if mlx5_vector2eqn fails the allocated memory should be released. Fixes: 537a50574175 ("net/mlx5: FPGA, Add high-speed connection routines") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-10net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budgetMaxim Mikityanskiy
[ Upstream commit 9df86bdb6746d7fcfc2fda715f7a7c3d0ddb2654 ] When CQE compression is enabled, compressed CQEs use the following structure: a title is followed by one or many blocks, each containing 8 mini CQEs (except the last, which may contain fewer mini CQEs). Due to NAPI budget restriction, a complete structure is not always parsed in one NAPI run, and some blocks with mini CQEs may be deferred to the next NAPI poll call - we have the mlx5e_decompress_cqes_cont call in the beginning of mlx5e_poll_rx_cq. However, if the budget is extremely low, some blocks may be left even after that, but the code that follows the mlx5e_decompress_cqes_cont call doesn't check it and assumes that a new CQE begins, which may not be the case. In such cases, random memory corruptions occur. An extremely low NAPI budget of 8 is used when busy_poll or busy_read is active. This commit adds a check to make sure that the previous compressed CQE has been completely parsed after mlx5e_decompress_cqes_cont, otherwise it prevents a new CQE from being fetched in the middle of a compressed CQE. This commit fixes random crashes in __build_skb, __page_pool_put_page and other not-related-directly places, that used to happen when both CQE compression and busy_poll/busy_read were enabled. Fixes: 7219ab34f184 ("net/mlx5e: CQE compression") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10net/mlx4_core: Dynamically set guaranteed amount of counters per VFEran Ben Elisha
[ Upstream commit e19868efea0c103f23b4b7e986fd0a703822111f ] Prior to this patch, the amount of counters guaranteed per VF in the resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was set regardless if the VF was single or dual port. This caused several VFs to have no guaranteed counters although the system could satisfy their request. The fix is to dynamically guarantee counters, based on each VF specification. Fixes: 9de92c60beaa ("net/mlx4_core: Adjust counter grant policy in the resource tracker") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06mlxsw: spectrum: Set LAG port collector only when activeNir Dotan
[ Upstream commit 48ebab31d424fbdb8ede8914923bec671a933246 ] The LAG port collecting (receive) function was mistakenly set when the port was registered as a LAG member, while it should be set only when the port collection state is set to true. Set LAG port to collecting when it is set to distributing, as described in the IEEE link aggregation standard coupled control mux machine state diagram. Signed-off-by: Nir Dotan <nird@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05net/mlx5: Add device ID of upcoming BlueField-2Bodong Wang
[ Upstream commit d19a79ee38c8fda6d297e4227e80db8bf51c71a6 ] Add the device ID of upcoming BlueField-2 integrated ConnectX-6 Dx network controller. Its VFs will be using the generic VF device ID: 0x101e "ConnectX Family mlx5Gen Virtual Function". Fixes: 2e9d3e83ab82 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25net/mlx5e: Use flow keys dissector to parse packets for ARFSMaxim Mikityanskiy
[ Upstream commit 405b93eb764367a670e729da18e54dc42db32620 ] The current ARFS code relies on certain fields to be set in the SKB (e.g. transport_header) and extracts IP addresses and ports by custom code that parses the packet. The necessary SKB fields, however, are not always set at that point, which leads to an out-of-bounds access. Use skb_flow_dissect_flow_keys() to get the necessary information reliably, fix the out-of-bounds access and reuse the code. Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25net/mlx5e: Only support tx/rx pause setting for port ownerHuy Nguyen
[ Upstream commit 466df6eb4a9e813b3cfc674363316450c57a89c5 ] Only support changing tx/rx pause frame setting if the net device is the vport group manager. Fixes: 3c2d18ef22df ("net/mlx5e: Support ethtool get/set_pauseparam") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25net/mlx4_en: fix a memory leak bugWenwen Wang
[ Upstream commit 48ec7014c56e5eb2fbf6f479896143622d834f3b ] In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS indirection. However, if mlx4_qp_alloc() fails, the allocated 'rss_map->indir_qp' is not deallocated, leading to a memory leak bug. To fix the above issue, add the 'qp_alloc_err' label to free 'rss_map->indir_qp'. Fixes: 4931c6ef04b4 ("net/mlx4_en: Optimized single ring steering") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09net/mlx5e: Prevent encap flow counter update async to user queryAriel Levkovich
[ Upstream commit 90bb769291161cf25a818d69cf608c181654473e ] This patch prevents a race between user invoked cached counters query and a neighbor last usage updater. The cached flow counter stats can be queried by calling "mlx5_fc_query_cached" which provides the number of bytes and packets that passed via this flow since the last time this counter was queried. It does so by reducting the last saved stats from the current, cached stats and then updating the last saved stats with the cached stats. It also provide the lastuse value for that flow. Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the last usage time of encapsulation flows, it calls the flow counter query method periodically and async to user queries of the flow counter using cls_flower. This call is causing the driver to update the last reported bytes and packets from the cache and therefore, future user queries of the flow stats will return lower than expected number for bytes and packets since the last saved stats in the driver was updated async to the last saved stats in cls_flower. This causes wrong stats presentation of encapsulation flows to user. Since the neighbor usage updater only needs the lastuse stats from the cached counter, the fix is to use a dedicated lastuse query call that returns the lastuse value without synching between the cached stats and the last saved stats. Fixes: f6dfb4c3f216 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters") Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09net/mlx5: Use reversed order when unregister devicesMark Zhang
[ Upstream commit 08aa5e7da6bce1a1963f63cf32c2e7ad434ad578 ] When lag is active, which is controlled by the bonded mlx5e netdev, mlx5 interface unregestering must happen in the reverse order where rdma is unregistered (unloaded) first, to guarantee all references to the lag context in hardware is removed, then remove mlx5e netdev interface which will cleanup the lag context from hardware. Without this fix during destroy of LAG interface, we observed following errors: * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xe4ac33) * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa5aee8). Fixes: a31208b1e11d ("net/mlx5_core: New init and exit flow for mlx5_core") Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21mlxsw: spectrum: Disallow prio-tagged packets when PVID is removedIdo Schimmel
[ Upstream commit 4b14cc313f076c37b646cee06a85f0db59cf216c ] When PVID is removed from a bridge port, the Linux bridge drops both untagged and prio-tagged packets. Align mlxsw with this behavior. Fixes: 148f472da5db ("mlxsw: reg: Add the Switch Port Acceptable Frame Types register") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22mlxsw: spectrum: Prevent force of 56GAmit Cohen
[ Upstream commit 275e928f19117d22f6d26dee94548baf4041b773 ] Force of 56G is not supported by hardware in Ethernet devices. This configuration fails with a bad parameter error from firmware. Add check of this case. Instead of trying to set 56G with autoneg off, return a meaningful error. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages queryErez Alfasi
[ Upstream commit 135dd9594f127c8a82d141c3c8430e9e2143216a ] Querying EEPROM high pages data for SFP module is currently not supported by our driver but is still tried, resulting in invalid FW queries. Set the EEPROM ethtool data length to 256 for SFP module to limit the reading for page 0 only and prevent invalid FW queries. Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support") Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-09net/mlx5: Allocate root ns memory using kzalloc to match kfreeParav Pandit
[ Upstream commit 25fa506b70cadb580c1e9cbd836d6417276d4bcd ] root ns is yet another fs core node which is freed using kfree() by tree_put_node(). Rest of the other fs core objects are also allocated using kmalloc variants. However, root ns memory is allocated using kvzalloc(). Hence allocate root ns memory using kzalloc(). Fixes: 2530236303d9e ("net/mlx5_core: Flow steering tree initialization") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25net/mlx4_core: Change the error print to info printYunjian Wang
[ Upstream commit 00f9fec48157f3734e52130a119846e67a12314b ] The error print within mlx4_flow_steer_promisc_add() should be a info print. Fixes: 592e49dda812 ('net/mlx4: Implement promiscuous mode with device managed flow-steering') Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueueIdo Schimmel
[ Upstream commit b442fed1b724af0de087912a5718ddde1b87acbb ] The workqueue is used to periodically update the networking stack about activity / statistics of various objects such as neighbours and TC actions. It should not be called as part of memory reclaim path, so remove the WQ_MEM_RECLAIM flag. Fixes: 3d5479e92087 ("mlxsw: core: Remove deprecated create_workqueue") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2019-05-16mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueueIdo Schimmel
[ Upstream commit 4af0699782e2cc7d0d89db9eb6f8844dd3df82dc ] The ordered workqueue is used to offload various objects such as routes and neighbours in the order they are notified. It should not be called as part of memory reclaim path, so remove the WQ_MEM_RECLAIM flag. This can also result in a warning [1], if a worker tries to flush a non-WQ_MEM_RECLAIM workqueue. [1] [97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker [97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130 ... [97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 [97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] [97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130 ... [97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086 [97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000 [97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046 [97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0 [97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0 [97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0 [97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000 [97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0 [97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [97703.543115] Call Trace: [97703.543129] __flush_work+0xbd/0x1e0 [97703.543137] ? __cancel_work_timer+0x136/0x1b0 [97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0 [97703.543154] __cancel_work_timer+0x136/0x1b0 [97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core] [97703.543184] cancel_work_sync+0x10/0x20 [97703.543191] rhashtable_free_and_destroy+0x23/0x140 [97703.543198] rhashtable_destroy+0xd/0x10 [97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum] [97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum] [97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum] [97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum] [97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum] [97703.543484] process_one_work+0x1fd/0x400 [97703.543493] worker_thread+0x34/0x410 [97703.543500] kthread+0x121/0x140 [97703.543507] ? process_one_work+0x400/0x400 [97703.543512] ? kthread_park+0x90/0x90 [97703.543523] ret_from_fork+0x35/0x40 Fixes: a3832b31898f ("mlxsw: core: Create an ordered workqueue for FIB offload") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Semion Lisyansky <semionl@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2019-05-16mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueueIdo Schimmel
[ Upstream commit a8c133b06183c529c51cd0d54eb57d6b7078370c ] The EMAD workqueue is used to handle retransmission of EMAD packets that contain configuration data for the device's firmware. Given the workers need to allocate these packets and that the code is not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM flag. Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2019-05-16mlxsw: spectrum_switchdev: Add MDB entries in prepare phaseIdo Schimmel
[ Upstream commit d4d0e40977ac450f32f2db5e4d8e23c9d2578899 ] The driver cannot guarantee in the prepare phase that it will be able to write an MDB entry to the device. In case the driver returned success during the prepare phase, but then failed to add the entry in the commit phase, a WARNING [1] will be generated by the switchdev core. Fix this by doing the work in the prepare phase instead. [1] [ 358.544486] swp12s0: Commit of object (id=2) failed. [ 358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0 [ 358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350 [ 358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 358.580582] Workqueue: events switchdev_deferred_process_work [ 358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0 ... [ 358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286 [ 358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000 [ 358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8 [ 358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000 [ 358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000 [ 358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200 [ 358.659790] FS: 0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000 [ 358.668820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0 [ 358.683188] Call Trace: [ 358.685918] switchdev_port_obj_add_deferred+0x13/0x60 [ 358.691655] switchdev_deferred_process+0x6b/0xf0 [ 358.696907] switchdev_deferred_process_work+0xa/0x10 [ 358.702548] process_one_work+0x1f5/0x3f0 [ 358.707022] worker_thread+0x28/0x3c0 [ 358.711099] ? process_one_work+0x3f0/0x3f0 [ 358.715768] kthread+0x10d/0x130 [ 358.719369] ? __kthread_create_on_node+0x180/0x180 [ 358.724815] ret_from_fork+0x35/0x40 Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Kushnarov <alexanderk@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2019-05-08net/mlx5: E-Switch, Fix esw manager vport indication for more vport commandsOmri Kahalon
[ Upstream commit eca4a928585ac08147e5cc8e2111ecbc6279ee31 ] Traditionally, the PF (Physical Function) which resides on vport 0 was the E-switch manager. Since the ECPF (Embedded CPU Physical Function), which resides on vport 0xfffe, was introduced as the E-Switch manager, the assumption that the E-switch manager is on vport 0 is incorrect. Since the eswitch code already uses the actual vport value, all we need is to always set other_vport=1. Signed-off-by: Omri Kahalon <omrik@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-02net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages queryErez Alfasi
[ Upstream commit ace329f4ab3ba434be2adf618073c752d083b524 ] Querying EEPROM high pages data for SFP module is currently not supported by our driver and yet queried, resulting in invalid FW queries. Set the EEPROM ethtool data length to 256 for SFP module will limit the reading for page 0 only and prevent invalid FW queries. Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02mlxsw: spectrum: Fix autoneg status in ethtoolAmit Cohen
[ Upstream commit 151f0dddbbfe4c35c9c5b64873115aafd436af9d ] If link is down and autoneg is set to on/off, the status in ethtool does not change. The reason is when the link is down the function returns with zero before changing autoneg value. Move the checking of link state (up/down) to be performed after setting autoneg value, in order to be sure that autoneg will change in any case. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>