aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2024-05-02net/mlx5e: Fix a race in command alloc flowShifeng Li
commit 8f5100da56b3980276234e812ce98d8f075194cd upstream. Fix a cmd->ent use after free due to a race on command entry. Such race occurs when one of the commands releases its last refcount and frees its index and entry while another process running command flush flow takes refcount to this command entry. The process which handles commands flush may see this command as needed to be flushed if the other process allocated a ent->idx but didn't set ent to cmd->ent_arr in cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into the spin lock. [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core] [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361 [70013.081968] [70013.082028] Workqueue: events aer_isr [70013.082053] Call Trace: [70013.082067] dump_stack+0x8b/0xbb [70013.082086] print_address_description+0x6a/0x270 [70013.082102] kasan_report+0x179/0x2c0 [70013.082173] mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core] [70013.082267] mlx5_cmd_flush+0x80/0x180 [mlx5_core] [70013.082304] mlx5_enter_error_state+0x106/0x1d0 [mlx5_core] [70013.082338] mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core] [70013.082377] remove_one+0x200/0x2b0 [mlx5_core] [70013.082409] pci_device_remove+0xf3/0x280 [70013.082439] device_release_driver_internal+0x1c3/0x470 [70013.082453] pci_stop_bus_device+0x109/0x160 [70013.082468] pci_stop_and_remove_bus_device+0xe/0x20 [70013.082485] pcie_do_fatal_recovery+0x167/0x550 [70013.082493] aer_isr+0x7d2/0x960 [70013.082543] process_one_work+0x65f/0x12d0 [70013.082556] worker_thread+0x87/0xb50 [70013.082571] kthread+0x2e9/0x3a0 [70013.082592] ret_from_fork+0x1f/0x40 The logical relationship of this error is as follows: aer_recover_work | ent->work -------------------------------------------+------------------------------ aer_recover_work_func | |- pcie_do_recovery | |- report_error_detected | |- mlx5_pci_err_detected |cmd_work_handler |- mlx5_enter_error_state | |- cmd_alloc_index |- enter_error_state | |- lock cmd->alloc_lock |- mlx5_cmd_flush | |- clear_bit |- mlx5_cmd_trigger_completions| |- unlock cmd->alloc_lock |- lock cmd->alloc_lock | |- vector = ~dev->cmd.vars.bitmask |- for_each_set_bit | |- cmd_ent_get(cmd->ent_arr[i]) (UAF) |- unlock cmd->alloc_lock | |- cmd->ent_arr[ent->idx]=ent The cmd->ent_arr[ent->idx] assignment and the bit clearing are not protected by the cmd->alloc_lock in cmd_work_handler(). Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash workIdo Schimmel
[ Upstream commit fb4e2b70a7194b209fc7320bbf33b375f7114bd5 ] The rehash delayed work is rescheduled with a delay if the number of credits at end of the work is not negative as supposedly it means that the migration ended. Otherwise, it is rescheduled immediately. After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash" the above is no longer accurate as a non-negative number of credits is no longer indicative of the migration being done. It can also happen if the work encountered an error in which case the migration will resume the next time the work is scheduled. The significance of the above is that it is possible for the work to be pending and associated with hints that were allocated when the migration started. This leads to the hints being leaked [1] when the work is canceled while pending as part of ACL region dismantle. Fix by freeing the hints if hints are associated with a work that was canceled while pending. Blame the original commit since the reliance on not having a pending work associated with hints is fragile. [1] unreferenced object 0xffff88810e7c3000 (size 256): comm "kworker/0:16", pid 176, jiffies 4295460353 hex dump (first 32 bytes): 00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80 .0......a....... 00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00 ..a.@........... backtrace (crc 2544ddb9): [<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0 [<000000004d9a1ad9>] objagg_hints_get+0x42/0x390 [<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400 [<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160 [<00000000e81fd734>] process_one_work+0x59c/0xf20 [<00000000ceee9e81>] worker_thread+0x799/0x12c0 [<00000000bda6fe39>] kthread+0x246/0x300 [<0000000070056d23>] ret_from_fork+0x34/0x70 [<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30 Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix incorrect list API usageIdo Schimmel
[ Upstream commit b377add0f0117409c418ddd6504bd682ebe0bf79 ] Both the function that migrates all the chunks within a region and the function that migrates all the entries within a chunk call list_first_entry() on the respective lists without checking that the lists are not empty. This is incorrect usage of the API, which leads to the following warning [1]. Fix by returning if the lists are empty as there is nothing to migrate in this case. [1] WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0> Modules linked in: CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted 6.9.0-rc3-custom-00883-g94a65f079ef6 #39 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0 [...] Call Trace: <TASK> mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix warning during rehashIdo Schimmel
[ Upstream commit 743edc8547a92b6192aa1f1b6bb78233fa21dc9b ] As previously explained, the rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. When the work runs out of credits it stores the current chunk and entry as markers in the per-work context so that it would know where to resume the migration from the next time the work is scheduled. Upon error, the chunk marker is reset to NULL, but without resetting the entry markers despite being relative to it. This can result in migration being resumed from an entry that does not belong to the chunk being migrated. In turn, this will eventually lead to a chunk being iterated over as if it is an entry. Because of how the two structures happen to be defined, this does not lead to KASAN splats, but to warnings such as [1]. Fix by creating a helper that resets all the markers and call it from all the places the currently only reset the chunk marker. For good measures also call it when starting a completely new rehash. Add a warning to avoid future cases. [1] WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0 Modules linked in: CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b #29 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_afk_encode+0x242/0x2f0 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 </TASK> Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix memory leak during rehashIdo Schimmel
[ Upstream commit 8ca3f7a7b61393804c46f170743c3b839df13977 ] The rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. If the migration fails, the code tries to migrate the filters back to the old region. However, the rollback itself can also fail in which case another migration will be erroneously performed. Besides the fact that this ping pong is not a very good idea, it also creates a problem. Each virtual chunk references two chunks: The currently used one ('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the first holds the chunk we want to migrate filters to and the second holds the chunk we are migrating filters from. The code currently assumes - but does not verify - that the backup chunk does not exist (NULL) if the currently used chunk does not reference the target region. This assumption breaks when we are trying to rollback a rollback, resulting in the backup chunk being overwritten and leaked [1]. Fix by not rolling back a failed rollback and add a warning to avoid future cases. [1] WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20 Modules linked in: CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G W 6.9.0-rc2-custom-00784-gc6a05c468a0b #14 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:parman_destroy+0x17/0x20 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_region_fini+0x19/0x60 mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0 mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 843500518509 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Rate limit error messageIdo Schimmel
[ Upstream commit 5bcf925587e9b5d36420d572a0b4d131c90fb306 ] In the rare cases when the device resources are exhausted it is likely that the rehash delayed work will fail. An error message will be printed whenever this happens which can be overwhelming considering the fact that the work is per-region and that there can be hundreds of regions. Fix by rate limiting the error message. Fixes: e5e7962ee5c2 ("mlxsw: spectrum_acl: Implement region migration according to hints") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/c510763b2ebd25e7990d80183feff91cde593145.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehashIdo Schimmel
[ Upstream commit 54225988889931467a9b55fdbef534079b665519 ] The rehash delayed work migrates filters from one region to another according to the number of available credits. The migrated from region is destroyed at the end of the work if the number of credits is non-negative as the assumption is that this is indicative of migration being complete. This assumption is incorrect as a non-negative number of credits can also be the result of a failed migration. The destruction of a region that still has filters referencing it can result in a use-after-free [1]. Fix by not destroying the region if migration failed. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230 Read of size 8 at addr ffff8881735319e8 by task kworker/0:31/3858 CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G W 6.9.0-rc2-custom-00782-gf2275c2157d8 #5 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230 mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70 mlxsw_sp_acl_atcam_entry_del+0x81/0x210 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 174: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc+0x19c/0x360 mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0 mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Freed by task 7: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x14/0x30 kfree+0xc1/0x290 mlxsw_sp_acl_tcam_region_destroy+0x272/0x310 mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/3e412b5659ec2310c5c615760dfe5eac18dd7ebd.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity updateIdo Schimmel
[ Upstream commit 79b5b4b18bc85b19d3a518483f9abbbe6d7b3ba4 ] The rule activity update delayed work periodically traverses the list of configured rules and queries their activity from the device. As part of this task it accesses the entry pointed by 'ventry->entry', but this entry can be changed concurrently by the rehash delayed work, leading to a use-after-free [1]. Fix by closing the race and perform the activity query under the 'vregion->lock' mutex. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140 Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181 CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140 mlxsw_sp_acl_rule_activity_update_work+0x219/0x400 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 1039: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc+0x19c/0x360 mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Freed by task 1039: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x14/0x30 kfree+0xc1/0x290 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix race during rehash delayed workIdo Schimmel
[ Upstream commit d90cfe20562407d9f080d24123078d666d730707 ] The purpose of the rehash delayed work is to reduce the number of masks (eRPs) used by an ACL region as the eRP bank is a global and limited resource. This is done in three steps: 1. Creating a new set of masks and a new ACL region which will use the new masks and to which the existing filters will be migrated to. The new region is assigned to 'vregion->region' and the region from which the filters are migrated from is assigned to 'vregion->region2'. 2. Migrating all the filters from the old region to the new region. 3. Destroying the old region and setting 'vregion->region2' to NULL. Only the second steps is performed under the 'vregion->lock' mutex although its comments says that among other things it "Protects consistency of region, region2 pointers". This is problematic as the first step can race with filter insertion from user space that uses 'vregion->region', but under the mutex. Fix by holding the mutex across the entirety of the delayed work and not only during the second step. Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1ec1d54edf2bad0a369e6b4fa030aba64e1f124b.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: core: Unregister EMAD trap using FORWARD actionIdo Schimmel
[ Upstream commit 976c44af48141cd8595601c0af2a19a43c5b228b ] The device's manual (PRM - Programmer's Reference Manual) classifies the trap that is used to deliver EMAD responses as an "event trap". Among other things, it means that the only actions that can be associated with the trap are TRAP and FORWARD (NOP). Currently, during driver de-initialization the driver unregisters the trap by setting its action to DISCARD, which violates the above guideline. Future firmware versions will prevent such misuses by returning an error. This does not prevent the driver from working, but an error will be printed to the kernel log during module removal / devlink reload: mlxsw_spectrum 0000:03:00.0: Reg cmd access status failed (status=7(bad parameter)) mlxsw_spectrum 0000:03:00.0: Reg cmd access failed (reg_id=7003(hpkt),type=write) Suppress the error message by aligning the driver to the manual and use a FORWARD (NOP) action when unregistering the trap. Fixes: 4ec14b7634b2 ("mlxsw: Add interface to access registers and process events") Cc: Jiri Pirko <jiri@resnulli.us> Cc: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/753a89e14008fde08cb4a2c1e5f537b81d8eb2d6.1713446092.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02net/mlx5: Properly link new fs rules into the treeCosmin Ratiu
[ Upstream commit 7c6782ad4911cbee874e85630226ed389ff2e453 ] Previously, add_rule_fg would only add newly created rules from the handle into the tree when they had a refcount of 1. On the other hand, create_flow_handle tries hard to find and reference already existing identical rules instead of creating new ones. These two behaviors can result in a situation where create_flow_handle 1) creates a new rule and references it, then 2) in a subsequent step during the same handle creation references it again, resulting in a rule with a refcount of 2 that is not linked into the tree, will have a NULL parent and root and will result in a crash when the flow group is deleted because del_sw_hw_rule, invoked on rule deletion, assumes node->parent is != NULL. This happened in the wild, due to another bug related to incorrect handling of duplicate pkt_reformat ids, which lead to the code in create_flow_handle incorrectly referencing a just-added rule in the same flow handle, resulting in the problem described above. Full details are at [1]. This patch changes add_rule_fg to add new rules without parents into the tree, properly initializing them and avoiding the crash. This makes it more consistent with how rules are added to an FTE in create_flow_handle. Fixes: 74491de93712 ("net/mlx5: Add multi dest support") Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1] Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23net/mlx5e: fix a double-free in arfs_create_groupsZhipeng Lu
[ Upstream commit 3c6d5189246f590e4e1f167991558bdb72a4738b ] When `in` allocated by kvzalloc fails, arfs_create_groups will free ft->g and return an error. However, arfs_create_table, the only caller of arfs_create_groups, will hold this error and call to mlx5e_destroy_flow_table, in which the ft->g will be freed again. Fixes: 1cabe6b0965e ("net/mlx5e: Create aRFS flow tables") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23net/mlx5: Use kfree(ft->g) in arfs_create_groups()Denis Efremov
[ Upstream commit 360000b26e37a75b3000bf0585b263809d96ffd3 ] Use kfree() instead of kvfree() on ft->g in arfs_create_groups() because the memory is allocated with kcalloc(). Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Stable-dep-of: 3c6d5189246f ("net/mlx5e: fix a double-free in arfs_create_groups") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23net/mlx5: DR, Use the right GVMI number for drop actionYevgeny Kliteynik
[ Upstream commit 5665954293f13642f9c052ead83c1e9d8cff186f ] When FW provides ICM addresses for drop RX/TX, the provided capability is 64 bits that contain its GVMI as well as the ICM address itself. In case of TX DROP this GVMI is different from the GVMI that the domain is operating on. This patch fixes the action to use these GVMI IDs, as provided by FW. Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failureAmit Cohen
[ Upstream commit 6d6eeabcfaba2fcadf5443b575789ea606f9de83 ] Lately, a bug was found when many TC filters are added - at some point, several bugs are printed to dmesg [1] and the switch is crashed with segmentation fault. The issue starts when gen_pool_free() fails because of unexpected behavior - a try to free memory which is already freed, this leads to BUG() call which crashes the switch and makes many other bugs. Trying to track down the unexpected behavior led to a bug in eRP code. The function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated index, sets the value and returns an error code. When gen_pool_alloc() fails it returns address 0, we track it and return -ENOBUFS outside, BUT the call for gen_pool_alloc() already override the index in erp_table structure. This is a problem when such allocation is done as part of table expansion. This is not a new table, which will not be used in case of allocation failure. We try to expand eRP table and override the current index (non-zero) with zero. Then, it leads to an unexpected behavior when address 0 is freed twice. Note that address 0 is valid in erp_table->base_index and indeed other tables use it. gen_pool_alloc() fails in case that there is no space left in the pre-allocated pool, in our case, the pool is limited to ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max erp entries are required, we exceed the limit and return an error, this error leads to "Failed to migrate vregion" print. Fix this by changing erp_table->base_index only in case of a successful allocation. Add a test case for such a scenario. Without this fix it causes segmentation fault: $ TESTS="max_erp_entries_test" ./tc_flower.sh ./tc_flower.sh: line 988: 1560 Segmentation fault tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null [1]: kernel BUG at lib/genalloc.c:508! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:gen_pool_free_owner+0xc9/0xe0 ... Call Trace: <TASK> __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum] mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum] objagg_obj_root_destroy+0x18/0x80 [objagg] objagg_obj_destroy+0x12c/0x130 [objagg] mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum] mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum] mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum] mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum] mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum] mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum] tc_setup_cb_destroy+0xc1/0x180 fl_hw_destroy_filter+0x94/0xc0 [cls_flower] __fl_delete+0x1ac/0x1c0 [cls_flower] fl_destroy+0xc2/0x150 [cls_flower] tcf_proto_destroy+0x1a/0xa0 ... mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion Fixes: f465261aa105 ("mlxsw: spectrum_acl: Implement common eRP core") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-08net/mlx5e: Correct snprintf truncation handling for fw_version buffer used ↵Rahul Rameshbabu
by representors [ Upstream commit b13559b76157de9d74f04d3ca0e49d69de3b5675 ] snprintf returns the length of the formatted string, excluding the trailing null, without accounting for truncation. This means that is the return value is greater than or equal to the size parameter, the fw_version string was truncated. Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf Fixes: 1b2bd0c0264f ("net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-08net/mlx5: Fix fw tracer first block checkMoshe Shemesh
[ Upstream commit 4261edf11cb7c9224af713a102e5616329306932 ] While handling new traces, to verify it is not the first block being written, last_timestamp is checked. But instead of checking it is non zero it is verified to be zero. Fix to verify last_timestamp is not zero. Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Feras Daoud <ferasda@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-08net/mlx5: improve some commentsHu Haowen
[ Upstream commit 6533380dfd003ea7636cb5672f4f85124b56328b ] Replaced "its" with "it's". Signed-off-by: Hu Haowen <xianfengting221@163.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Stable-dep-of: 4261edf11cb7 ("net/mlx5: Fix fw tracer first block check") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-08Revert "net/mlx5e: fix double free of encap_header"Vlad Buslov
[ Upstream commit 5d089684dc434a31e08d32f0530066d0025c52e4 ] This reverts commit 6f9b1a0731662648949a1c0587f6acb3b7f8acf1. This patch is causing a null ptr issue, the proper fix is in the next patch. Fixes: 6f9b1a073166 ("net/mlx5e: fix double free of encap_header") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28net/mlx5e: Check return value of snprintf writing to fw_version buffer for ↵Rahul Rameshbabu
representors [ Upstream commit 1b2bd0c0264febcd8d47209079a6671c38e6558b ] Treat the operation as an error case when the return value is equivalent to the size of the name buffer. Failed to write null terminator to the name buffer, making the string malformed and should not be used. Provide a string with only the firmware version when forming the string with the board id fails. This logic for representors is identical to normal flow with ethtool. Without check, will trigger -Wformat-truncation with W=1. drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo': drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=] 78 | "%d.%d.%04d (%.16s)", | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32 77 | snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 78 | "%d.%d.%04d (%.16s)", | ~~~~~~~~~~~~~~~~~~~~~ 79 | fw_rev_maj(mdev), fw_rev_min(mdev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 80 | fw_rev_sub(mdev), mdev->board_id); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: cf83c8fdcd47 ("net/mlx5e: Add missing ethtool driver info for representors") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28net/mlx5_core: Clean driver version and nameLeon Romanovsky
[ Upstream commit 17a7612b99e66d2539341ab4f888f970c2c7f76d ] Remove exposed driver version as it was done in other drivers, so module version will work correctly by displaying the kernel version for which it is compiled. And move mlx5_core module name to general include, so auxiliary drivers will be able to use it as a basis for a name in their device ID tables. Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Stable-dep-of: 1b2bd0c0264f ("net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28net/mlx5e: fix double free of encap_headerDust Li
[ Upstream commit 6f9b1a0731662648949a1c0587f6acb3b7f8acf1 ] When mlx5_packet_reformat_alloc() fails, the encap_header allocated in mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However, e->encap_header is already set to the previously freed encap_header before mlx5_packet_reformat_alloc(). As a result, the later mlx5e_encap_put() will free e->encap_header again, causing a double free issue. mlx5e_encap_put() --> mlx5e_encap_dealloc() --> kfree(e->encap_header) This happens when cmd: MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT fail. This patch fix it by not setting e->encap_header until mlx5_packet_reformat_alloc() success. Fixes: d589e785baf5e ("net/mlx5e: Allow concurrent creation of encap entries") Reported-by: Cruz Zhao <cruzzhao@linux.alibaba.com> Reported-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25mlxsw: fix mlxsw_sp2_nve_vxlan_learning_set() return typeDan Carpenter
[ Upstream commit 1e0b72a2a6432c0ef67ee5ce8d9172a7c20bba25 ] The mlxsw_sp2_nve_vxlan_learning_set() function is supposed to return zero on success or negative error codes. So it needs to be type int instead of bool. Fixes: 4ee70efab68d ("mlxsw: spectrum_nve: Add support for VXLAN on Spectrum-2") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23mlxsw: i2c: Limit single transaction buffer sizeVadim Pasternak
[ Upstream commit d7248f1cc835bd80e936dc5b2d94b149bdd0077d ] Maximum size of buffer is obtained from underlying I2C adapter and in case adapter allows I2C transaction buffer size greater than 100 bytes, transaction will fail due to firmware limitation. As a result driver will fail initialization. Limit the maximum size of transaction buffer by 100 bytes to fit to firmware. Remove unnecessary calculation: max_t(u16, MLXSW_I2C_BLK_DEF, quirk_size). This condition can not happened. Fixes: 3029a693beda ("mlxsw: i2c: Allow flexible setting of I2C transactions size") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23mlxsw: i2c: Fix chunk size setting in output mailbox bufferVadim Pasternak
[ Upstream commit 146c7c330507c0384bf29d567186632bfe975927 ] The driver reads commands output from the output mailbox. If the size of the output mailbox is not a multiple of the transaction / block size, then the driver will not issue enough read transactions to read the entire output, which can result in driver initialization errors. Fix by determining the number of transactions using DIV_ROUND_UP(). Fixes: 3029a693beda ("mlxsw: i2c: Allow flexible setting of I2C transactions size") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-16net/mlx5: Allow 0 for total host VFsDaniel Jurgens
commit 2dc2b3922d3c0f52d3a792d15dcacfbc4cc76b8f upstream. When querying eswitch functions 0 is a valid number of host VFs. After introducing ARM SRIOV falling through to getting the max value from PCI results in using the total VFs allowed on the ARM for the host. Fixes: 86eec50beaf3 ("net/mlx5: Support querying max VFs from device"); Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()Yuanjun Gong
[ Upstream commit e5bcb7564d3bd0c88613c76963c5349be9c511c5 ] mlx5e_ipsec_remove_trailer() should return an error code if function pskb_trim() returns an unexpected value. Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctxZhengchao Shao
[ Upstream commit 5dd77585dd9d0e03dd1bceb95f0269a7eaf6b936 ] when mlx5_cmd_exec failed in mlx5dr_cmd_create_reformat_ctx, the memory pointed by 'in' is not released, which will cause memory leak. Move memory release after mlx5_cmd_exec. Fixes: 1d9186476e12 ("net/mlx5: DR, Add direct rule command utilities") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09treewide: Remove uninitialized_var() usageKees Cook
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09net/mlx5: fw_tracer, Fix event handlingShay Drory
[ Upstream commit 341a80de2468f481b1f771683709b5649cbfe513 ] mlx5 driver needs to parse traces with event_id inside the range of first_string_trace and num_string_trace. However, mlx5 is parsing all events with event_id >= first_string_trace. Fix it by checking for the correct range. Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05net/mlx5: Devcom, serialize devcom registrationShay Drory
[ Upstream commit 1f893f57a3bf9fe1f4bcb25b55aea7f7f9712fe7 ] From one hand, mlx5 driver is allowing to probe PFs in parallel. From the other hand, devcom, which is a share resource between PFs, is registered without any lock. This might resulted in memory problems. Hence, use the global mlx5_dev_list_lock in order to serialize devcom registration. Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05net/mlx5: devcom only supports 2 portsMark Bloch
[ Upstream commit 8a6e75e5f57e9ac82268d9bfca3403598d9d0292 ] Devcom API is intended to be used between 2 devices only add this implied assumption into the code and check when it's no true. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Stable-dep-of: 1f893f57a3bf ("net/mlx5: Devcom, serialize devcom registration") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30net/mlx5: Devcom, fix error flow in mlx5_devcom_register_deviceShay Drory
commit af87194352cad882d787d06fb7efa714acd95427 upstream. In case devcom allocation is failed, mlx5 is always freeing the priv. However, this priv might have been allocated by a different thread, and freeing it might lead to use-after-free bugs. Fix it by freeing the priv only in case it was allocated by the running thread. Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30net/mlx5: Fix error message when failing to allocate device memoryRoi Dayan
commit a65735148e0328f80c0f72f9f8d2f609bfcf4aff upstream. Fix spacing for the error and also the correct error code pointer. Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()Nikita Zhandarovich
[ Upstream commit c0e73276f0fcbbd3d4736ba975d7dc7a48791b0c ] Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This behaviour may lead to NULL pointer dereference in 'multi->total_len'. Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value against NULL. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Co-developed-by: Natalia Petrova <n.petrova@fintech.ru> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230417120718.52325-1-n.zhandarovich@fintech.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05net/mlx5: Read the TC mapping of all priorities on ETS queryMaher Sanalla
[ Upstream commit 44d553188c38ac74b799dfdcebafef2f7bb70942 ] When ETS configurations are queried by the user to get the mapping assignment between packet priority and traffic class, only priorities up to maximum TCs are queried from QTCT register in FW to retrieve their assigned TC, leaving the rest of the priorities mapped to the default TC #0 which might be misleading. Fix by querying the TC mapping of all priorities on each ETS query, regardless of the maximum number of TCs configured in FW. Fixes: 820c2c5e773d ("net/mlx5e: Read ETS settings directly from firmware") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11net/mlx5: Geneve, Fix handling of Geneve object id as error codeMaor Dickman
[ Upstream commit d28a06d7dbedc598a06bd1e53a28125f87ca5d0c ] On success, mlx5_geneve_tlv_option_create returns non negative Geneve object id. In case the object id is positive value the caller functions will handle it as an error (non zero) and will fail to offload the Geneve rule. Fix this by changing caller function ,mlx5_geneve_tlv_option_add, to return 0 in case valid non negative object id was provided. Fixes: 0ccc171ea6a2 ("net/mlx5: Geneve, Manage Geneve TLV options") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11net/mlx5: fw_tracer: Fix debug printShay Drory
[ Upstream commit 988c2352273997a242f15c4fc3711773515006a2 ] The debug message specify tdsn, but takes as an argument the tmsn. The correct argument is tmsn, hence, fix the print. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11net/mlx5: Enhance debug print in page allocation failureJack Morgenstein
[ Upstream commit 7eef93003e5d20e1a6a6e59e12d914b5431cbda2 ] Provide more details to aid debugging. Fixes: bf0bf77f6519 ("mlx5: Support communicating arbitrary host page size to firmware") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Majd Dibbiny <majd@nvidia.com> Signed-off-by: Jack Morgenstein <jackm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-06net: mlx5: eliminate anonymous module_init & module_exitRandy Dunlap
[ Upstream commit 2c1e1b949024989e20907b84e11a731a50778416 ] Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log. Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names. Example 1: (System.map) ffffffff832fc78c t init ffffffff832fc79e t init ffffffff832fc8f8 t init Example 2: (initcall_debug log) calling init+0x0/0x12 @ 1 initcall init+0x0/0x12 returned 0 after 15 usecs calling init+0x0/0x60 @ 1 initcall init+0x0/0x60 returned 0 after 2 usecs calling init+0x0/0x9a @ 1 initcall init+0x0/0x9a returned 0 after 74 usecs Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Eli Cohen <eli@mellanox.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18net/mlx5: Fix ptp max frequency adjustment rangeRahul Rameshbabu
[ Upstream commit fe91d57277eef8bb4aca05acfa337b4a51d0bba4 ] .max_adj of ptp_clock_info acts as an absolute value for the amount in ppb that can be set for a single call of .adjfine. This means that a single call to .getfine cannot be greater than .max_adj or less than -(.max_adj). Provides correct value for max frequency adjustment value supported by devices. Fixes: 3d8c38af1493 ("net/mlx5e: Add PTP Hardware Clock (PHC) support") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18net/mlx5: Rename ptp clock infoEran Ben Elisha
[ Upstream commit aac2df7f022eccb5d117f07b1e231410db1a863a ] Fix a typo in ptp_clock_info naming: mlx5_p2p -> mlx5_ptp. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Stable-dep-of: fe91d57277ee ("net/mlx5: Fix ptp max frequency adjustment range") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08net/mlx5: DR, Fix uninitialized var warningYueHaibing
[ Upstream commit 52f7cf70eb8fac6111786c59ae9dfc5cf2bee710 ] Smatch warns this: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_table.c:81 mlx5dr_table_set_miss_action() error: uninitialized symbol 'ret'. Initializing ret with -EOPNOTSUPP and fix missing action case. Fixes: 7838e1725394 ("net/mlx5: DR, Expose steering table functionality") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08net/mlx5e: Fix use-after-free when reverting termination tableRoi Dayan
[ Upstream commit 52c795af04441d76f565c4634f893e5b553df2ae ] When having multiple dests with termination tables and second one or afterwards fails the driver reverts usage of term tables but doesn't reset the assignment in attr->dests[num_vport_dests].termtbl which case a use-after-free when releasing the rule. Fix by resetting the assignment of termtbl to null. Fixes: 10caabdaad5a ("net/mlx5e: Use termination table for VLAN push actions") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08net/mlx5: Fix uninitialized variable bug in outlen_write()YueHaibing
[ Upstream commit 3f5769a074c13d8f08455e40586600419e02a880 ] If sscanf() return 0, outlen is uninitialized and used in kzalloc(), this is unexpected. We should return -EINVAL if the string is invalid. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08net/mlx5: Fix FW tracer timestamp calculationMoshe Shemesh
[ Upstream commit 61db3d7b99a367416e489ccf764cc5f9b00d62a1 ] Fix a bug in calculation of FW tracer timestamp. Decreasing one in the calculation should effect only bits 52_7 and not effect bits 6_0 of the timestamp, otherwise bits 6_0 are always set in this calculation. Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Feras Daoud <ferasda@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08net/mlx4: Check retval of mlx4_bitmap_initPeter Kosyh
[ Upstream commit 594c61ffc77de0a197934aa0f1df9285c68801c6 ] If mlx4_bitmap_init fails, mlx4_bitmap_alloc_range will dereference the NULL pointer (bitmap->table). Make sure, that mlx4_bitmap_alloc_range called in no error case. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: d57febe1a478 ("net/mlx4: Add A0 hybrid steering") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Peter Kosyh <pkosyh@yandex.ru> Link: https://lore.kernel.org/r/20221117152806.278072-1-pkosyh@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-25net/mlx5: Allow async trigger completion execution on single CPU systemsRoy Novich
[ Upstream commit 2808b37b59288ad8f1897e3546c2296df3384b65 ] For a single CPU system, the kernel thread executing mlx5_cmd_flush() never releases the CPU but calls down_trylock(&cmd→sem) in a busy loop. On a single processor system, this leads to a deadlock as the kernel thread which executes mlx5_cmd_invoke() never gets scheduled. Fix this, by adding the cond_resched() call to the loop, allow the command completion kernel thread to execute. Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free") Signed-off-by: Alexander Schmidt <alexschm@de.ibm.com> Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-03net/mlx5: Fix possible use-after-free in async command interfaceTariq Toukan
[ Upstream commit bacd22df95147ed673bec4692ab2d4d585935241 ] mlx5_cmd_cleanup_async_ctx should return only after all its callback handlers were completed. Before this patch, the below race between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and lead to a use-after-free: 1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e. elevated by 1, a single inflight callback). 2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1. 3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and is about to call wake_up(). 4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns immediately as the condition (num_inflight == 0) holds. 5. mlx5_cmd_cleanup_async_ctx returns. 6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx object. 7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed object. Fix it by syncing using a completion object. Mark it completed when num_inflight reaches 0. Trace: BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270 Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x57/0x7d print_report.cold+0x2d5/0x684 ? do_raw_spin_lock+0x23d/0x270 kasan_report+0xb1/0x1a0 ? do_raw_spin_lock+0x23d/0x270 do_raw_spin_lock+0x23d/0x270 ? rwlock_bug.part.0+0x90/0x90 ? __delete_object+0xb8/0x100 ? lock_downgrade+0x6e0/0x6e0 _raw_spin_lock_irqsave+0x43/0x60 ? __wake_up_common_lock+0xb9/0x140 __wake_up_common_lock+0xb9/0x140 ? __wake_up_common+0x650/0x650 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kasan_set_track+0x21/0x30 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kfree+0x1ba/0x520 ? do_raw_spin_unlock+0x54/0x220 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core] ? dump_command+0xcc0/0xcc0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core] cmd_comp_notifier+0x7e/0xb0 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 ? irq_release+0x140/0x140 [mlx5_core] irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x1f2/0x620 handle_irq_event+0xb2/0x1d0 handle_edge_irq+0x21e/0xb00 __common_interrupt+0x79/0x1a0 common_interrupt+0x78/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x42/0x60 Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00 RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3 R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005 R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000 ? default_idle_call+0xcc/0x450 default_idle_call+0xec/0x450 do_idle+0x394/0x450 ? arch_cpu_idle_exit+0x40/0x40 ? do_idle+0x17/0x450 cpu_startup_entry+0x19/0x20 start_secondary+0x221/0x2b0 ? set_cpu_sibling_map+0x2070/0x2070 secondary_startup_64_no_verify+0xcd/0xdb </TASK> Allocated by task 49502: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kvmalloc_node+0x48/0xe0 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core] mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 49502: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x11d/0x1b0 kfree+0x1ba/0x520 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-03net/mlx5e: Do not increment ESN when updating IPsec ESN stateHyong Youb Kim
[ Upstream commit 888be6b279b7257b5f6e4c9527675bff0a335596 ] An offloaded SA stops receiving after about 2^32 + replay_window packets. For example, when SA reaches <seq-hi 0x1, seq 0x2c>, all subsequent packets get dropped with SA-icv-failure (integrity_failed). To reproduce the bug: - ConnectX-6 Dx with crypto enabled (FW 22.30.1004) - ipsec.conf: nic-offload = yes replay-window = 32 esn = yes salifetime=24h - Run netperf for a long time to send more than 2^32 packets netperf -H <device-under-test> -t TCP_STREAM -l 20000 When 2^32 + replay_window packets are received, the replay window moves from the 2nd half of subspace (overlap=1) to the 1st half (overlap=0). The driver then updates the 'esn' value in NIC (i.e. seq_hi) as follows. seq_hi = xfrm_replay_seqhi(seq_bottom) new esn in NIC = seq_hi + 1 The +1 increment is wrong, as seq_hi already contains the correct seq_hi. For example, when seq_hi=1, the driver actually tells NIC to use seq_hi=2 (esn). This incorrect esn value causes all subsequent packets to fail integrity checks (SA-icv-failure). So, do not increment. Fixes: cb01008390bb ("net/mlx5: IPSec, Add support for ESN") Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>