aboutsummaryrefslogtreecommitdiffstats
path: root/net/sched/cls_api.c
AgeCommit message (Collapse)Author
2023-06-14net: sched: fix possible refcount leak in tc_chain_tmplt_add()Hangyu Hua
[ Upstream commit 44f8baaf230c655c249467ca415b570deca8df77 ] try_module_get will be called in tcf_proto_lookup_ops. So module_put needs to be called to drop the refcount if ops don't implement the required function. Fixes: 9f407f1768d3 ("net: sched: introduce chain templates") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14net: sched: move rtm_tca_policy declaration to include fileEric Dumazet
[ Upstream commit 886bc7d6ed3357975c5f1d3c784da96000d4bbb4 ] rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c, thus should be declared in an include file. This fixes the following sparse warning: net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static? Fixes: e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-14net_sched: fix a crash in tc_new_tfilter()Cong Wang
commit 460b360104d51552a57f39e54b2589c9fd7fa0b3 upstream. When tcf_block_find() fails, it already rollbacks the qdisc refcnt, so its caller doesn't need to clean up this again. Avoid calling qdisc_put() again by resetting qdisc to NULL for callers. Reported-by: syzbot+37b8770e6d5a8220a039@syzkaller.appspotmail.com Fixes: e368fdb61d8e ("net: sched: use Qdisc rcu API instead of relying on rtnl lock") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14net: sched: use Qdisc rcu API instead of relying on rtnl lockVlad Buslov
[ Upstream commit e368fdb61d8e7c67ac70791b23345b26d7bbc661 ] As a preparation from removing rtnl lock dependency from rules update path, use Qdisc rcu and reference counting capabilities instead of relying on rtnl lock while working with Qdiscs. Create new tcf_block_release() function, and use it to free resources taken by tcf_block_find(). Currently, this function only releases Qdisc and it is extended in next patches in this series. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [Lee: Sent to Stable] Link: https://syzkaller.appspot.com/bug?id=d7e411c5472dd5da33d8cc921ccadc747743a568 Reported-by: syzbot+5f229e48cccc804062c0@syzkaller.appspotmail.com Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-28net: sched: cls_api: Fix the the wrong parameterYajun Deng
[ Upstream commit 9d85a6f44bd5585761947f40f7821c9cd78a1bbe ] The 4th parameter in tc_chain_notify() should be flags rather than seq. Let's change it back correctly. Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22sched: consistently handle layer3 header accesses in the presence of VLANsToke Høiland-Jørgensen
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ] There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05net: sched: fix possible crash in tcf_action_destroy()Eric Dumazet
[ Upstream commit 3d66b89c30f9220a72e92847768fc8ba4d027d88 ] If the allocation done in tcf_exts_init() failed, we end up with a NULL pointer in exts->actions. kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8198 Comm: syz-executor.3 Not tainted 5.3.0-rc8+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcf_action_destroy+0x71/0x160 net/sched/act_api.c:705 Code: c3 08 44 89 ee e8 4f cb bb fb 41 83 fd 20 0f 84 c9 00 00 00 e8 c0 c9 bb fb 48 89 d8 48 b9 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 08 00 0f 85 c0 00 00 00 4c 8b 33 4d 85 f6 0f 84 9d 00 00 00 RSP: 0018:ffff888096e16ff0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000 RDX: 0000000000040000 RSI: ffffffff85b6ab30 RDI: 0000000000000000 RBP: ffff888096e17020 R08: ffff8880993f6140 R09: fffffbfff11cae67 R10: fffffbfff11cae66 R11: ffffffff88e57333 R12: 0000000000000000 R13: 0000000000000000 R14: ffff888096e177a0 R15: 0000000000000001 FS: 00007f62bc84a700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000758040 CR3: 0000000088b64000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcf_exts_destroy+0x38/0xb0 net/sched/cls_api.c:3030 tcindex_set_parms+0xf7f/0x1e50 net/sched/cls_tcindex.c:488 tcindex_change+0x230/0x318 net/sched/cls_tcindex.c:519 tc_new_tfilter+0xa4b/0x1c70 net/sched/cls_api.c:2152 rtnetlink_rcv_msg+0x838/0xb00 net/core/rtnetlink.c:5214 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net: sched: verify that q!=NULL before setting q->flagsVlad Buslov
commit 503d81d428bd598430f7f9d02021634e1a8139a0 upstream. In function int tc_new_tfilter() q pointer can be NULL when adding filter on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after filter creation, following NULL pointer dereference happens in case parent block is shared: [ 212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 212.925445] #PF: supervisor write access in kernel mode [ 212.925709] #PF: error_code(0x0002) - not-present page [ 212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0 [ 212.926302] Oops: 0002 [#1] SMP KASAN PTI [ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512 [ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40 [ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8 [ 212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296 [ 212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297 [ 212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d [ 212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040 [ 212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680 [ 212.929999] FS: 00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000 [ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0 [ 212.930588] Call Trace: [ 212.930682] ? tc_del_tfilter+0xa40/0xa40 [ 212.930811] ? __lock_acquire+0x5b5/0x2460 [ 212.930948] ? find_held_lock+0x85/0xa0 [ 212.931081] ? tc_del_tfilter+0xa40/0xa40 [ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 212.931332] ? rtnl_dellink+0x490/0x490 [ 212.931454] ? lockdep_hardirqs_on+0x260/0x260 [ 212.931589] ? netlink_deliver_tap+0xab/0x5a0 [ 212.931717] ? match_held_lock+0x1b/0x240 [ 212.931844] netlink_rcv_skb+0xd0/0x200 [ 212.931958] ? rtnl_dellink+0x490/0x490 [ 212.932079] ? netlink_ack+0x440/0x440 [ 212.932205] ? netlink_deliver_tap+0x161/0x5a0 [ 212.932335] ? lock_downgrade+0x360/0x360 [ 212.932457] ? lock_acquire+0xe5/0x210 [ 212.932579] netlink_unicast+0x296/0x350 [ 212.932705] ? netlink_attachskb+0x390/0x390 [ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0 [ 212.932976] netlink_sendmsg+0x394/0x600 [ 212.937998] ? netlink_unicast+0x350/0x350 [ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90 [ 212.948115] ? netlink_unicast+0x350/0x350 [ 212.953185] sock_sendmsg+0x96/0xa0 [ 212.958099] ___sys_sendmsg+0x482/0x520 [ 212.962881] ? match_held_lock+0x1b/0x240 [ 212.967618] ? copy_msghdr_from_user+0x250/0x250 [ 212.972337] ? lock_downgrade+0x360/0x360 [ 212.976973] ? rwlock_bug.part.0+0x60/0x60 [ 212.981548] ? __mod_node_page_state+0x1f/0xa0 [ 212.986060] ? match_held_lock+0x1b/0x240 [ 212.990567] ? find_held_lock+0x85/0xa0 [ 212.994989] ? do_user_addr_fault+0x349/0x5b0 [ 212.999387] ? lock_downgrade+0x360/0x360 [ 213.003713] ? find_held_lock+0x85/0xa0 [ 213.007972] ? __fget_light+0xa1/0xf0 [ 213.012143] ? sockfd_lookup_light+0x91/0xb0 [ 213.016165] __sys_sendmsg+0xba/0x130 [ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0 [ 213.023870] ? handle_mm_fault+0x337/0x470 [ 213.027592] ? page_fault+0x8/0x30 [ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100 [ 213.034999] ? mark_held_locks+0x24/0x90 [ 213.038671] ? do_syscall_64+0x1e/0xe0 [ 213.042297] do_syscall_64+0x74/0xe0 [ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 213.049354] RIP: 0033:0x7fe7c527c7b8 [ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f 0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8 [ 213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003 [ 213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0 [ 213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080 [ 213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000 [ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common [<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi _ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas [ 213.112326] CR2: 0000000000000010 [ 213.117429] ---[ end trace adb58eb0a4ee6283 ]--- Verify that q pointer is not NULL before setting the 'flags' field. Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28net_sched: unset TCQ_F_CAN_BYPASS when adding filtersCong Wang
[ Upstream commit 3f05e6886a595c9a29a309c52f45326be917823c ] For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS, notably fq_codel, it makes no sense to let packets bypass the TC filters we setup in any scenario, otherwise our packets steering policy could not be enforced. This can be reproduced easily with the following script: ip li add dev dummy0 type dummy ifconfig dummy0 up tc qd add dev dummy0 root fq_codel tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo ping -I dummy0 192.168.112.1 Without this patch, packets are sent directly to dummy0 without hitting any of the filters. With this patch, packets are redirected to loopback as expected. This fix is not perfect, it only unsets the flag but does not set it back because we have to save the information somewhere in the qdisc if we really want that. Note, both fq_codel and sfq clear this flag in their ->bind_tcf() but this is clearly not sufficient when we don't use any class ID. Fixes: 23624935e0c4 ("net_sched: TCQ_F_CAN_BYPASS generalization") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31net_sched: refetch skb protocol for each filterCong Wang
[ Upstream commit cd0c4e70fc0ccfa705cdf55efb27519ce9337a26 ] Martin reported a set of filters don't work after changing from reclassify to continue. Looking into the code, it looks like skb protocol is not always fetched for each iteration of the filters. But, as demonstrated by Martin, TC actions could modify skb->protocol, for example act_vlan, this means we have to refetch skb protocol in each iteration, rather than using the one we fetch in the beginning of the loop. This bug is _not_ introduced by commit 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}"), technically, if act_vlan is the only action that modifies skb protocol, then it is commit c7e2b9689ef8 ("sched: introduce vlan action") which introduced this bug. Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15net/sched: cls_api: add missing validation of netlink attributesDavide Caratti
Similarly to what has been done in 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes"), fix classifier code to add validation of TCA_CHAIN and TCA_KIND netlink attributes. tested with: # ./tdc.py -c filter v2: Let sch_api and cls_api share nla_policy they have in common, thanks to David Ahern. v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done by TC modules, thanks to Cong Wang. While at it, restore the 'Delete / get qdisc' comment to its orginal position, just above tc_get_qdisc() function prototype. Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13net_sched: notify filter deletion when deleting a chainCong Wang
When we delete a chain of filters, we need to notify user-space we are deleting each filters in this chain too. Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27net: sched: return -ENOENT when trying to remove filter from non-existent chainJiri Pirko
When chain 0 was implicitly created, removal of non-existent filter from chain 0 gave -ENOENT. Once chain 0 became non-implicit, the same call is giving -EINVAL. Fix this by returning -ENOENT in that case. Reported-by: Roman Mashak <mrv@mojatatu.com> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27net: sched: fix extack error message when chain is failed to be createdJiri Pirko
Instead "Cannot find" say "Cannot create". Fixes: c35a4acc2985 ("net: sched: cls_api: handle generic cls errors") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: extend action ops with put_dev callbackVlad Buslov
As a preparation for removing dependency on rtnl lock from rules update path, all users of shared objects must take reference while working with them. Extend action ops with put_dev() API to be used on net device returned by get_dev(). Modify mirred action (only action that implements get_dev callback): - Take reference to net device in get_dev. - Implement put_dev API that releases reference to net device. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09net: sched: fix block->refcnt decrementJiri Pirko
Currently the refcnt is never decremented in case the value is not 1. Fix it by adding decrement in case the refcnt is not 1. Reported-by: Vlad Buslov <vladbu@mellanox.com> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03net: sched: fix flush on non-existing chainJiri Pirko
User was able to perform filter flush on chain 0 even if it didn't have any filters in it. With the patch that avoided implicit chain 0 creation, this changed. So in case user wants filter flush on chain which does not exist, just return success. There's no reason for non-0 chains to behave differently than chain 0, so do the same for them. Reported-by: Ido Schimmel <idosch@mellanox.com> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: make tcf_chain_{get,put}() staticJiri Pirko
These are no longer used outside of cls_api.c so make them static. Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put(). Signed-off-by: Jiri Pirko <jiri@mellanox.com> v1->v2: - new patch Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: fix notifications for action-held chainsJiri Pirko
Chains that only have action references serve as placeholders. Until a non-action reference is created, user should not be aware of the chain. Also he should not receive any notifications about it. So send notifications for the new chain only in case the chain gets the first non-action reference. Symmetrically to that, when the last non-action reference is dropped, send the notification about deleted chain. Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> v1->v2: - made __tcf_chain_{get,put}() static as suggested by Cong Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: change name of zombie chain to "held_by_acts_only"Jiri Pirko
As mentioned by Cong and Jakub during the review process, it is a bit odd to sometimes (act flow) create a new chain which would be immediately a "zombie". So just rename it to "held_by_acts_only". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: sched: don't dump chains only held by actionsJiri Pirko
In case a chain is empty and not explicitly created by a user, such chain should not exist. The only exception is if there is an action "goto chain" pointing to it. In that case, don't show the chain in the dump. Track the chain references held by actions and use them to find out if a chain should or should not be shown in chain dump. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net: sched: unmark chain as explicitly created on deleteJiri Pirko
Once user manually deletes the chain using "chain del", the chain cannot be marked as explicitly created anymore. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net: sched: cls_api: fix dead code in switchGustavo A. R. Silva
Code at line 1850 is unreachable. Fix this by removing the break statement above it, so the code for case RTM_GETCHAIN can be properly executed. Addresses-Coverity-ID: 1472050 ("Structurally dead code") Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: introduce chain templatesJiri Pirko
Allow user to set a template for newly created chains. Template lock down the chain for particular classifier type/options combinations. The classifier needs to support templates, otherwise kernel would reply with error. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: introduce chain object to uapiJiri Pirko
Allow user to create, destroy, get and dump chain objects. Do that by extending rtnl commands by the chain-specific ones. User will now be able to explicitly create or destroy chains (so far this was done only automatically according the filter/act needs and refcounting). Also, the user will receive notification about any chain creation or destuction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: Avoid implicit chain 0 creationJiri Pirko
Currently, chain 0 is implicitly created during block creation. However that does not align with chain object exposure, creation and destruction api introduced later on. So make the chain 0 behave the same way as any other chain and only create it when it is needed. Since chain 0 is somehow special as the qdiscs need to hold pointer to the first chain tp, this requires to move the chain head change callback infra to the block structure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: push ops lookup bits into tcf_proto_lookup_ops()Jiri Pirko
Push all bits that take care of ops lookup, including module loading outside tcf_proto_create() function, into tcf_proto_lookup_ops() Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-21net: sched: use PTR_ERR_OR_ZERO macro in tcf_block_cb_registerGustavo A. R. Silva
This line makes up what macro PTR_ERR_OR_ZERO already does. So, make use of PTR_ERR_OR_ZERO rather than an open-code version. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller
All conflicts were trivial overlapping changes, so reasonably easy to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net: sched: Using NULL instead of plain integerYueHaibing
Fixes the following sparse warnings: net/sched/cls_api.c:1101:43: warning: Using plain integer as NULL pointer net/sched/cls_api.c:1492:75: warning: Using plain integer as NULL pointer Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13net: sched: refactor flower walk to iterate over idrVlad Buslov
Extend struct tcf_walker with additional 'cookie' field. It is intended to be used by classifier walk implementations to continue iteration directly from particular filter, instead of iterating 'skip' number of times. Change flower walk implementation to save filter handle in 'cookie'. Each time flower walk is called, it looks up filter with saved handle directly with idr, instead of iterating over filter linked list 'skip' number of times. This change improves complexity of dumping flower classifier from quadratic to linearithmic. (assuming idr lookup has logarithmic complexity) Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08net: sched: change action API to use array of pointers to actionsVlad Buslov
Act API used linked list to pass set of actions to functions. It is intrusive data structure that stores list nodes inside action structure itself, which means it is not safe to modify such list concurrently. However, action API doesn't use any linked list specific operations on this set of actions, so it can be safely refactored into plain pointer array. Refactor action API to use array of pointers to tc_actions instead of linked list. Change argument 'actions' type of exported action init, destroy and dump functions. Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08net: sched: implement reference counted action releaseVlad Buslov
Implement helper delete function that uses new action ops 'delete', instead of destroying action directly. This is required so act API could delete actions by index, without holding any references to action that is being deleted. Implement function __tcf_action_put() that releases reference to action and frees it, if necessary. Refactor action deletion code to use new put function and not to rely on rtnl lock. Remove rtnl lock assertions that are no longer needed. Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08net: sched: implement unlocked action init APIVlad Buslov
Add additional 'rtnl_held' argument to act API init functions. It is required to implement actions that need to release rtnl lock before loading kernel module and reacquire if afterwards. Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26net: sched: call reoffload op on block callback regJohn Hurley
Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a block when a callback tries to register to a block that already has offloaded rules. If all existing rules cannot be offloaded then the registration is rejected. This replaces the previous policy of rejecting such callback registration outright. On unregistration of a callback, the rules are flushed for that given cb. The implementation of block sharing in the NFP driver, for example, duplicates shared rules to all devs bound to a block. This meant that rules could still exist in hw even after a device is unbound from a block (assuming the block still remains active). Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26net: sched: pass extack pointer to block binds and cb registrationJohn Hurley
Pass the extact struct from a tc qdisc add to the block bind function and, in turn, to the setup_tc ndo of binding device via the tc_block_offload struct. Pass this back to any block callback registrations to allow netlink logging of fails in the bind process. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-06Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"David S. Miller
This reverts commit d96a43c66464cdf0b249fdf47b6dcd65b83af8c0. This potentially breaks things, so reverting as per request by Jakub Kicinski. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05net: sched: cls: Fix offloading when ingress dev is vxlanPaul Blakey
When using a vxlan device as the ingress dev, we count it as a "no offload dev", so when such a rule comes and err stop is true, we fail early and don't try the egdev route which can offload it through the egress device. Fix that by not calling the block offload if one of the devices attached to it is not offload capable, but make sure egress on such case is capable instead. Fixes: caa7260156eb ("net: sched: keep track of offloaded filters [..]") Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04net: sched: return error code when tcf proto is not foundVlad Buslov
If requested tcf proto is not found, get and del filter netlink protocol handlers output error message to extack, but do not return actual error code. Add check to return ENOENT when result of tp find function is NULL pointer. Fixes: c431f89b18a2 ("net: sched: split tc_ctl_tfilter into three handlers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: sched: split tc_ctl_tfilter into three handlersVlad Buslov
tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER, RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function involves a lot of branching on specific message type because most of the code is message-specific. This significantly complicates adding new functionality and doesn't provide much benefit of code reuse. Split tc_ctl_tfilter to three standalone functions that handle filter new, delete and get requests. The only truly protocol independent part of tc_ctl_tfilter is code that looks up queue, class, and block. Refactor this code to standalone tcf_block_find function that is used by all three new handlers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of easy overlapping changes in the confict resolutions here. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24net_sched: switch to rcu_workCong Wang
Commit 05f0fe6b74db ("RCU, workqueue: Implement rcu_work") introduces new API's for dispatching work in a RCU callback. Now we can just switch to the new API's for tc filters. This could get rid of a lot of code. Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24net : sched: cls_api: deal with egdev path only if neededOr Gerlitz
When dealing with ingress rule on a netdev, if we did fine through the conventional path, there's no need to continue into the egdev route, and we can stop right there. Not doing so may cause a 2nd rule to be added by the cls api layer with the ingress being the egdev. For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B Fixes: 208c0f4b5237 ('net: sched: use tc_setup_cb_call to call per-block callbacks') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11net: sched: fix error path in tcf_proto_create() when modules are not configuredJiri Pirko
In case modules are not configured, error out when tp->ops is null and prevent later null pointer dereference. Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09net sched actions: update Add/Delete action API with new argumentRoman Mashak
Introduce a new function argument to carry total attributes size for correct allocation of skb in event messages. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: Convert simple pernet_operationsKirill Tkhai
These pernet_operations make pretty simple actions like variable initialization on init, debug checks on exit, and so on, and they obviously are able to be executed in parallel with any others: vrf_net_ops lockd_net_ops grace_net_ops xfrm6_tunnel_net_ops kcm_net_ops tcf_net_ops Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-02-20net: sched: report if filter is too large to dumpRoman Kapl
So far, if the filter was too large to fit in the allocated skb, the kernel did not return any error and stopped dumping. Modify the dumper so that it returns -EMSGSIZE when a filter fails to dump and it is the first filter in the skb. If we are not first, we will get a next chance with more room. I understand this is pretty near to being an API change, but the original design (silent truncation) can be considered a bug. Note: The error case can happen pretty easily if you create a filter with 32 actions and have 4kb pages. Also recent versions of iproute try to be clever with their buffer allocation size, which in turn leads to Signed-off-by: Roman Kapl <code@rkapl.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller