summaryrefslogtreecommitdiffstats
path: root/net/netfilter/nft_ct.c
AgeCommit message (Collapse)Author
2018-09-28netfilter: nf_tables: add requirements for connsecmark supportChristian Göttsche
Add ability to set the connection tracking secmark value. Add ability to set the meta secmark value. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove l3->l4 mapping informationFlorian Westphal
l4 protocols are demuxed by l3num, l4num pair. However, almost all l4 trackers are l3 agnostic. Only exceptions are: - gre, icmp (ipv4 only) - icmpv6 (ipv6 only) This commit gets rid of the l3 mapping, l4 trackers can now be looked up by their IPPROTO_XXX value alone, which gets rid of the additional l3 indirection. For icmp, ipcmp6 and gre, add a check on state->pf and return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4, this seems more fitting than using the generic tracker. Additionally we can kill the 2nd l4proto definitions that were needed for v4/v6 split -- they are now the same so we can use single l4proto struct for each protocol, rather than two. The EXPORT_SYMBOLs can be removed as all these object files are part of nf_conntrack with no external references. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: cttimeout: remove superfluous check on layer 4 netlink functionsPablo Neira Ayuso
We assume they are always set accordingly since a874752a10da ("netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT"), so we can get rid of this checks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29netfilter: nf_tables: rework ct timeout set supportFlorian Westphal
Using a private template is problematic: 1. We can't assign both a zone and a timeout policy (zone assigns a conntrack template, so we hit problem 1) 2. Using a template needs to take care of ct refcount, else we'll eventually free the private template due to ->use underflow. This patch reworks template policy to instead work with existing conntrack. As long as such conntrack has not yet been placed into the hash table (unconfirmed) we can still add the timeout extension. The only caveat is that we now need to update/correct ct->timeout to reflect the initial/new state, otherwise the conntrack entry retains the default 'new' timeout. Side effect of this change is that setting the policy must now occur from chains that are evaluated *after* the conntrack lookup has taken place. No released kernel contains the timeout policy feature yet, so this change should be ok. Changes since v2: - don't handle 'ct is confirmed case' - after previous patch, no need to special-case tcp/dccp/sctp timeout anymore Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nft_ct: make l3 protocol field optional for timeout objectHarsha Sharma
If l3 protocol value is not specified for ct timeout object then use the value from nft_ctx protocol family. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_ct: enable conntrack for helpersPablo Neira Ayuso
Enable conntrack if the user defines a helper to be used from the ruleset policy. Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_ct: add ct timeout supportHarsha Sharma
This patch allows to add, list and delete connection tracking timeout policies via nft objref infrastructure and assigning these timeout via nft rule. %./libnftnl/examples/nft-ct-timeout-add ip raw cttime tcp Ruleset: table ip raw { ct timeout cttime { protocol tcp; policy = {established: 111, close: 13 } } chain output { type filter hook output priority -300; policy accept; ct timeout set "cttime" } } %./libnftnl/examples/nft-rule-ct-timeout-add ip raw output cttime %conntrack -E [NEW] tcp 6 111 ESTABLISHED src=172.16.19.128 dst=172.16.19.1 sport=22 dport=41360 [UNREPLIED] src=172.16.19.1 dst=172.16.19.128 sport=41360 dport=22 %nft delete rule ip raw output handle <handle> %./libnftnl/examples/nft-ct-timeout-del ip raw cttime Joint work with Pablo Neira. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: Remove useless param helper of nf_ct_helper_ext_addGao Feng
The param helper of nf_ct_helper_ext_add is useless now, then remove it now. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Filling in the padding slot in the bpf structure as a bug fix in 'ne' overlapped with actually using that padding area for something in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03netfilter: nf_tables: pass context to object destroy indirectionPablo Neira Ayuso
The new connlimit object needs this to properly deal with conntrack dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-17netfilter: nf_tables: fix NULL pointer dereference on nft_ct_helper_obj_dump()Taehee Yoo
In the nft_ct_helper_obj_dump(), always priv->helper4 is dereferenced. But if family is ipv6, priv->helper6 should be dereferenced. Steps to reproduces: #test.nft table ip6 filter { ct helper ftp { type "ftp" protocol tcp } chain input { type filter hook input priority 4; ct helper set "ftp" } } %nft -f test.nft %nft list ruleset we can see the below messages: [ 916.286233] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 916.294777] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 916.302613] Modules linked in: nft_objref nf_conntrack_sip nf_conntrack_snmp nf_conntrack_broadcast nf_conntrack_ftp nft_ct nf_conntrack nf_tables nfnetlink [last unloaded: nfnetlink] [ 916.318758] CPU: 1 PID: 2093 Comm: nft Not tainted 4.17.0-rc4+ #181 [ 916.326772] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [ 916.338773] RIP: 0010:strlen+0x1a/0x90 [ 916.342781] RSP: 0018:ffff88010ff0f2f8 EFLAGS: 00010292 [ 916.346773] RAX: dffffc0000000000 RBX: ffff880119b26ee8 RCX: ffff88010c150038 [ 916.354777] RDX: 0000000000000002 RSI: ffff880119b26ee8 RDI: 0000000000000010 [ 916.362773] RBP: 0000000000000010 R08: 0000000000007e88 R09: ffff88010c15003c [ 916.370773] R10: ffff88010c150037 R11: ffffed002182a007 R12: ffff88010ff04040 [ 916.378779] R13: 0000000000000010 R14: ffff880119b26f30 R15: ffff88010ff04110 [ 916.387265] FS: 00007f57a1997700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000 [ 916.394785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 916.402778] CR2: 00007f57a0ac80f0 CR3: 000000010ff02000 CR4: 00000000001006e0 [ 916.410772] Call Trace: [ 916.414787] nft_ct_helper_obj_dump+0x94/0x200 [nft_ct] [ 916.418779] ? nft_ct_set_eval+0x560/0x560 [nft_ct] [ 916.426771] ? memset+0x1f/0x40 [ 916.426771] ? __nla_reserve+0x92/0xb0 [ 916.434774] ? memcpy+0x34/0x50 [ 916.434774] nf_tables_fill_obj_info+0x484/0x860 [nf_tables] [ 916.442773] ? __nft_release_basechain+0x600/0x600 [nf_tables] [ 916.450779] ? lock_acquire+0x193/0x380 [ 916.454771] ? lock_acquire+0x193/0x380 [ 916.458789] ? nf_tables_dump_obj+0x148/0xcb0 [nf_tables] [ 916.462777] nf_tables_dump_obj+0x5f0/0xcb0 [nf_tables] [ 916.470769] ? __alloc_skb+0x30b/0x500 [ 916.474779] netlink_dump+0x752/0xb50 [ 916.478775] __netlink_dump_start+0x4d3/0x750 [ 916.482784] nf_tables_getobj+0x27a/0x930 [nf_tables] [ 916.490774] ? nft_obj_notify+0x100/0x100 [nf_tables] [ 916.494772] ? nf_tables_getobj+0x930/0x930 [nf_tables] [ 916.502579] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables] [ 916.506774] ? nft_obj_notify+0x100/0x100 [nf_tables] [ 916.514808] nfnetlink_rcv_msg+0x8ab/0xa86 [nfnetlink] [ 916.518771] ? nfnetlink_rcv_msg+0x550/0xa86 [nfnetlink] [ 916.526782] netlink_rcv_skb+0x23e/0x360 [ 916.530773] ? nfnetlink_bind+0x200/0x200 [nfnetlink] [ 916.534778] ? debug_check_no_locks_freed+0x280/0x280 [ 916.542770] ? netlink_ack+0x870/0x870 [ 916.546786] ? ns_capable_common+0xf4/0x130 [ 916.550765] nfnetlink_rcv+0x172/0x16c0 [nfnetlink] [ 916.554771] ? sched_clock_local+0xe2/0x150 [ 916.558774] ? sched_clock_cpu+0x144/0x180 [ 916.566575] ? lock_acquire+0x380/0x380 [ 916.570775] ? sched_clock_local+0xe2/0x150 [ 916.574765] ? nfnetlink_net_init+0x130/0x130 [nfnetlink] [ 916.578763] ? sched_clock_cpu+0x144/0x180 [ 916.582770] ? lock_acquire+0x193/0x380 [ 916.590771] ? lock_acquire+0x193/0x380 [ 916.594766] ? lock_acquire+0x380/0x380 [ 916.598760] ? netlink_deliver_tap+0x262/0xa60 [ 916.602766] ? lock_acquire+0x193/0x380 [ 916.606766] netlink_unicast+0x3ef/0x5a0 [ 916.610771] ? netlink_attachskb+0x630/0x630 [ 916.614763] netlink_sendmsg+0x72a/0xb00 [ 916.618769] ? netlink_unicast+0x5a0/0x5a0 [ 916.626766] ? _copy_from_user+0x92/0xc0 [ 916.630773] __sys_sendto+0x202/0x300 [ 916.634772] ? __ia32_sys_getpeername+0xb0/0xb0 [ 916.638759] ? lock_acquire+0x380/0x380 [ 916.642769] ? lock_acquire+0x193/0x380 [ 916.646761] ? finish_task_switch+0xf4/0x560 [ 916.650763] ? __schedule+0x582/0x19a0 [ 916.655301] ? __sched_text_start+0x8/0x8 [ 916.655301] ? up_read+0x1c/0x110 [ 916.655301] ? __do_page_fault+0x48b/0xaa0 [ 916.655301] ? entry_SYSCALL_64_after_hwframe+0x59/0xbe [ 916.655301] __x64_sys_sendto+0xdd/0x1b0 [ 916.655301] do_syscall_64+0x96/0x3d0 [ 916.655301] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 916.655301] RIP: 0033:0x7f57a0ff5e03 [ 916.655301] RSP: 002b:00007fff6367e0a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 916.655301] RAX: ffffffffffffffda RBX: 00007fff6367f1e0 RCX: 00007f57a0ff5e03 [ 916.655301] RDX: 0000000000000020 RSI: 00007fff6367e110 RDI: 0000000000000003 [ 916.655301] RBP: 00007fff6367e100 R08: 00007f57a0ce9160 R09: 000000000000000c [ 916.655301] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff6367e110 [ 916.655301] R13: 0000000000000020 R14: 00007f57a153c610 R15: 0000562417258de0 [ 916.655301] Code: ff ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 fa 53 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df 48 89 fd 48 83 ec 08 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f [ 916.655301] RIP: strlen+0x1a/0x90 RSP: ffff88010ff0f2f8 [ 916.771929] ---[ end trace 1065e048e72479fe ]--- [ 916.777204] Kernel panic - not syncing: Fatal exception [ 916.778158] Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-20netfilter: nft_ct: add NFT_CT_{SRC,DST}_{IP,IP6}Pablo Neira Ayuso
All existing keys, except the NFT_CT_SRC and NFT_CT_DST are assumed to have strict datatypes. This is causing problems with sets and concatenations given the specific length of these keys is not known. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2018-01-10netfilter: nf_tables: add single table list for all familiesPablo Neira Ayuso
Place all existing user defined tables in struct net *, instead of having one list per family. This saves us from one level of indentation in netlink dump functions. Place pointer to struct nft_af_info in struct nft_table temporarily, as we still need this to put back reference module reference counter on table removal. This patch comes in preparation for the removal of struct nft_af_info. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: conntrack: move nf_ct_netns_{get,put}() to corePablo Neira Ayuso
So we can call this from other expression that need conntrack in place to work. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2017-09-04netfilter: nf_tables: add select_ops for stateful objectsPablo M. Bermudo Garay
This patch adds support for overloading stateful objects operations through the select_ops() callback, just as it is implemented for expressions. This change is needed for upcoming additions to the stateful objects infrastructure. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15netfilter: introduce nf_conntrack_helper_put helper functionLiping Zhang
And convert module_put invocation to nf_conntrack_helper_put, this is prepared for the followup patch, which will add a refcnt for cthelper, so we can reject the deleting request when cthelper is in use. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-19netfilter: nft_ct: allow to set ctnetlink event types of a connectionFlorian Westphal
By default the kernel emits all ctnetlink events for a connection. This allows to select the types of events to generate. This can be used to e.g. only send DESTROY events but no NEW/UPDATE ones and will work even if sysctl net.netfilter.nf_conntrack_events is set to 0. This was already possible via iptables' CT target, but the nft version has the advantage that it can also be used with already-established conntracks. The added nf_ct_is_template() check isn't a bug fix as we only support mark and labels (and unlike ecache the conntrack core doesn't copy those). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15netfilter: kill the fake untracked conntrack objectsFlorian Westphal
resurrect an old patch from Pablo Neira to remove the untracked objects. Currently, there are four possible states of an skb wrt. conntrack. 1. No conntrack attached, ct is NULL. 2. Normal (kmem cache allocated) ct attached. 3. a template (kmalloc'd), not in any hash tables at any point in time 4. the 'untracked' conntrack, a percpu nf_conn object, tagged via IPS_UNTRACKED_BIT in ct->status. Untracked is supposed to be identical to case 1. It exists only so users can check -m conntrack --ctstate UNTRACKED vs. -m conntrack --ctstate INVALID e.g. attempts to set connmark on INVALID or UNTRACKED conntracks is supposed to be a no-op. Thus currently we need to check ct == NULL || nf_ct_is_untracked(ct) in a lot of places in order to avoid altering untracked objects. The other consequence of the percpu untracked object is that all -j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op (inc/dec the untracked conntracks refcount). This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to make the distinction instead. The (few) places that care about packet invalid (ct is NULL) vs. packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED, but all other places can omit the nf_ct_is_untracked() check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07netfilter: Remove exceptional & on function nameArushi Singhal
Remove & from function pointers to conform to the style found elsewhere in the file. Done using the following semantic patch // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/broadcom/genet/bcmmii.c drivers/net/hyperv/netvsc.c kernel/bpf/hashtab.c Almost entirely overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15netfilter: nft_ct: do cleanup work when NFTA_CT_DIRECTION is invalidLiping Zhang
We should jump to invoke __nft_ct_set_destroy() instead of just return error. Fixes: edee4f1e9245 ("netfilter: nft_ct: add zone id set support") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-13netfilter: nft_ct: add helper set supportFlorian Westphal
this allows to assign connection tracking helpers to connections via nft objref infrastructure. The idea is to first specifiy a helper object: table ip filter { ct helper some-name { type "ftp" protocol tcp l3proto ip } } and then assign it via nft add ... ct helper set "some-name" helper assignment works for new conntracks only as we cannot expand the conntrack extension area once it has been committed to the main conntrack table. ipv4 and ipv6 protocols are tracked stored separately so we can also handle families that observe both ipv4 and ipv6 traffic. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-13netfilter: nf_tables: fix mismatch in big-endian systemLiping Zhang
Currently, there are two different methods to store an u16 integer to the u32 data register. For example: u32 *dest = &regs->data[priv->dreg]; 1. *dest = 0; *(u16 *) dest = val_u16; 2. *dest = val_u16; For method 1, the u16 value will be stored like this, either in big-endian or little-endian system: 0 15 31 +-+-+-+-+-+-+-+-+-+-+-+-+ | Value | 0 | +-+-+-+-+-+-+-+-+-+-+-+-+ For method 2, in little-endian system, the u16 value will be the same as listed above. But in big-endian system, the u16 value will be stored like this: 0 15 31 +-+-+-+-+-+-+-+-+-+-+-+-+ | 0 | Value | +-+-+-+-+-+-+-+-+-+-+-+-+ So later we use "memcmp(&regs->data[priv->sreg], data, 2);" to do compare in nft_cmp, nft_lookup expr ..., method 2 will get the wrong result in big-endian system, as 0~15 bits will always be zero. For the similar reason, when loading an u16 value from the u32 data register, we should use "*(u16 *) sreg;" instead of "(u16)*sreg;", the 2nd method will get the wrong value in the big-endian system. So introduce some wrapper functions to store/load an u8 or u16 integer to/from the u32 data register, and use them in the right place. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-23netfilter: nft_ct: fix random validation errors for zone set supportFlorian Westphal
Dan reports: net/netfilter/nft_ct.c:549 nft_ct_set_init() error: uninitialized symbol 'len'. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: edee4f1e924582 ("netfilter: nft_ct: add zone id set support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08netfilter: nft_ct: add zone id set supportFlorian Westphal
zones allow tracking multiple connections sharing identical tuples, this is needed e.g. when tracking distinct vlans with overlapping ip addresses (conntrack is l2 agnostic). Thus the zone has to be set before the packet is picked up by the connection tracker. This is done by means of 'conntrack templates' which are conntrack structures used solely to pass this info from one netfilter hook to the next. The iptables CT target instantiates these connection tracking templates once per rule, i.e. the template is fixed/tied to particular zone, can be read-only and therefore be re-used by as many skbs simultaneously as needed. We can't follow this model because we want to take the zone id from an sreg at rule eval time so we could e.g. fill in the zone id from the packets vlan id or a e.g. nftables key : value maps. To avoid cost of per packet alloc/free of the template, use a percpu template 'scratch' object and use the refcount to detect the (unlikely) case where the template is still attached to another skb (i.e., previous skb was nfqueued ...). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08netfilter: nft_ct: prepare for key-dependent error unwindFlorian Westphal
Next patch will add ZONE_ID set support which will need similar error unwind (put operation) as conntrack labels. Prepare for this: remove the 'label_got' boolean in favor of a switch statement that can be extended in next patch. As we already have that in the set_destroy function place that in a separate function and call it from the set init function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-08netfilter: nft_ct: add zone id get supportFlorian Westphal
Just like with counters the direction attribute is optional. We set priv->dir to MAX unconditionally to avoid duplicating the assignment for all keys with optional direction. For keys where direction is mandatory, existing code already returns an error. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02netfilter: add and use nf_ct_set helperFlorian Westphal
Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff. This avoids changing code in followup patch that merges skb->nfct and skb->nfctinfo into skb->_nfct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-03netfilter: nft_ct: add average bytes per packet supportLiping Zhang
Similar to xt_connbytes, user can match how many average bytes per packet a connection has transferred so far. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: add and use nf_ct_netns_get/putFlorian Westphal
currently aliased to try_module_get/_put. Will be changed in next patch when we add functions to make use of ->net argument to store usercount per l3proto tracker. This is needed to avoid registering the conntrack hooks in all netns and later only enable connection tracking in those that need conntrack. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26netfilter: nft_ct: add notrack supportPablo Neira Ayuso
This patch adds notrack support. I decided to add a new expression, given that this doesn't fit into the existing set operation. Notrack doesn't need a source register, and an hypothetical NFT_CT_NOTRACK key makes no sense since matching the untracked state is done through NFT_CT_STATE. I'm placing this new notrack expression into nft_ct.c, I think a single module is too much. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_ct: report error if mark and dir specified simultaneouslyLiping Zhang
NFT_CT_MARK is unrelated to direction, so if NFTA_CT_DIRECTION attr is specified, report EINVAL to the userspace. This validation check was already done at nft_ct_get_init, but we missed it in nft_ct_set_init. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_ct: unnecessary to require dir when use ct l3proto/protocolLiping Zhang
Currently, if the user want to match ct l3proto, we must specify the direction, for example: # nft add rule filter input ct original l3proto ipv4 ^^^^^^^^ Otherwise, error message will be reported: # nft add rule filter input ct l3proto ipv4 nft add rule filter input ct l3proto ipv4 <cmdline>:1:1-38: Error: Could not process rule: Invalid argument add rule filter input ct l3proto ipv4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Actually, there's no need to require NFTA_CT_DIRECTION attr, because ct l3proto and protocol are unrelated to direction. And for compatibility, even if the user specify the NFTA_CT_DIRECTION attr, do not report error, just skip it. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, they are: 1) Count pre-established connections as active in "least connection" schedulers such that pre-established connections to avoid overloading backend servers on peak demands, from Michal Kubecek via Simon Horman. 2) Address a race condition when resizing the conntrack table by caching the bucket size when fulling iterating over the hashtable in these three possible scenarios: 1) dump via /proc/net/nf_conntrack, 2) unlinking userspace helper and 3) unlinking custom conntrack timeout. From Liping Zhang. 3) Revisit early_drop() path to perform lockless traversal on conntrack eviction under stress, use del_timer() as synchronization point to avoid two CPUs evicting the same entry, from Florian Westphal. 4) Move NAT hlist_head to nf_conn object, this simplifies the existing NAT extension and it doesn't increase size since recent patches to align nf_conn, from Florian. 5) Use rhashtable for the by-source NAT hashtable, also from Florian. 6) Don't allow --physdev-is-out from OUTPUT chain, just like --physdev-out is not either, from Hangbin Liu. 7) Automagically set on nf_conntrack counters if the user tries to match ct bytes/packets from nftables, from Liping Zhang. 8) Remove possible_net_t fields in nf_tables set objects since we just simply pass the net pointer to the backend set type implementations. 9) Fix possible off-by-one in h323, from Toby DiPasquale. 10) early_drop() may be called from ctnetlink patch, so we must hold rcu read size lock from them too, this amends Florian's patch #3 coming in this batch, from Liping Zhang. 11) Use binary search to validate jump offset in x_tables, this addresses the O(n!) validation that was introduced recently resolve security issues with unpriviledge namespaces, from Florian. 12) Fix reference leak to connlabel in error path of nft_ct, from Zhang. 13) Three updates for nft_log: Fix log prefix leak in error path. Bail out on loglevel larger than debug in nft_log and set on the new NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang. 14) Allow to filter rule dumps in nf_tables based on table and chain names. 15) Simplify connlabel to always use 128 bits to store labels and get rid of unused function in xt_connlabel, from Florian. 16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack helper, by Gao Feng. 17) Put back x_tables module reference in nft_compat on error, from Liping Zhang. 18) Add a reference count to the x_tables extensions cache in nft_compat, so we can remove them when unused and avoid a crash if the extensions are rmmod, again from Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22netfilter: conntrack: support a fixed size of 128 distinct labelsFlorian Westphal
The conntrack label extension is currently variable-sized, e.g. if only 2 labels are used by iptables rules then the labels->bits[] array will only contain one element. We track size of each label storage area in the 'words' member. But in nftables and openvswitch we always have to ask for worst-case since we don't know what bit will be used at configuration time. As most arches are 64bit we need to allocate 24 bytes in this case: struct nf_conn_labels { u8 words; /* 0 1 */ /* XXX 7 bytes hole, try to pack */ long unsigned bits[2]; /* 8 24 */ Make bits a fixed size and drop the words member, it simplifies the code and only increases memory requirements on x86 when less than 64bit labels are required. We still only allocate the extension if its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-19netfilter: nft_ct: fix unpaired nf_connlabels_get/put callLiping Zhang
We only get nf_connlabels if the user add ct label set expr successfully, but we will also put nf_connlabels if the user delete ct lable get expr. This is mismathced, and will cause ct label expr cannot work properly. Also, if we init something fail, we should put nf_connlabels back. Otherwise, we may waste to alloc the memory that will never be used. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: nft_ct: make byte/packet expr more friendlyLiping Zhang
If we want to use ct packets expr, and add a rule like follows: # nft add rule filter input ct packets gt 1 counter We will find that no packets will hit it, because nf_conntrack_acct is disabled by default. So It will not work until we enable it manually via "echo 1 > /proc/sys/net/netfilter/nf_conntrack_acct". This is not friendly, so like xt_connbytes do, if the user want to use ct byte/packet expr, enable nf_conntrack_acct automatically. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-08netfilter: nft_ct: fix expiration getterFlorian Westphal
We need to compute timeout.expires - jiffies, not the other way around. Add a helper, another patch can then later change more places in conntrack code where we currently open-code this. Will allow us to only change one place later when we remove per-ct timer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05netfilter: nftables: add connlabel set supportFlorian Westphal
Conntrack labels are currently sized depending on the iptables ruleset, i.e. if we're asked to test or set bits 1, 2, and 65 then we would allocate enough room to store at least bit 65. However, with nft, the input is just a register with arbitrary runtime content. We therefore ask for the upper ceiling we currently have, which is enough room to store 128 bits. Alternatively, we could alter nf_connlabel_replace to increase net->ct.label_words at run time, but since 128 bits is not that big we'd only save sizeof(long) so it doesn't seem worth it for now. This follows a similar approach that xtables 'connlabel' match uses, so when user inputs ct label set bar then we will set the bit used by the 'bar' label and leave the rest alone. This is done by passing the sreg content to nf_connlabels_replace as both value and mask argument. Labels (bits) already set thus cannot be re-set to zero, but this is not supported by xtables connlabel match either. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-04-18netfilter: connlabels: change nf_connlabels_get bit arg to 'highest used'Florian Westphal
nf_connlabel_set() takes the bit number that we would like to set. nf_connlabels_get() however took the number of bits that we want to support. So e.g. nf_connlabels_get(32) support bits 0 to 31, but not 32. This changes nf_connlabels_get() to take the highest bit that we want to set. Callers then don't have to cope with a potential integer wrap when using nf_connlabels_get(bit + 1) anymore. Current callers are fine, this change is only to make folloup nft ct label set support simpler. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-01-14netfilter: nft_ct: keep counters away from CONFIG_NF_CONNTRACK_LABELSPablo Neira Ayuso
This is accidental, they don't depend on the label infrastructure. Fixes: 48f66c905a97 ("netfilter: nft_ct: add byte/packet counter support") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2016-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Release nf_tables objects on netns destructions via nft_release_afinfo(). 2) Destroy basechain and rules on netdevice removal in the new netdev family. 3) Get rid of defensive check against removal of inactive objects in nf_tables. 4) Pass down netns pointer to our existing nfnetlink callbacks, as well as commit() and abort() nfnetlink callbacks. 5) Allow to invert limit expression in nf_tables, so we can throttle overlimit traffic. 6) Add packet duplication for the netdev family. 7) Add forward expression for the netdev family. 8) Define pr_fmt() in conntrack helpers. 9) Don't leave nfqueue configuration on inconsistent state in case of errors, from Ken-ichirou MATSUZAWA, follow up patches are also from him. 10) Skip queue option handling after unbind. 11) Return error on unknown both in nfqueue and nflog command. 12) Autoload ctnetlink when NFQA_CFG_F_CONNTRACK is set. 13) Add new NFTA_SET_USERDATA attribute to store user data in sets, from Carlos Falgueras. 14) Add support for 64 bit byteordering changes nf_tables, from Florian Westphal. 15) Add conntrack byte/packet counter matching support to nf_tables, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-08netfilter: nft_ct: add byte/packet counter supportFlorian Westphal
If the accounting extension isn't present, we'll return a counter value of 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-18netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL keyFlorian Westphal
one nft userspace test case fails with 'ct l3proto original ipv4' mismatches 'ct l3proto ipv4' ... because NFTA_CT_DIRECTION attr is missing. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: switch registers to 32 bit addressingPatrick McHardy
Switch the nf_tables registers from 128 bit addressing to 32 bit addressing to support so called concatenations, where multiple values can be concatenated over multiple registers for O(1) exact matches of multiple dimensions using sets. The old register values are mapped to areas of 128 bits for compatibility. When dumping register numbers, values are expressed using the old values if they refer to the beginning of a 128 bit area for compatibility. To support concatenations, register loads of less than a full 32 bit value need to be padded. This mainly affects the payload and exthdr expressions, which both unconditionally zero the last word before copying the data. Userspace fully passes the testsuite using both old and new register addressing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: add register parsing/dumping helpersPatrick McHardy
Add helper functions to parse and dump register values in netlink attributes. These helpers will later be changed to take care of translation between the old 128 bit and the new 32 bit register numbers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: convert expressions to u32 register pointersPatrick McHardy
Simple conversion to use u32 pointers to the beginning of the registers to keep follow up patches smaller. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: get rid of NFT_REG_VERDICT usagePatrick McHardy
Replace the array of registers passed to expressions by a struct nft_regs, containing the verdict as a seperate member, which aliases to the NFT_REG_VERDICT register. This is needed to seperate the verdict from the data registers completely, so their size can be changed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: introduce nft_validate_register_load()Patrick McHardy
Change nft_validate_input_register() to not only validate the input register number, but also the length of the load, and rename it to nft_validate_register_load() to reflect that change. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13netfilter: nf_tables: kill nft_validate_output_register()Patrick McHardy
All users of nft_validate_register_store() first invoke nft_validate_output_register(). There is in fact no use for using it on its own, so simplify the code by folding the functionality into nft_validate_register_store() and kill it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>