summaryrefslogtreecommitdiffstats
path: root/net/netfilter
AgeCommit message (Collapse)Author
2018-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree: 1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing. 2) Restore conntrack dependency on xt_cluster, from Martin Willi. 3) Fix splat with GSO skbs from the checksum target, from Florian Westphal. 4) Rework ct timeout support, the template strategy to attach custom timeouts is not correct since it will not work in conjunction with conntrack zones and we have a possible free after use when removing the rule due to missing refcounting. To fix these problems, do not use conntrack template at all and set custom timeout on the already valid conntrack object. This fix comes with a preparation patch to simplify timeout adjustment by initializating the first position of the timeout array for all of the existing trackers. Patchset from Florian Westphal. 5) Fix missing dependency on from IPv4 chain NAT type, from Florian. 6) Release chain reference counter from the flush path, from Taehee Yoo. 7) After flushing an iptables ruleset, conntrack hooks are unregistered and entries are left stale to be cleaned up by the timeout garbage collector. No TCP tracking is done on established flows by this time. If ruleset is reloaded, then hooks are registered again and TCP tracking is restored, which considers packets to be invalid. Clear window tracking to exercise TCP flow pickup from the middle given that history is lost for us. Again from Florian. 8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y and CONFIG_NF_CT_NETLINK_TIMEOUT=n. 9) Broken CT target due to returning incorrect type from ctnl_timeout_find_get(). 10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner. 11) Missing conversion of hashlimit sysctl interface to new API, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11netfilter: xt_hashlimit: use s->file instead of s->privateCong Wang
After switching to the new procfs API, it is supposed to retrieve the private pointer from PDE_DATA(file_inode(s->file)), s->private is no longer referred. Fixes: 1cd671827290 ("netfilter/x_tables: switch to proc_create_seq_private") Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEATMichal 'vorner' Vaner
NF_REPEAT places the packet at the beginning of the iptables chain instead of accepting or rejecting it right away. The packet however will reach the end of the chain and continue to the end of iptables eventually, so it needs the same handling as NF_ACCEPT and NF_DROP. Fixes: 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks") Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to typePablo Neira Ayuso
Compiler did not catch incorrect typing in the rcu hook assignment. % nfct add timeout test-tcp inet tcp established 100 close 10 close_wait 10 % iptables -I OUTPUT -t raw -p tcp -j CT --timeout test-tcp dmesg - xt_CT: Timeout policy `test-tcp' can only be used by L3 protocol number 25000 The CT target bails out with incorrect layer 3 protocol number. Fixes: 6c1fd7dc489d ("netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout object") Reported-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUTPablo Neira Ayuso
Now that cttimeout support for nft_ct is in place, these should depend on CONFIG_NF_CONNTRACK_TIMEOUT otherwise we can crash when dumping the policy if this option is not enabled. [ 71.600121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [...] [ 71.600141] CPU: 3 PID: 7612 Comm: nft Not tainted 4.18.0+ #246 [...] [ 71.600188] Call Trace: [ 71.600201] ? nft_ct_timeout_obj_dump+0xc6/0xf0 [nft_ct] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11netfilter: conntrack: reset tcp maxwin on re-registerFlorian Westphal
Doug Smythies says: Sometimes it is desirable to temporarily disable, or clear, the iptables rule set on a computer being controlled via a secure shell session (SSH). While unwise on an internet facing computer, I also do it often on non-internet accessible computers while testing. Recently, this has become problematic, with the SSH session being dropped upon re-load of the rule set. The problem is that when all rules are deleted, conntrack hooks get unregistered. In case the rules are re-added later, its possible that tcp window has moved far enough so that all packets are considered invalid (out of window) until entry expires (which can take forever, default established timeout is 5 days). Fix this by clearing maxwin of existing tcp connections on register. v2: don't touch entries on hook removal. v3: remove obsolete expiry check. Reported-by: Doug Smythies <dsmythies@telus.net> Fixes: 4d3a57f23dec59 ("netfilter: conntrack: do not enable connection tracking unless needed") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-31netfilter: nf_tables: release chain in flushing setTaehee Yoo
When element of verdict map is deleted, the delete routine should release chain. however, flush element of verdict map routine doesn't release chain. test commands: %nft add table ip filter %nft add chain ip filter c1 %nft add map ip filter map1 { type ipv4_addr : verdict \; } %nft add element ip filter map1 { 1 : jump c1 } %nft flush map ip filter map1 %nft flush ruleset splat looks like: [ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415! [ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55 [ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables] [ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 [ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202 [ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0 [ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8 [ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000 [ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200 [ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000 [ 4895.234841] FS: 00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 4895.234841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0 [ 4895.234841] Call Trace: [ 4895.234841] nf_tables_commit+0x2704/0x2c70 [nf_tables] [ 4895.234841] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink] [ 4895.234841] ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables] [ 4895.323824] ? __lock_is_held+0x9d/0x130 [ 4895.323824] ? kasan_unpoison_shadow+0x30/0x40 [ 4895.333299] ? kasan_kmalloc+0xa9/0xc0 [ 4895.333299] ? kmem_cache_alloc_trace+0x2c0/0x310 [ 4895.333299] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink] [ 4895.333299] nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink] [ 4895.333299] ? debug_show_all_locks+0x290/0x290 [ 4895.333299] ? nfnetlink_net_init+0x150/0x150 [nfnetlink] [ 4895.333299] ? sched_clock_cpu+0xe5/0x170 [ 4895.333299] ? sched_clock_local+0xff/0x130 [ 4895.333299] ? sched_clock_cpu+0xe5/0x170 [ 4895.333299] ? find_held_lock+0x39/0x1b0 [ 4895.333299] ? sched_clock_local+0xff/0x130 [ 4895.333299] ? memset+0x1f/0x40 [ 4895.333299] ? nla_parse+0x33/0x260 [ 4895.333299] ? ns_capable_common+0x6e/0x110 [ 4895.333299] nfnetlink_rcv+0x2c0/0x310 [nfnetlink] [ ... ] Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29netfilter: nf_tables: rework ct timeout set supportFlorian Westphal
Using a private template is problematic: 1. We can't assign both a zone and a timeout policy (zone assigns a conntrack template, so we hit problem 1) 2. Using a template needs to take care of ct refcount, else we'll eventually free the private template due to ->use underflow. This patch reworks template policy to instead work with existing conntrack. As long as such conntrack has not yet been placed into the hash table (unconfirmed) we can still add the timeout extension. The only caveat is that we now need to update/correct ct->timeout to reflect the initial/new state, otherwise the conntrack entry retains the default 'new' timeout. Side effect of this change is that setting the policy must now occur from chains that are evaluated *after* the conntrack lookup has taken place. No released kernel contains the timeout policy feature yet, so this change should be ok. Changes since v2: - don't handle 'ct is confirmed case' - after previous patch, no need to special-case tcp/dccp/sctp timeout anymore Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-29netfilter: conntrack: place 'new' timeout in first location tooFlorian Westphal
tcp, sctp and dccp trackers re-use the userspace ctnetlink states to index their timeout arrays, which means timeout[0] is never used. Copy the 'new' state (syn-sent, dccp-request, ..) to 0 as well so external users can simply read it off timeouts[0] without need to differentiate dccp/sctp/tcp and udp/icmp/gre/generic. The alternative is to map all array accesses to 'i - 1', but that is a much more intrusive change. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-24netfilter: xt_checksum: ignore gso skbsFlorian Westphal
Satish Patel reports a skb_warn_bad_offload() splat caused by -j CHECKSUM rules: -A POSTROUTING -p tcp -m tcp --sport 80 -j CHECKSUM The CHECKSUM target has never worked with GSO skbs, and the above rule makes no sense as kernel will handle checksum updates on transmit. Unfortunately, there are 3rd party tools that install such rules, so we cannot reject this from the config plane without potential breakage. Amend Kconfig text to clarify that the CHECKSUM target is only useful in virtualized environments, where old dhcp clients that use AF_PACKET used to discard UDP packets with a 'bad' header checksum and add a one-time warning in case such rule isn't restricted to UDP. v2: check IP6T_F_PROTO flag before cmp (Michal Kubecek) Reported-by: Satish Patel <satish.txt@gmail.com> Reported-by: Markos Chandras <markos.chandras@suse.com> Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-23treewide: convert ISO_8859-1 text comments to utf-8Arnd Bergmann
Almost all files in the kernel are either plain text or UTF-8 encoded. A couple however are ISO_8859-1, usually just a few characters in a C comments, for historic reasons. This converts them all to UTF-8 for consistency. Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Simon Horman <horms@verge.net.au> [IPVS portion] Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [IIO] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Rob Herring <robh@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rob Herring <robh+dt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23netfilter: xt_cluster: add dependency on conntrack moduleMartin Willi
The cluster match requires conntrack for matching packets. If the netns does not have conntrack hooks registered, the match does not work at all. Implicitly load the conntrack hook for the family, exactly as many other extensions do. This ensures that the match works even if the hooks have not been registered by other means. Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-23netfilter: conntrack: remove duplicated include from nf_conntrack_proto_udp.cYue Haibing
Remove duplicated include. Fixes: c779e849608a ("netfilter: conntrack: remove get_timeout() indirection") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nft_dynset: allow dynamic updates of non-anonymous setPablo Neira Ayuso
This check is superfluous since it breaks valid configurations, remove it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nft_tproxy: Fix missing-braces warningMáté Eckl
This patch fixes a warning reported by the kbuild test robot (from linux-next tree): net/netfilter/nft_tproxy.c: In function 'nft_tproxy_eval_v6': >> net/netfilter/nft_tproxy.c:85:9: warning: missing braces around initializer [-Wmissing-braces] struct in6_addr taddr = {0}; ^ net/netfilter/nft_tproxy.c:85:9: warning: (near initialization for 'taddr.in6_u') [-Wmissing-braces] This warning is actually caused by a gcc bug already resolved in newer versions (kbuild used 4.9) so this kind of initialization is omitted and memset is used instead. Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nft_ct: make l3 protocol field optional for timeout objectHarsha Sharma
If l3 protocol value is not specified for ct timeout object then use the value from nft_ctx protocol family. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: x_tables: do not fail xt_alloc_table_info too easillyMichal Hocko
eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()") has unintentionally fortified xt_alloc_table_info allocation when __GFP_RETRY has been dropped from the vmalloc fallback. Later on there was a syzbot report that this can lead to OOM killer invocations when tables are too large and 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive") has been merged to restore the original behavior. Georgi Nikolov however noticed that he is not able to install his iptables anymore so this can be seen as a regression. The primary argument for 0537250fdc6c was that this allocation path shouldn't really trigger the OOM killer and kill innocent tasks. On the other hand the interface requires root and as such should allow what the admin asks for. Root inside a namespaces makes this more complicated because those might be not trusted in general. If they are not then such namespaces should be restricted anyway. Therefore drop the __GFP_NORETRY and replace it by __GFP_ACCOUNT to enfore memcg constrains on it. Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive") Reported-by: Georgi Nikolov <gnikolov@icdsoft.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: conntrack: fix removal of conntrack entries when l4tracker is removedFlorian Westphal
nf_ct_l4proto_unregister_one() leaves conntracks added by to-be-removed tracker behind, nf_ct_l4proto_unregister has to iterate for each protocol to be removed. v2: call nf_ct_iterate_destroy without holding nf_ct_proto_mutex. Fixes: 2c41f33c1b703 ("netfilter: move table iteration out of netns exit paths") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nf_tables: don't prevent event handler from device cleanup on ↵Florian Westphal
netns exit When a netnsamespace exits, the nf_tables pernet_ops will remove all rules. However, there is one caveat: Base chains that register ingress hooks will cause use-after-free: device is already gone at that point. The device event handlers prevent this from happening: netns exit synthesizes unregister events for all devices. However, an improper fix for a race condition made the notifiers a no-op in case they get called from netns exit path, so revert that part. This is safe now as the previous patch fixed nf_tables pernet ops and device notifier initialisation ordering. Fixes: 0a2cf5ee432c2 ("netfilter: nf_tables: close race between netns exit and rmmod") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nf_tables: fix register orderingFlorian Westphal
We must register nfnetlink ops last, as that exposes nf_tables to userspace. Without this, we could theoretically get nfnetlink request before net->nft state has been initialized. Fixes: 99633ab29b213 ("netfilter: nf_tables: complete net namespace support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: fix memory leaks on netlink_dump_start errorFlorian Westphal
Shaochun Chen points out we leak dumper filter state allocations stored in dump_control->data in case there is an error before netlink sets cb_running (after which ->done will be called at some point). In order to fix this, add .start functions and move allocations there. Same pattern as used in commit 90fd131afc565159c9e0ea742f082b337e10f8c6 ("netfilter: nf_tables: move dumper state allocation into ->start"). Reported-by: shaochun chen <cscnull@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16netfilter: nft_set: fix allocation size overflow in privsize callback.Taehee Yoo
In order to determine allocation size of set, ->privsize is invoked. At this point, both desc->size and size of each data structure of set are used. desc->size means number of element that is given by user. desc->size is u32 type. so that upperlimit of set element is 4294967295. but return type of ->privsize is also u32. hence overflow can occurred. test commands: %nft add table ip filter %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; } %nft list ruleset splat looks like: [ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled [ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7 [ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set] [ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16 [ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246 [ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001 [ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410 [ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030 [ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0 [ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000 [ 1239.229091] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 1239.229091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0 [ 1239.229091] Call Trace: [ 1239.229091] ? nft_hash_remove+0xf0/0xf0 [nf_tables_set] [ 1239.229091] ? memset+0x1f/0x40 [ 1239.229091] ? __nla_reserve+0x9f/0xb0 [ 1239.229091] ? memcpy+0x34/0x50 [ 1239.229091] nf_tables_dump_set+0x9a1/0xda0 [nf_tables] [ 1239.229091] ? __kmalloc_reserve.isra.29+0x2e/0xa0 [ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables] [ 1239.229091] ? nf_tables_commit+0x2c60/0x2c60 [nf_tables] [ 1239.229091] netlink_dump+0x470/0xa20 [ 1239.229091] __netlink_dump_start+0x5ae/0x690 [ 1239.229091] nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables] [ 1239.229091] nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables] [ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables] [ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables] [ 1239.229091] ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables] [ 1239.229091] ? nla_parse+0xab/0x230 [ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables] [ 1239.229091] nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink] [ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 1239.229091] ? debug_show_all_locks+0x290/0x290 [ 1239.229091] ? sched_clock_cpu+0x132/0x170 [ 1239.229091] ? find_held_lock+0x39/0x1b0 [ 1239.229091] ? sched_clock_local+0x10d/0x130 [ 1239.229091] netlink_rcv_skb+0x211/0x320 [ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink] [ 1239.229091] ? netlink_ack+0x7b0/0x7b0 [ 1239.229091] ? ns_capable_common+0x6e/0x110 [ 1239.229091] nfnetlink_rcv+0x2d1/0x310 [nfnetlink] [ 1239.229091] ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink] [ 1239.229091] ? netlink_deliver_tap+0x829/0x930 [ 1239.229091] ? lock_acquire+0x265/0x2e0 [ 1239.229091] netlink_unicast+0x406/0x520 [ 1239.509725] ? netlink_attachskb+0x5b0/0x5b0 [ 1239.509725] ? find_held_lock+0x39/0x1b0 [ 1239.509725] netlink_sendmsg+0x987/0xa20 [ 1239.509725] ? netlink_unicast+0x520/0x520 [ 1239.509725] ? _copy_from_user+0xa9/0xc0 [ 1239.509725] __sys_sendto+0x21a/0x2c0 [ 1239.509725] ? __ia32_sys_getpeername+0xa0/0xa0 [ 1239.509725] ? retint_kernel+0x10/0x10 [ 1239.509725] ? sched_clock_cpu+0x132/0x170 [ 1239.509725] ? find_held_lock+0x39/0x1b0 [ 1239.509725] ? lock_downgrade+0x540/0x540 [ 1239.509725] ? up_read+0x1c/0x100 [ 1239.509725] ? __do_page_fault+0x763/0x970 [ 1239.509725] ? retint_user+0x18/0x18 [ 1239.509725] __x64_sys_sendto+0x177/0x180 [ 1239.509725] do_syscall_64+0xaa/0x360 [ 1239.509725] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1239.509725] RIP: 0033:0x7f5a8f468e03 [ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 [ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03 [ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003 [ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c [ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0 [ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0 [ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables [ 1239.670713] ---[ end trace 39375adcda140f11 ]--- [ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set] [ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16 [ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246 [ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001 [ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410 [ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030 [ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0 [ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000 [ 1239.751785] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 1239.760993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0 [ 1239.775679] Kernel panic - not syncing: Fatal exception [ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 1239.776630] Rebooting in 5 seconds.. Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16ipvs: don't show negative times in ip_vs_connMatteo Croce
Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"), timers duration can last even 12.5% more than the scheduled interval. IPVS has two handlers, /proc/net/ip_vs_conn and /proc/net/ip_vs_conn_sync, which shows the remaining time before that a connection expires. The default expire time for a connection is 60 seconds, and the expiration timer can fire even 4 seconds later than the scheduled time. The expiration time is calculated subtracting jiffies to the scheduled expiration time, and it's shown as a huge number when the timer fires late, since both values are unsigned. This can confuse script and tools which relies on it, like ipvsadm: root@mcroce-redhat:~# while ipvsadm -lc |grep SYN_RECV; do sleep 1 ; done TCP 00:05 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:04 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:03 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:02 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:01 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:00 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:44 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:43 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:42 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:41 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:40 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:39 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest()Tan Hu
We came across infinite loop in ipvs when using ipvs in docker env. When ipvs receives new packets and cannot find an ipvs connection, it will create a new connection, then if the dest is unavailable (i.e. IP_VS_DEST_F_AVAILABLE), the packet will be dropped sliently. But if the dropped packet is the first packet of this connection, the connection control timer never has a chance to start and the ipvs connection cannot be released. This will lead to memory leak, or infinite loop in cleanup_net() when net namespace is released like this: ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs] __ip_vs_cleanup at ffffffffa0a9f60a [ip_vs] ops_exit_list at ffffffff81567a49 cleanup_net at ffffffff81568b40 process_one_work at ffffffff810a851b worker_thread at ffffffff810a9356 kthread at ffffffff810b0b6f ret_from_fork at ffffffff81697a18 race condition: CPU1 CPU2 ip_vs_in() ip_vs_conn_new() ip_vs_del_dest() __ip_vs_unlink_dest() ~IP_VS_DEST_F_AVAILABLE cp->dest && !IP_VS_DEST_F_AVAILABLE __ip_vs_conn_put ... cleanup_net ---> infinite looping Fix this by checking whether the timer already started. Signed-off-by: Tan Hu <tan.hu@zte.com.cn> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...
2018-08-15Merge tag 'audit-pr-20180814' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit patches from Paul Moore: "Twelve audit patches for v4.19 and they run the full gamut from fixes to features. Notable changes include the ability to use the "exe" audit filter field in a wider variety of filter types, a fix for our comparison of GID/EGID in audit filter rules, better association of related audit records (connecting related audit records together into one audit event), and a fix for a potential use-after-free in audit_add_watch(). All the patches pass the audit-testsuite and merge cleanly on your current master branch" * tag 'audit-pr-20180814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix use-after-free in audit_add_watch audit: use ktime_get_coarse_real_ts64() for timestamps audit: use ktime_get_coarse_ts64() for time access audit: simplify audit_enabled check in audit_watch_log_rule_change() audit: check audit_enabled in audit_tree_log_remove_rule() cred: conditionally declare groups-related functions audit: eliminate audit_enabled magic number comparison audit: rename FILTER_TYPE to FILTER_EXCLUDE audit: Fix extended comparison of GID/EGID audit: tie ANOM_ABEND records to syscall audit: tie SECCOMP records to syscall audit: allow other filter list types for AUDIT_EXE
2018-08-08netfilter: nfnetlink_osf: fix using plain integer as NULL warningWei Yongjun
Fixes the following sparse warning: net/netfilter/nfnetlink_osf.c:274:24: warning: Using plain integer as NULL pointer Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_ct: enable conntrack for helpersPablo Neira Ayuso
Enable conntrack if the user defines a helper to be used from the ruleset policy. Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_ct: add ct timeout supportHarsha Sharma
This patch allows to add, list and delete connection tracking timeout policies via nft objref infrastructure and assigning these timeout via nft rule. %./libnftnl/examples/nft-ct-timeout-add ip raw cttime tcp Ruleset: table ip raw { ct timeout cttime { protocol tcp; policy = {established: 111, close: 13 } } chain output { type filter hook output priority -300; policy accept; ct timeout set "cttime" } } %./libnftnl/examples/nft-rule-ct-timeout-add ip raw output cttime %conntrack -E [NEW] tcp 6 111 ESTABLISHED src=172.16.19.128 dst=172.16.19.1 sport=22 dport=41360 [UNREPLIED] src=172.16.19.1 dst=172.16.19.128 sport=41360 dport=22 %nft delete rule ip raw output handle <handle> %./libnftnl/examples/nft-ct-timeout-del ip raw cttime Joint work with Pablo Neira. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout objectPablo Neira Ayuso
The timeout policy is currently embedded into the nfnetlink_cttimeout object, move the policy into an independent object. This allows us to reuse part of the existing conntrack timeout extension from nf_tables without adding dependencies with the nfnetlink_cttimeout object layout. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: move ctnl_untimeout to nf_conntrackHarsha Sharma
As, ctnl_untimeout is required by nft_ct, so move ctnl_timeout from nfnetlink_cttimeout to nf_conntrack_timeout and rename as nf_ct_timeout. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_osf: use NFT_OSF_MAXGENRELEN instead of IFNAMSIZFernando Fernandez Mancera
As no "genre" on pf.os exceed 16 bytes of length, we reduce NFT_OSF_MAXGENRELEN parameter to 16 bytes and use it instead of IFNAMSIZ. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for transparent proxying for nf_tables, from Mate Eckl. 2) Patchset to add OS passive fingerprint recognition for nf_tables, from Fernando Fernandez. This takes common code from xt_osf and place it into the new nfnetlink_osf module for codebase sharing. 3) Lightweight tunneling support for nf_tables. 4) meta and lookup are likely going to be used in rulesets, make them direct calls. From Florian Westphal. A bunch of incremental updates: 5) use PTR_ERR_OR_ZERO() from nft_numgen, from YueHaibing. 6) Use kvmalloc_array() to allocate hashtables, from Li RongQing. 7) Explicit dependencies between nfnetlink_cttimeout and conntrack timeout extensions, from Harsha Sharma. 8) Simplify NLM_F_CREATE handling in nf_tables. 9) Removed unused variable in the get element command, from YueHaibing. 10) Expose bridge hook priorities through uapi, from Mate Eckl. And a few fixes for previous Netfilter batch for net-next: 11) Use per-netns mutex from flowtable event, from Florian Westphal. 12) Remove explicit dependency on iptables CT target from conntrack zones, from Florian. 13) Fix use-after-free in rmmod nf_conntrack path, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04net: Remove some unneeded semicolonzhong jiang
These semicolons are not needed. Just remove them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04netfilter: nft_tunnel: fix sparse errorsPablo Neira Ayuso
[...] net/netfilter/nft_tunnel.c:117:25: expected unsigned int [unsigned] [usertype] flags net/netfilter/nft_tunnel.c:117:25: got restricted __be16 [usertype] <noident> [...] net/netfilter/nft_tunnel.c:246:33: expected restricted __be16 [addressable] [assigned] [usertype] tp_dst net/netfilter/nft_tunnel.c:246:33: got int Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: conntrack: avoid use-after free on rmmodFlorian Westphal
When the conntrack module is removed, we call nf_ct_iterate_destroy via nf_ct_l4proto_unregister(). Problem is that nf_conntrack_proto_fini() gets called after the conntrack hash table has already been freed. Just remove the l4proto unregister call, its unecessary as the nf_ct_protos[] array gets free'd right after anyway. v2: add comment wrt. missing unreg call. Fixes: a0ae2562c6c4b2 ("netfilter: conntrack: remove l3proto abstraction") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: kconfig: remove ct zone/label dependenciesFlorian Westphal
connection tracking zones currently depend on the xtables CT target. The reasoning was that it makes no sense to support zones if they can't be configured (which needed CT target). Nowadays zones can also be used by OVS and configured via nftables, so remove the dependency. connection tracking labels are handled via hidden dependency that gets auto-selected by the connlabel match. Make it a visible knob, as labels can be attached via ctnetlink or via nftables rules (nft_ct expression) too. This allows to use conntrack labels and zones with nftables-only build. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: simplify NLM_F_CREATE handlingPablo Neira Ayuso
* From nf_tables_newchain(), codepath provides context that allows us to infer if we are updating a chain (in that case, no module autoload is required) or adding a new one (then, module autoload is indeed needed). * We only need it in one single spot in nf_tables_newrule(). * Not needed for nf_tables_newset() at all. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: match on tunnel metadataPablo Neira Ayuso
This patch allows us to match on the tunnel metadata that is available of the packet. We can use this to validate if the packet comes from/goes to tunnel and the corresponding tunnel ID. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: add tunnel supportPablo Neira Ayuso
This patch implements the tunnel object type that can be used to configure tunnels via metadata template through the existing lightweight API from the ingress path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nft_tproxy: Add missing config checkMáté Eckl
A config check was missing form the code when using nf_defrag_ipv6_enable with NFT_TPROXY != n and NF_DEFRAG_IPV6 = n and this caused the following error: ../net/netfilter/nft_tproxy.c: In function 'nft_tproxy_init': ../net/netfilter/nft_tproxy.c:237:3: error: implicit declaration of function +'nf_defrag_ipv6_enable' [-Werror=implicit-function-declaration] err = nf_defrag_ipv6_enable(ctx->net); This patch adds a check for NF_TABLES_IPV6 when NF_DEFRAG_IPV6 is selected by Kconfig. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Máté Eckl <ecklm94@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: cttimeout: Make NF_CT_NETLINK_TIMEOUT depend on NF_CONNTRACK_TIMEOUTHarsha Sharma
With this, remove ifdef for CONFIG_NF_CONNTRACK_TIMEOUT in nfnetlink_cttimeout. This is also required for moving ctnl_untimeout from nfnetlink_cttimeout to nf_conntrack_timeout. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: remove unused variableYueHaibing
Variable 'ext' is being assigned but are never used hence they are unused and can be removed. Cleans up clang warnings: net/netfilter/nf_tables_api.c:4032:28: warning: variable ‘ext’ set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nf_tables: flow event notifier must use transaction mutexFlorian Westphal
Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: nfnetlink_osf: rename nf_osf header file to nfnetlink_osfFernando Fernandez Mancera
The first client of the nf_osf.h userspace header is nft_osf, coming in this batch, rename it to nfnetlink_osf.h as there are no userspace clients for this yet, hence this looks consistent with other nfnetlink subsystem. Suggested-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03netfilter: use kvmalloc_array to allocate memory for hashtableLi RongQing
nf_ct_alloc_hashtable is used to allocate memory for conntrack, NAT bysrc and expectation hashtable. Assuming 64k bucket size, which means 7th order page allocation, __get_free_pages, called by nf_ct_alloc_hashtable, will trigger the direct memory reclaim and stall for a long time, when system has lots of memory stress so replace combination of __get_free_pages and vzalloc with kvmalloc_array, which provides a overflow check and a fallback if no high order memory is available, and do not retry to reclaim memory, reduce stall and remove nf_ct_free_hashtable, since it is just a kvfree Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Wang Li <wangli39@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-30netfilter: nf_tables: Add native tproxy supportMáté Eckl
A great portion of the code is taken from xt_TPROXY.c There are some changes compared to the iptables implementation: - tproxy statement is not terminal here - Either address or port has to be specified, but at least one of them is necessary. If one of them is not specified, the evaluation will be performed with the original attribute of the packet (ie. target port is not specified => the packet's dport will be used). To make this work in inet tables, the tproxy structure has a family member (typically called priv->family) which is not necessarily equal to ctx->family. priv->family can have three values legally: - NFPROTO_IPV4 if the table family is ip OR if table family is inet, but an ipv4 address is specified as a target address. The rule only evaluates ipv4 packets in this case. - NFPROTO_IPV6 if the table family is ip6 OR if table family is inet, but an ipv6 address is specified as a target address. The rule only evaluates ipv6 packets in this case. - NFPROTO_UNSPEC if the table family is inet AND if only the port is specified. The rule will evaluate both ipv4 and ipv6 packets. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-30netfilter: nf_tables: implement Passive OS fingerprint module in nft_osfFernando Fernandez Mancera
Add basic module functions into nft_osf.[ch] in order to implement OSF module in nf_tables. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-30netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.cFernando Fernandez Mancera
Move nfnetlink osf subsystem from xt_osf.c to standalone module so we can reuse it from the new nft_ost extension. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-30netfilter: nf_osf: rename nf_osf.c to nfnetlink_osf.cFernando Fernandez Mancera
Rename nf_osf.c to nfnetlink_osf.c as we introduce nfnetlink_osf which is the OSF infraestructure. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>