aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/nf_conntrack_core.c
AgeCommit message (Collapse)Author
2015-01-06netfilter: conntrack: fix race between confirmation and flushPablo Neira Ayuso
Commit 5195c14c8b27c ("netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse") aimed to resolve the race condition between the confirmation (packet path) and the flush command (from control plane). However, it introduced a crash when several packets race to add a new conntrack, which seems easier to reproduce when nf_queue is in place. Fix this race, in __nf_conntrack_confirm(), by removing the CT from unconfirmed list before checking the DYING bit. In case race occured, re-add the CT to the dying list This patch also changes the verdict from NF_ACCEPT to NF_DROP when we lose race. Basically, the confirmation happens for the first packet that we see in a flow. If you just invoked conntrack -F once (which should be the common case), then this is likely to be the first packet of the flow (unless you already called flush anytime soon in the past). This should be hard to trigger, but better drop this packet, otherwise we leave things in inconsistent state since the destination will likely reply to this packet, but it will find no conntrack, unless the origin retransmits. The change of the verdict has been discussed in: https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains netfilter updates for net-next. Basically, enhancements for xt_recent, skip zeroing of timer in conntrack, fix linking problem with recent redirect support for nf_tables, ipset updates and a couple of cleanups. More specifically, they are: 1) Rise maximum number per IP address to be remembered in xt_recent while retaining backward compatibility, from Florian Westphal. 2) Skip zeroing timer area in nf_conn objects, also from Florian. 3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using using meta l4proto and transport layer header, from Alvaro Neira. 4) Fix linking problems in the new redirect support when CONFIG_IPV6=n and IP6_NF_IPTABLES=n. And ipset updates from Jozsef Kadlecsik: 5) Support updating element extensions when the set is full (fixes netfilter bugzilla id 880). 6) Fix set match with 32-bits userspace / 64-bits kernel. 7) Indicate explicitly when /0 networks are supported in ipset. 8) Simplify cidr handling for hash:*net* types. 9) Allocate the proper size of memory when /0 networks are supported. 10) Explicitly add padding elements to hash:net,net and hash:net,port, because the elements must be u32 sized for the used hash function. Jozsef is also cooking ipset RCU conversion which should land soon if they reach the merge window in time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-27netfilter: conntrack: avoid zeroing timerFlorian Westphal
add a __nfct_init_offset annotation member to struct nf_conn to make it clear which members are covered by the memset when the conntrack is allocated. This avoids zeroing timer_list and ct_net; both are already inited explicitly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-25Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against ↵Pablo Neira
get_next_corpse" This reverts commit 5195c14c8b27cc0b18220ddbf0e5ad3328a04187. If the conntrack clashes with an existing one, it is left out of the unconfirmed list, thus, crashing when dropping the packet and releasing the conntrack since golden rule is that conntracks are always placed in any of the existing lists for traceability reasons. Reported-by: Daniel Borkmann <dborkman@redhat.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpsebill bonaparte
After removal of the central spinlock nf_conntrack_lock, in commit 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock"), it is possible to race against get_next_corpse(). The race is against the get_next_corpse() cleanup on the "unconfirmed" list (a per-cpu list with seperate locking), which set the DYING bit. Fix this race, in __nf_conntrack_confirm(), by removing the CT from unconfirmed list before checking the DYING bit. In case race occured, re-add the CT to the dying list. While at this, fix coding style of the comment that has been updated. Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: bill bonaparte <programme110@gmail.com> Signed-off-by: bill bonaparte <programme110@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-08-23net: use reciprocal_scale() helperDaniel Borkmann
Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-22net: use ktime_get_ns() and ktime_get_real_ns() helpersEric Dumazet
ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25netfilter: conntrack: remove timer from ecache extensionFlorian Westphal
This brings the (per-conntrack) ecache extension back to 24 bytes in size (was 152 byte on x86_64 with lockdep on). When event delivery fails, re-delivery is attempted via work queue. Redelivery is attempted at least every 0.1 seconds, but can happen more frequently if userspace is not congested. The nf_ct_release_dying_list() function is removed. With this patch, ownership of the to-be-redelivered conntracks (on-dying-list-with-DYING-bit not yet set) is with the work queue, which will release the references once event is out. Joint work with Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-04-18arch: Mass conversion of smp_mb__*()Peter Zijlstra
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14netfilter: nf_conntrack: initialize net.ct.generationAndrey Vagin
[ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-17netfilter: conntrack: Fix UP buildsEric Dumazet
ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an empty structure. Replace it by CONNTRACK_LOCKS Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07netfilter: conntrack: remove central spinlock nf_conntrack_lockJesper Dangaard Brouer
nf_conntrack_lock is a monolithic lock and suffers from huge contention on current generation servers (8 or more core/threads). Perf locking congestion is clear on base kernel: - 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh - _raw_spin_lock_bh + 25.33% init_conntrack + 24.86% nf_ct_delete_from_lists + 24.62% __nf_conntrack_confirm + 24.38% destroy_conntrack + 0.70% tcp_packet + 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup + 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free + 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer + 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete + 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table This patch change conntrack locking and provides a huge performance improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen): Base kernel: 810.405 new conntrack/sec After patch: 2.233.876 new conntrack/sec Notice other floods attack (SYN+ACK or ACK) can easily be deflected using: # iptables -A INPUT -m state --state INVALID -j DROP # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0 Use an array of hashed spinlocks to protect insertions/deletions of conntracks into the hash table. 1024 spinlocks seem to give good results, at minimal cost (4KB memory). Due to lockdep max depth, 1024 becomes 8 if CONFIG_LOCKDEP=y The hash resize is a bit tricky, because we need to take all locks in the array. A seqcount_t is used to synchronize the hash table users with the resizing process. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: conntrack: seperate expect locking from nf_conntrack_lockJesper Dangaard Brouer
Netfilter expectations are protected with the same lock as conntrack entries (nf_conntrack_lock). This patch split out expectations locking to use it's own lock (nf_conntrack_expect_lock). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: avoid race with exp->master ctJesper Dangaard Brouer
Preparation for disconnecting the nf_conntrack_lock from the expectations code. Once the nf_conntrack_lock is lifted, a race condition is exposed. The expectations master conntrack exp->master, can race with delete operations, as the refcnt increment happens too late in init_conntrack(). Race is against other CPUs invoking ->destroy() (destroy_conntrack()), or nf_ct_delete() (via timeout or early_drop()). Avoid this race in nf_ct_find_expectation() by using atomic_inc_not_zero(), and checking if nf_ct_is_dying() (path via nf_ct_delete()). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: conntrack: spinlock per cpu to protect special lists.Jesper Dangaard Brouer
One spinlock per cpu to protect dying/unconfirmed/template special lists. (These lists are now per cpu, a bit like the untracked ct) Add a @cpu field to nf_conn, to make sure we hold the appropriate spinlock at removal time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07netfilter: trivial code cleanup and doc changesJesper Dangaard Brouer
Changes while reading through the netfilter code. Added hint about how conntrack nf_conn refcnt is accessed. And renamed repl_hash to reply_hash for readability Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-05netfilter: nf_conntrack: don't release a conntrack with non-zero refcntPablo Neira Ayuso
With this patch, the conntrack refcount is initially set to zero and it is bumped once it is added to any of the list, so we fulfill Eric's golden rule which is that all released objects always have a refcount that equals zero. Andrey Vagin reports that nf_conntrack_free can't be called for a conntrack with non-zero ref-counter, because it can race with nf_conntrack_find_get(). A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero ref-counter says that this conntrack is used. So when we release a conntrack with non-zero counter, we break this assumption. CPU1 CPU2 ____nf_conntrack_find() nf_ct_put() destroy_conntrack() ... init_conntrack __nf_conntrack_alloc (set use = 1) atomic_inc_not_zero(&ct->use) (use = 2) if (!l4proto->new(ct, skb, dataoff, timeouts)) nf_conntrack_free(ct); (use = 2 !!!) ... __nf_conntrack_alloc (set use = 1) if (!nf_ct_key_equal(h, tuple, zone)) nf_ct_put(ct); (use = 0) destroy_conntrack() /* continue to work with CT */ After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get" another bug was triggered in destroy_conntrack(): <4>[67096.759334] ------------[ cut here ]------------ <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211! ... <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack] <4>[67096.760255] Call Trace: <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30 <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack] <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack] <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4] <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910 <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0 <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140 <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140 <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60 <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20 <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40 <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190 <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200 <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290 <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210 <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 I have reused the original title for the RFC patch that Andrey posted and most of the original patch description. Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Vagin <avagin@parallels.com> Cc: Florian Westphal <fw@strlen.de> Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrew Vagin <avagin@parallels.com>
2014-02-05netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_getAndrey Vagin
Lets look at destroy_conntrack: hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); ... nf_conntrack_free(ct) kmem_cache_free(net->ct.nf_conntrack_cachep, ct); net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU. The hash is protected by rcu, so readers look up conntracks without locks. A conntrack is removed from the hash, but in this moment a few readers still can use the conntrack. Then this conntrack is released and another thread creates conntrack with the same address and the equal tuple. After this a reader starts to validate the conntrack: * It's not dying, because a new conntrack was created * nf_ct_tuple_equal() returns true. But this conntrack is not initialized yet, so it can not be used by two threads concurrently. In this case BUG_ON may be triggered from nf_nat_setup_info(). Florian Westphal suggested to check the confirm bit too. I think it's right. task 1 task 2 task 3 nf_conntrack_find_get ____nf_conntrack_find destroy_conntrack hlist_nulls_del_rcu nf_conntrack_free kmem_cache_free __nf_conntrack_alloc kmem_cache_alloc memset(&ct->tuplehash[IP_CT_DIR_MAX], if (nf_ct_is_dying(ct)) if (!nf_ct_tuple_equal() I'm not sure, that I have ever seen this race condition in a real life. Currently we are investigating a bug, which is reproduced on a few nodes. In our case one conntrack is initialized from a few tasks concurrently, we don't have any other explanation for this. <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322! ... <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>] [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat] ... <4>[46267.085549] Call Trace: <4>[46267.085622] [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat] <4>[46267.085697] [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat] <4>[46267.085770] [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat] <4>[46267.085843] [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat] <4>[46267.085919] [<ffffffff814841b9>] nf_iterate+0x69/0xb0 <4>[46267.085991] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086063] [<ffffffff81484374>] nf_hook_slow+0x74/0x110 <4>[46267.086133] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086207] [<ffffffff814b5890>] ? dst_output+0x0/0x20 <4>[46267.086277] [<ffffffff81495204>] ip_output+0xa4/0xc0 <4>[46267.086346] [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910 <4>[46267.086419] [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0 <4>[46267.086491] [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50 <4>[46267.086562] [<ffffffff81444d67>] sock_sendmsg+0x117/0x140 <4>[46267.086638] [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20 <4>[46267.086712] [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40 <4>[46267.086785] [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80 <4>[46267.086858] [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20 <4>[46267.086936] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087006] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087081] [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0 <4>[46267.087151] [<ffffffff81445599>] sys_sendto+0x139/0x190 <4>[46267.087229] [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0 <4>[46267.087303] [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200 <4>[46267.087378] [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290 <4>[46267.087454] [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210 <4>[46267.087531] [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210 <4>[46267.087607] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74 <1>[46267.088023] RIP [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03netfilter: nf_conntrack: remove dead codestephen hemminger
The following code is not used in current upstream code. Some of this seems to be old hooks, other might be used by some out of tree module (which I don't care about breaking), and the need_ipv4_conntrack was used by old NAT code but no longer called. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18netfilter: nf_conntrack: decrement global counter after object releasePablo Neira Ayuso
nf_conntrack_free() decrements our counter (net->ct.count) before releasing the conntrack object. That counter is used in the nf_conntrack_cleanup_net_list path to check if it's time to kmem_cache_destroy our cache of conntrack objects. I think we have a race there that should be easier to trigger (although still hard) with CONFIG_DEBUG_OBJECTS_FREE as object releases become slowier according to the following splat: [ 1136.321305] WARNING: CPU: 2 PID: 2483 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0() [ 1136.321311] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 ... [ 1136.321390] Call Trace: [ 1136.321398] [<ffffffff8160d4a2>] dump_stack+0x45/0x56 [ 1136.321405] [<ffffffff810514e8>] warn_slowpath_common+0x78/0xa0 [ 1136.321410] [<ffffffff81051557>] warn_slowpath_fmt+0x47/0x50 [ 1136.321414] [<ffffffff812f8883>] debug_print_object+0x83/0xa0 [ 1136.321420] [<ffffffff8106aa90>] ? execute_in_process_context+0x90/0x90 [ 1136.321424] [<ffffffff812f99fb>] debug_check_no_obj_freed+0x20b/0x250 [ 1136.321429] [<ffffffff8112e7f2>] ? kmem_cache_destroy+0x92/0x100 [ 1136.321433] [<ffffffff8115d945>] kmem_cache_free+0x125/0x210 [ 1136.321436] [<ffffffff8112e7f2>] kmem_cache_destroy+0x92/0x100 [ 1136.321443] [<ffffffffa046b806>] nf_conntrack_cleanup_net_list+0x126/0x160 [nf_conntrack] [ 1136.321449] [<ffffffffa046c43d>] nf_conntrack_pernet_exit+0x6d/0x80 [nf_conntrack] [ 1136.321453] [<ffffffff81511cc3>] ops_exit_list.isra.3+0x53/0x60 [ 1136.321457] [<ffffffff815124f0>] cleanup_net+0x100/0x1b0 [ 1136.321460] [<ffffffff8106b31e>] process_one_work+0x18e/0x430 [ 1136.321463] [<ffffffff8106bf49>] worker_thread+0x119/0x390 [ 1136.321467] [<ffffffff8106be30>] ? manage_workers.isra.23+0x2a0/0x2a0 [ 1136.321470] [<ffffffff8107210b>] kthread+0xbb/0xc0 [ 1136.321472] [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110 [ 1136.321477] [<ffffffff8161b8fc>] ret_from_fork+0x7c/0xb0 [ 1136.321479] [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110 [ 1136.321481] ---[ end trace 25f53c192da70825 ]--- Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-03netfilter: introduce nf_conn_acct structureHolger Eitzenberger
Encapsulate counters for both directions into nf_conn_acct. During that process also consistently name pointers to the extend 'acct', not 'counters'. This patch is a cleanup. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: add SYNPROXY core/targetPatrick McHardy
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to now break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-09netfilter: nf_conntrack: don't send destroy events from iteratorFlorian Westphal
Let nf_ct_delete handle delivery of the DESTROY event. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_nat: change sequence number adjustments to 32 bitsPatrick McHardy
Using 16 bits is too small, when many adjustments happen the offsets might overflow and break the connection. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: remove duplicate code in ctnetlinkFlorian Westphal
ctnetlink contains copy-paste code from death_by_timeout. In order to avoid changing both places in upcoming event delivery patch, export death_by_timeout functionality and use it in the ctnetlink code. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configurating various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to mdoule_platform_driver(), from Sachin Kamat. 17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...
2013-04-29net/netfilter: rename random32() to prandom_u32()Akinobu Mita
Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains a small batch of Netfilter updates for your net-next tree, they are: * Three patches that provide more accurate error reporting to user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing code and NAT, from Patrick McHardy. * Update copyright statements in Netfilter filters of Patrick McHardy, from himself. * Add Kconfig dependency on the raw/mangle tables to the rpfilter, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netfilter: rename netlink related "pid" variables to "portid"Patrick McHardy
Get rid of the confusing mix of pid and portid and use portid consistently for all netlink related socket identities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18netfilter: add my copyright statementsPatrick McHardy
Add copyright statements to all netfilter files which have had significant changes done by myself in the past. Some notes: - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter Core Team when it got split out of nf_conntrack_core.c. The copyrights even state a date which lies six years before it was written. It was written in 2005 by Harald and myself. - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright statements. I've added the copyright statement from net/netfilter/core.c, where this code originated - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want it to give the wrong impression Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19netfilter: nf_conntrack: speed up module removal path if netns in useVladimir Davydov
The patch introduces nf_conntrack_cleanup_net_list(), which cleanups nf_conntrack for a list of netns and calls synchronize_net() only once for them all. This should reduce netns destruction time. I've measured cleanup time for 1k dummy net ns. Here are the results: <without the patch> # modprobe nf_conntrack # time modprobe -r nf_conntrack real 0m10.337s user 0m0.000s sys 0m0.376s <with the patch> # modprobe nf_conntrack # time modprobe -r nf_conntrack real 0m5.661s user 0m0.000s sys 0m0.216s Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19netfilter: nf_conntrack: add include to fix sparse warningstephen hemminger
Include header file to pickup prototype of nf_nat_seq_adjust_hook Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_proto: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_labels: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_helper: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_timeout: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_ecache: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_tstamp: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_acct: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_ct_expect: move initialization out of pernet_operationsGao feng
Move the global initial codes to the module_init/exit context. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_conntrack: move initialization out of pernet operationsGao feng
nf_conntrack initialization and cleanup codes happens in pernet operations function. This task should be done in module_init/exit. We can't use init_net to identify if it's the right time to initialize or cleanup since we cannot make assumption on the order netns are created/destroyed. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18netfilter: add connlabel conntrack extensionFlorian Westphal
similar to connmarks, except labels are bit-based; i.e. all labels may be attached to a flow at the same time. Up to 128 labels are supported. Supporting more labels is possible, but requires increasing the ct offset delta from u8 to u16 type due to increased extension sizes. Mapping of bit-identifier to label name is done in userspace. The extension is enabled at run-time once "-m connlabel" netfilter rules are added. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netnsPablo Neira Ayuso
canqun zhang reported that we're hitting BUG_ON in the nf_conntrack_destroy path when calling kfree_skb while rmmod'ing the nf_conntrack module. Currently, the nf_ct_destroy hook is being set to NULL in the destroy path of conntrack.init_net. However, this is a problem since init_net may be destroyed before any other existing netns (we cannot assume any specific ordering while releasing existing netns according to what I read in recent emails). Thanks to Gao feng for initial patch to address this issue. Reported-by: canqun zhang <canqunzhang@gmail.com> Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16netfilter: xt_CT: fix crash while destroy ct templatesPablo Neira Ayuso
In (d871bef netfilter: ctnetlink: dump entries from the dying and unconfirmed lists), we assume that all conntrack objects are inserted in any of the existing lists. However, template conntrack objects were not. This results in hitting BUG_ON in the destroy_conntrack path while removing a rule that uses the CT target. This patch fixes the situation by adding the template lists, which is where template conntrack objects reside now. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-11net: remove obsolete simple_strto<foo>Abhijit Pawar
This patch removes the redundant occurences of simple_strto<foo> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10net: remove obsolete simple_strto<foo>Abhijit Pawar
This patch replace the obsolete simple_strto<foo> with kstrto<foo> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03netfilter: nf_conntrack: improve nf_conn object traceabilityPablo Neira Ayuso
This patch modifies the conntrack subsystem so that all existing allocated conntrack objects can be found in any of the following places: * the hash table, this is the typical place for alive conntrack objects. * the unconfirmed list, this is the place for newly created conntrack objects that are still traversing the stack. * the dying list, this is where you can find conntrack objects that are dying or that should die anytime soon (eg. once the destroy event is delivered to the conntrackd daemon). Thus, we make sure that we follow the track for all existing conntrack objects. This patch, together with some extension of the ctnetlink interface to dump the content of the dying and unconfirmed lists, will help in case to debug suspected nf_conn object leaks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-21netfilter: nf_nat: fix oops when unloading protocol modulesPatrick McHardy
When unloading a protocol module nf_ct_iterate_cleanup() is used to remove all conntracks using the protocol from the bysource hash and clean their NAT sections. Since the conntrack isn't actually killed, the NAT callback is invoked twice, once for each direction, which causes an oops when trying to delete it from the bysource hash for the second time. The same oops can also happen when removing both an L3 and L4 protocol since the cleanup function doesn't check whether the conntrack has already been cleaned up. Pid: 4052, comm: modprobe Not tainted 3.6.0-rc3-test-nat-unload-fix+ #32 Red Hat KVM RIP: 0010:[<ffffffffa002c303>] [<ffffffffa002c303>] nf_nat_proto_clean+0x73/0xd0 [nf_nat] RSP: 0018:ffff88007808fe18 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800728550c0 RCX: ffff8800756288b0 RDX: dead000000200200 RSI: ffff88007808fe88 RDI: ffffffffa002f208 RBP: ffff88007808fe28 R08: ffff88007808e000 R09: 0000000000000000 R10: dead000000200200 R11: dead000000100100 R12: ffffffff81c6dc00 R13: ffff8800787582b8 R14: ffff880078758278 R15: ffff88007808fe88 FS: 00007f515985d700(0000) GS:ffff88007cd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f515986a000 CR3: 000000007867a000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 4052, threadinfo ffff88007808e000, task ffff8800756288b0) Stack: ffff88007808fe68 ffffffffa002c290 ffff88007808fe78 ffffffff815614e3 ffffffff00000000 00000aeb00000246 ffff88007808fe68 ffffffff81c6dc00 ffff88007808fe88 ffffffffa00358a0 0000000000000000 000000000040f5b0 Call Trace: [<ffffffffa002c290>] ? nf_nat_net_exit+0x50/0x50 [nf_nat] [<ffffffff815614e3>] nf_ct_iterate_cleanup+0xc3/0x170 [<ffffffffa002c55a>] nf_nat_l3proto_unregister+0x8a/0x100 [nf_nat] [<ffffffff812a0303>] ? compat_prepare_timeout+0x13/0xb0 [<ffffffffa0035848>] nf_nat_l3proto_ipv4_exit+0x10/0x23 [nf_nat_ipv4] ... To fix this, - check whether the conntrack has already been cleaned up in nf_nat_proto_clean - change nf_ct_iterate_cleanup() to only invoke the callback function once for each conntrack (IP_CT_DIR_ORIGINAL). The second change doesn't affect other callers since when conntracks are actually killed, both directions are removed from the hash immediately and the callback is already only invoked once. If it is not killed, the second callback invocation will always return the same decision not to kill it. Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>