aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter
AgeCommit message (Collapse)Author
2022-04-15netfilter: nf_conntrack_tcp: preserve liberal flag in tcp optionsPablo Neira Ayuso
[ Upstream commit f2dd495a8d589371289981d5ed33e6873df94ecc ] Do not reset IP_CT_TCP_FLAG_BE_LIBERAL flag in out-of-sync scenarios coming before the TCP window tracking, otherwise such connections will fail in the window check. Update tcp_options() to leave this flag in place and add a new helper function to reset the tcp window state. Based on patch from Sven Auhagen. Fixes: c4832c7bbc3f ("netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking") Tested-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-28netfilter: nf_tables: initialize registers in nft_do_chain()Pablo Neira Ayuso
commit 4c905f6740a365464e91467aa50916555b28213d upstream. Initialize registers to avoid stack leak into userspace. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08netfilter: nf_queue: fix possible use-after-freeFlorian Westphal
commit c3873070247d9e3c7a6b0cf9bf9b45e8018427b1 upstream. Eric Dumazet says: The sock_hold() side seems suspect, because there is no guarantee that sk_refcnt is not already 0. On failure, we cannot queue the packet and need to indicate an error. The packet will be dropped by the caller. v2: split skb prefetch hunk into separate change Fixes: 271b72c7fa82c ("udp: RCU handling for Unicast packets.") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08netfilter: nf_queue: don't assume sk is full socketFlorian Westphal
commit 747670fd9a2d1b7774030dba65ca022ba442ce71 upstream. There is no guarantee that state->sk refers to a full socket. If refcount transitions to 0, sock_put calls sk_free which then ends up with garbage fields. I'd like to thank Oleksandr Natalenko and Jiri Benc for considerable debug work and pointing out state->sk oddities. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08netfilter: fix use-after-free in __nf_register_net_hook()Eric Dumazet
commit 56763f12b0f02706576a088e85ef856deacc98a0 upstream. We must not dereference @new_hooks after nf_hook_mutex has been released, because other threads might have freed our allocated hooks already. BUG: KASAN: use-after-free in nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline] BUG: KASAN: use-after-free in hooks_validate net/netfilter/core.c:171 [inline] BUG: KASAN: use-after-free in __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438 Read of size 2 at addr ffff88801c1a8000 by task syz-executor237/4430 CPU: 1 PID: 4430 Comm: syz-executor237 Not tainted 5.17.0-rc5-syzkaller-00306-g2293be58d6a1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline] hooks_validate net/netfilter/core.c:171 [inline] __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438 nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571 nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587 nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218 synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81 xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038 check_target net/ipv6/netfilter/ip6_tables.c:530 [inline] find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573 translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735 do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline] do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639 nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101 ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1024 rawv6_setsockopt+0xd3/0x6a0 net/ipv6/raw.c:1084 __sys_setsockopt+0x2db/0x610 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f65a1ace7d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 71 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f65a1a7f308 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f65a1ace7d9 RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003 RBP: 00007f65a1b574c8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000020000000 R11: 0000000000000246 R12: 00007f65a1b55130 R13: 00007f65a1b574c0 R14: 00007f65a1b24090 R15: 0000000000022000 </TASK> The buggy address belongs to the page: page:ffffea0000706a00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c1a8 flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000000 ffffea0001c1b108 ffffea000046dd08 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as freed page last allocated via order 2, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 4430, ts 1061781545818, free_ts 1061791488993 prep_new_page mm/page_alloc.c:2434 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389 __alloc_pages_node include/linux/gfp.h:572 [inline] alloc_pages_node include/linux/gfp.h:595 [inline] kmalloc_large_node+0x62/0x130 mm/slub.c:4438 __kmalloc_node+0x35a/0x4a0 mm/slub.c:4454 kmalloc_node include/linux/slab.h:604 [inline] kvmalloc_node+0x97/0x100 mm/util.c:580 kvmalloc include/linux/slab.h:731 [inline] kvzalloc include/linux/slab.h:739 [inline] allocate_hook_entries_size net/netfilter/core.c:61 [inline] nf_hook_entries_grow+0x140/0x780 net/netfilter/core.c:128 __nf_register_net_hook+0x144/0x820 net/netfilter/core.c:429 nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571 nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587 nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218 synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81 xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038 check_target net/ipv6/netfilter/ip6_tables.c:530 [inline] find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573 translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735 do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline] do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639 nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1352 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404 free_unref_page_prepare mm/page_alloc.c:3325 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3404 kvfree+0x42/0x50 mm/util.c:613 rcu_do_batch kernel/rcu/tree.c:2527 [inline] rcu_core+0x7b1/0x1820 kernel/rcu/tree.c:2778 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 Memory state around the buggy address: ffff88801c1a7f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88801c1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88801c1a8000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88801c1a8080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88801c1a8100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 2420b79f8c18 ("netfilter: debug: check for sorted array") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02netfilter: nf_tables: fix memory leak during stateful obj updateFlorian Westphal
commit dad3bdeef45f81a6e90204bcc85360bb76eccec7 upstream. stateful objects can be updated from the control plane. The transaction logic allocates a temporary object for this purpose. The ->init function was called for this object, so plain kfree() leaks resources. We must call ->destroy function of the object. nft_obj_destroy does this, but it also decrements the module refcount, but the update path doesn't increment it. To avoid special-casing the update object release, do module_get for the update case too and release it via nft_obj_destroy(). Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Cc: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02netfilter: nf_tables_offload: incorrect flow offload action array sizePablo Neira Ayuso
commit b1a5983f56e371046dcf164f90bfaf704d2b89f6 upstream. immediate verdict expression needs to allocate one slot in the flow offload action array, however, immediate data expression does not need to do so. fwd and dup expression need to allocate one slot, this is missing. Add a new offload_action interface to report if this expression needs to allocate one slot in the flow offload action array. Fixes: be2861dc36d7 ("netfilter: nft_{fwd,dup}_netdev: add offload support") Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23netfilter: conntrack: don't refresh sctp entries in closed stateFlorian Westphal
[ Upstream commit 77b337196a9d87f3d6bb9b07c0436ecafbffda1e ] Vivek Thrivikraman reported: An SCTP server application which is accessed continuously by client application. When the session disconnects the client retries to establish a connection. After restart of SCTP server application the session is not established because of stale conntrack entry with connection state CLOSED as below. (removing this entry manually established new connection): sctp 9 CLOSED src=10.141.189.233 [..] [ASSURED] Just skip timeout update of closed entries, we don't want them to stay around forever. Reported-and-tested-by: Vivek Thrivikraman <vivek.thrivikraman@est.tech> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1579 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23netfilter: nft_synproxy: unregister hooks on init error pathPablo Neira Ayuso
commit 2b4e5fb4d3776c391e40fb33673ba946dd96012d upstream. Disable the IPv4 hooks if the IPv6 hooks fail to be registered. Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01netfilter: conntrack: don't increment invalid counter on NF_REPEATFlorian Westphal
[ Upstream commit 830af2eba40327abec64325a5b08b1e85c37a2e0 ] The packet isn't invalid, REPEAT means we're trying again after cleaning out a stale connection, e.g. via tcp tracker. This caused increases of invalid stat counter in a test case involving frequent connection reuse, even though no packet is actually invalid. Fixes: 56a62e2218f5 ("netfilter: conntrack: fix NF_REPEAT handling") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-01netfilter: nft_payload: do not update layer 4 checksum when mangling fragmentsPablo Neira Ayuso
commit 4e1860a3863707e8177329c006d10f9e37e097a8 upstream. IP fragments do not come with the transport header, hence skip bogus layer 4 checksum updates. Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields") Reported-and-tested-by: Steffen Weinreich <steve@weinreich.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29netfilter: fix regression in looped (broad|multi)cast's MAC handlingIgnacy Gawędzki
[ Upstream commit ebb966d3bdfed581ecccbb4a7432341baf7619b4 ] In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared"), the test for non-empty MAC header introduced in commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has been replaced with a test for a set MAC header. This breaks the case when the MAC header has been reset (using skb_reset_mac_header), as is the case with looped-back multicast packets. As a result, the packets ending up in NFQUEUE get a bogus hwaddr interpreted from the first bytes of the IP header. This patch adds a test for a non-empty MAC header in addition to the test for a set MAC header. The same two tests are also implemented in nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has not been touched, but where supposedly the same situation may happen. Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared") Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01netfilter: ipvs: Fix reuse connection if RS weight is 0yangxingwu
[ Upstream commit c95c07836fa4c1767ed11d8eca0769c652760e32 ] We are changing expire_nodest_conn to work even for reused connections when conn_reuse_mode=0, just as what was done with commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead"). For controlled and persistent connections, the new connection will get the needed real server depending on the rules in ip_vs_check_template(). Fixes: d752c3645717 ("ipvs: allow rescheduling of new connections when port reuse is detected") Co-developed-by: Chuanqi Liu <legend050709@qq.com> Signed-off-by: Chuanqi Liu <legend050709@qq.com> Signed-off-by: yangxingwu <xingwu.yang@gmail.com> Acked-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17netfilter: nfnetlink_queue: fix OOB when mac header was clearedFlorian Westphal
[ Upstream commit 5648b5e1169ff1d6d6a46c35c0b5fbebd2a5cbb2 ] On 64bit platforms the MAC header is set to 0xffff on allocation and also when a helper like skb_unset_mac_header() is called. dev_parse_header may call skb_mac_header() which assumes valid mac offset: BUG: KASAN: use-after-free in eth_header_parse+0x75/0x90 Read of size 6 at addr ffff8881075a5c05 by task nf-queue/1364 Call Trace: memcpy+0x20/0x60 eth_header_parse+0x75/0x90 __nfqnl_enqueue_packet+0x1a61/0x3380 __nf_queue+0x597/0x1300 nf_queue+0xf/0x40 nf_hook_slow+0xed/0x190 nf_hook+0x184/0x440 ip_output+0x1c0/0x2a0 nf_reinject+0x26f/0x700 nfqnl_recv_verdict+0xa16/0x18b0 nfnetlink_rcv_msg+0x506/0xe70 The existing code only works if the skb has a mac header. Fixes: 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17netfilter: nft_dynset: relax superfluous check on set updatesPablo Neira Ayuso
[ Upstream commit 7b1394892de8d95748d05e3ee41e85edb4abbfa1 ] Relax this condition to make add and update commands idempotent for sets with no timeout. The eval function already checks if the set element timeout is available and updates it if the update command is used. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream statePablo Neira Ayuso
[ Upstream commit b7b1d02fc43925a4d569ec221715db2dfa1ce4f5 ] The internal stream state sets the timeout to 120 seconds 2 seconds after the creation of the flow, attach this internal stream state to the IPS_ASSURED flag for consistent event reporting. Before this patch: [NEW] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] [DESTROY] udp 17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] Note IPS_ASSURED for the flow not yet in the internal stream state. after this update: [NEW] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [UPDATE] udp 17 120 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] [DESTROY] udp 17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] Before this patch, short-lived UDP flows never entered IPS_ASSURED, so they were already candidate flow to be deleted by early_drop under stress. Before this patch, IPS_ASSURED is set on regardless the internal stream state, attach this internal stream state to IPS_ASSURED. packet #1 (original direction) enters NEW state packet #2 (reply direction) enters ESTABLISHED state, sets on IPS_SEEN_REPLY paclet #3 (any direction) sets on IPS_ASSURED (if 2 seconds since the creation has passed by). Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27netfilter: Kconfig: use 'default y' instead of 'm' for bool config optionVegard Nossum
commit 77076934afdcd46516caf18ed88b2f88025c9ddb upstream. This option, NF_CONNTRACK_SECMARK, is a bool, so it can never be 'm'. Fixes: 33b8e77605620 ("[NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27netfilter: ipvs: make global sysctl readonly in non-init netnsAntoine Tenart
[ Upstream commit 174c376278949c44aad89c514a6b5db6cee8db59 ] Because the data pointer of net/ipv4/vs/debug_level is not updated per netns, it must be marked as read-only in non-init netns. Fixes: c6d2d445d8de ("IPVS: netns, final patch enabling network name space.") Signed-off-by: Antoine Tenart <atenart@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17netfilter: nf_nat_masquerade: defer conntrack walk to work queueFlorian Westphal
[ Upstream commit 7970a19b71044bf4dc2c1becc200275bdf1884d4 ] The ipv4 and device notifiers are called with RTNL mutex held. The table walk can take some time, better not block other RTNL users. 'ip a' has been reported to block for up to 20 seconds when conntrack table has many entries and device down events are frequent (e.g., PPP). Reported-and-tested-by: Martin Zaharinov <micron10@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17netfilter: nf_nat_masquerade: make async masq_inet6_event handling genericFlorian Westphal
[ Upstream commit 30db406923b9285a9bac06a6af5e74bd6d0f1d06 ] masq_inet6_event is called asynchronously from system work queue, because the inet6 notifier is atomic and nf_iterate_cleanup can sleep. The ipv4 and device notifiers call nf_iterate_cleanup directly. This is legal, but these notifiers are called with RTNL mutex held. A large conntrack table with many devices coming and going will have severe impact on the system usability, with 'ip a' blocking for several seconds. This change places the defer code into a helper and makes it more generic so ipv4 and ifdown notifiers can be converted to defer the cleanup walk as well in a follow patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06netfilter: ipset: Fix oversized kvmalloc() callsJozsef Kadlecsik
commit 7bbc3d385bd813077acaf0e6fdb2a86a901f5382 upstream. The commit commit 7661809d493b426e979f39ab512e3adf41fbcc69 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 14 09:45:49 2021 -0700 mm: don't allow oversized kvmalloc() calls limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the same limit in ipset. Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06ipvs: check that ip_vs_conn_tab_bits is between 8 and 20Andrea Claudi
[ Upstream commit 69e73dbfda14fbfe748d3812da1244cce2928dcb ] ip_vs_conn_tab_bits may be provided by the user through the conn_tab_bits module parameter. If this value is greater than 31, or less than 0, the shift operator used to derive tab_size causes undefined behaviour. Fix this checking ip_vs_conn_tab_bits value to be in the range specified in ipvs Kconfig. If not, simply use default value. Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-03netfilter: conntrack: collect all entries in one cycleFlorian Westphal
[ Upstream commit 4608fdfc07e116f9fc0895beb40abad7cdb5ee3d ] Michal Kubecek reports that conntrack gc is responsible for frequent wakeups (every 125ms) on idle systems. On busy systems, timed out entries are evicted during lookup. The gc worker is only needed to remove entries after system becomes idle after a busy period. To resolve this, always scan the entire table. If the scan is taking too long, reschedule so other work_structs can run and resume from next bucket. After a completed scan, wait for 2 minutes before the next cycle. Heuristics for faster re-schedule are removed. GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow tuning this as-needed or even turn the gc worker off. Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-26netfilter: nft_exthdr: fix endianness of tcp option castSergey Marinkevich
[ Upstream commit 2e34328b396a69b73661ba38d47d92b7cf21c2c4 ] I got a problem on MIPS with Big-Endian is turned on: every time when NF trying to change TCP MSS it returns because of new.v16 was greater than old.v16. But real MSS was 1460 and my rule was like this: add rule table chain tcp option maxseg size set 1400 And 1400 is lesser that 1460, not greater. Later I founded that main causer is cast from u32 to __be16. Debugging: In example MSS = 1400(HEX: 0x578). Here is representation of each byte like it is in memory by addresses from left to right(e.g. [0x0 0x1 0x2 0x3]). LE — Little-Endian system, BE — Big-Endian, left column is type. LE BE u32: [78 05 00 00] [00 00 05 78] As you can see, u32 representation will be casted to u16 from different half of 4-byte address range. But actually nf_tables uses registers and store data of various size. Actually TCP MSS stored in 2 bytes. But registers are still u32 in definition: struct nft_regs { union { u32 data[20]; struct nft_verdict verdict; }; }; So, access like regs->data[priv->sreg] exactly u32. So, according to table presents above, per-byte representation of stored TCP MSS in register will be: LE BE (u32)regs->data[]: [78 05 00 00] [05 78 00 00] ^^ ^^ We see that register uses just half of u32 and other 2 bytes may be used for some another data. But in nft_exthdr_tcp_set_eval() it casted just like u32 -> __be16: new.v16 = src But u32 overfill __be16, so it get 2 low bytes. For clarity draw one more table(<xx xx> means that bytes will be used for cast). LE BE u32: [<78 05> 00 00] [00 00 <05 78>] (u32)regs->data[]: [<78 05> 00 00] [05 78 <00 00>] As you can see, for Little-Endian nothing changes, but for Big-endian we take the wrong half. In my case there is some other data instead of zeros, so new MSS was wrongly greater. For shooting this bug I used solution for ports ranges. Applying of this patch does not affect Little-Endian systems. Signed-off-by: Sergey Marinkevich <sergey.marinkevich@eltex-co.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-04netfilter: nft_nat: allow to specify layer 4 protocol NAT onlyPablo Neira Ayuso
[ Upstream commit a33f387ecd5aafae514095c2c4a8c24f7aea7e8b ] nft_nat reports a bogus EAFNOSUPPORT if no layer 3 information is specified. Fixes: d07db9884a5f ("netfilter: nf_tables: introduce nft_validate_register_load()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-04netfilter: conntrack: adjust stop timestamp to real expiry valueFlorian Westphal
[ Upstream commit 30a56a2b881821625f79837d4d968c679852444e ] In case the entry is evicted via garbage collection there is delay between the timeout value and the eviction event. This adjusts the stop value based on how much time has passed. Fixes: b87a2f9199ea82 ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-25netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfoVasily Averin
commit c23a9fd209bc6f8c1fa6ee303fdf037d784a1627 upstream. Two patches listed below removed ctnetlink_dump_helpinfo call from under rcu_read_lock. Now its rcu_dereference generates following warning: ============================= WARNING: suspicious RCU usage 5.13.0+ #5 Not tainted ----------------------------- net/netfilter/nf_conntrack_netlink.c:221 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 stack backtrace: CPU: 1 PID: 2251 Comm: conntrack Not tainted 5.13.0+ #5 Call Trace: dump_stack+0x7f/0xa1 ctnetlink_dump_helpinfo+0x134/0x150 [nf_conntrack_netlink] ctnetlink_fill_info+0x2c2/0x390 [nf_conntrack_netlink] ctnetlink_dump_table+0x13f/0x370 [nf_conntrack_netlink] netlink_dump+0x10c/0x370 __netlink_dump_start+0x1a7/0x260 ctnetlink_get_conntrack+0x1e5/0x250 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x613/0x993 [nfnetlink] netlink_rcv_skb+0x50/0x100 nfnetlink_rcv+0x55/0x120 [nfnetlink] netlink_unicast+0x181/0x260 netlink_sendmsg+0x23f/0x460 sock_sendmsg+0x5b/0x60 __sys_sendto+0xf1/0x160 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 49ca022bccc5 ("netfilter: ctnetlink: don't dump ct extensions of unconfirmed conntracks") Fixes: 0b35f6031a00 ("netfilter: Remove duplicated rcu_read_lock.") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14netfilter: nft_tproxy: restrict support to TCP and UDP transport protocolsPablo Neira Ayuso
[ Upstream commit 52f0f4e178c757b3d356087376aad8bd77271828 ] Add unfront check for TCP and UDP packets before performing further processing. Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14netfilter: nft_osf: check for TCP packet before further processingPablo Neira Ayuso
[ Upstream commit 8f518d43f89ae00b9cf5460e10b91694944ca1a8 ] The osf expression only supports for TCP packets, add a upfront sanity check to skip packet parsing if this is not a TCP packet. Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14netfilter: nft_exthdr: check for IPv6 packet before further processingPablo Neira Ayuso
[ Upstream commit cdd73cc545c0fb9b1a1f7b209f4f536e7990cff4 ] ipv6_find_hdr() does not validate that this is an IPv6 packet. Add a sanity check for calling ipv6_find_hdr() to make sure an IPv6 packet is passed for parsing. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23netfilter: synproxy: Fix out of bounds when parsing TCP optionsMaxim Mikityanskiy
[ Upstream commit 5fc177ab759418c9537433e63301096e733fb915 ] The TCP option parser in synproxy (synproxy_parse_options) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1. This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options."). v2 changes: Added an early return when length < 0 to avoid calling skb_header_pointer with negative length. Cc: Young Xiao <92siuyang@gmail.com> Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatchesPablo Neira Ayuso
[ Upstream commit 8971ee8b087750a23f3cd4dc55bff2d0303fd267 ] The private helper data size cannot be updated. However, updates that contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is the same. Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10netfilter: nft_ct: skip expectations for confirmed conntrackPablo Neira Ayuso
[ Upstream commit 1710eb913bdcda3917f44d383c32de6bdabfc836 ] nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed conntrack entry. However, nf_ct_ext_add() can only be called for !nf_ct_is_confirmed(). [ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00 [ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202 [ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887 [ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440 [ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447 [ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440 [ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20 [ 1825.352240] FS: 00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000 [ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0 [ 1825.352508] Call Trace: [ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack] [ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct] [ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables] Add the ct helper extension only for unconfirmed conntrack. Skip rule evaluation if the ct helper extension does not exist. Thus, you can only create expectations from the first packet. It should be possible to remove this limitation by adding a new action to attach a generic ct helper to the first packet. Then, use this ct helper extension from follow up packets to create the ct expectation. While at it, add a missing check to skip the template conntrack too and remove check for IPCT_UNTRACK which is implicit to !ct. Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10ipvs: ignore IP_VS_SVC_F_HASHED flag when adding serviceJulian Anastasov
[ Upstream commit 56e4ee82e850026d71223262c07df7d6af3bd872 ] syzbot reported memory leak [1] when adding service with HASHED flag. We should ignore this flag both from sockopt and netlink provided data, otherwise the service is not hashed and not visible while releasing resources. [1] BUG: memory leak unreferenced object 0xffff888115227800 (size 512): comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline] [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline] [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343 [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570 [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101 [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435 [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857 [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117 [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline] [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline] [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125 [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-and-tested-by: syzbot+e562383183e4b1766930@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10netfilter: conntrack: unregister ipv4 sockopts on error unwindFlorian Westphal
[ Upstream commit 22cbdbcfb61acc78d5fc21ebb13ccc0d7e29f793 ] When ipv6 sockopt register fails, the ipv4 one needs to be removed. Fixes: a0ae2562c6c ("netfilter: conntrack: remove l3proto abstraction") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: conntrack: Make global sysctls readonly in non-init netnsJonathon Reinhart
commit 2671fa4dc0109d3fb581bc3078fdf17b5d9080f6 upstream. These sysctls point to global variables: - NF_SYSCTL_CT_MAX (&nf_conntrack_max) - NF_SYSCTL_CT_EXPECT_MAX (&nf_ct_expect_max) - NF_SYSCTL_CT_BUCKETS (&nf_conntrack_htable_size_user) Because their data pointers are not updated to point to per-netns structures, they must be marked read-only in a non-init_net ns. Otherwise, changes in any net namespace are reflected in (leaked into) all other net namespaces. This problem has existed since the introduction of net namespaces. The current logic marks them read-only only if the net namespace is owned by an unprivileged user (other than init_user_ns). Commit d0febd81ae77 ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces") "exposes all sysctls even if the namespace is unpriviliged." Since we need to mark them readonly in any case, we can forego the unprivileged user check altogether. Fixes: d0febd81ae77 ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces") Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19netfilter: nftables: avoid overflows in nft_hash_buckets()Eric Dumazet
[ Upstream commit a54754ec9891830ba548e2010c889e3c8146e449 ] Number of buckets being stored in 32bit variables, we have to ensure that no overflows occur in nft_hash_buckets() syzbot injected a size == 0x40000000 and reported: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 29539 Comm: syz-executor.4 Not tainted 5.12.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 __roundup_pow_of_two include/linux/log2.h:57 [inline] nft_hash_buckets net/netfilter/nft_set_hash.c:411 [inline] nft_hash_estimate.cold+0x19/0x1e net/netfilter/nft_set_hash.c:652 nft_select_set_ops net/netfilter/nf_tables_api.c:3586 [inline] nf_tables_newset+0xe62/0x3110 net/netfilter/nf_tables_api.c:4322 nfnetlink_rcv_batch+0xa09/0x24b0 net/netfilter/nfnetlink.c:488 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:612 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:630 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL checkPablo Neira Ayuso
[ Upstream commit 5e024c325406470d1165a09c6feaf8ec897936be ] Do not assume that the tcph->doff field is correct when parsing for TCP options, skb_header_pointer() might fail to fetch these bits. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19netfilter: xt_SECMARK: add new revision to fix structure layoutPablo Neira Ayuso
[ Upstream commit c7d13358b6a2f49f81a34aa323a2d0878a0532a2 ] This extension breaks when trying to delete rules, add a new revision to fix this. Fixes: 5e6874cdb8de ("[SECMARK]: Add xtables SECMARK target") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-21netfilter: nft_limit: avoid possible divide error in nft_limit_initEric Dumazet
commit b895bdf5d643b6feb7c60856326dd4feb6981560 upstream. div_u64() divides u64 by u32. nft_limit_init() wants to divide u64 by u64, use the appropriate math function (div64_u64) divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8390 Comm: syz-executor188 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:div_u64_rem include/linux/math64.h:28 [inline] RIP: 0010:div_u64 include/linux/math64.h:127 [inline] RIP: 0010:nft_limit_init+0x2a2/0x5e0 net/netfilter/nft_limit.c:85 Code: ef 4c 01 eb 41 0f 92 c7 48 89 de e8 38 a5 22 fa 4d 85 ff 0f 85 97 02 00 00 e8 ea 9e 22 fa 4c 0f af f3 45 89 ed 31 d2 4c 89 f0 <49> f7 f5 49 89 c6 e8 d3 9e 22 fa 48 8d 7d 48 48 b8 00 00 00 00 00 RSP: 0018:ffffc90009447198 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000200000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff875152e6 RDI: 0000000000000003 RBP: ffff888020f80908 R08: 0000200000000000 R09: 0000000000000000 R10: ffffffff875152d8 R11: 0000000000000000 R12: ffffc90009447270 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 000000000097a300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200001c4 CR3: 0000000026a52000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: nf_tables_newexpr net/netfilter/nf_tables_api.c:2675 [inline] nft_expr_init+0x145/0x2d0 net/netfilter/nf_tables_api.c:2713 nft_set_elem_expr_alloc+0x27/0x280 net/netfilter/nf_tables_api.c:5160 nf_tables_newset+0x1997/0x3150 net/netfilter/nf_tables_api.c:4321 nfnetlink_rcv_batch+0x85a/0x21b0 net/netfilter/nfnetlink.c:456 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:580 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:598 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c26844eda9d4 ("netfilter: nf_tables: Fix nft limit burst handling") Fixes: 3e0f64b7dd31 ("netfilter: nft_limit: fix packet ratelimiting") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Luigi Rizzo <lrizzo@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21netfilter: conntrack: do not print icmpv6 as unknown via /procPablo Neira Ayuso
commit fbea31808ca124dd73ff6bb1e67c9af4607c3e32 upstream. /proc/net/nf_conntrack shows icmpv6 as unknown. Fixes: 09ec82f5af99 ("netfilter: conntrack: remove protocol name from l4proto struct") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-16netfilter: x_tables: fix compat match/target pad out-of-bound writeFlorian Westphal
commit b29c457a6511435960115c0f548c4360d5f4801d upstream. xt_compat_match/target_from_user doesn't check that zeroing the area to start of next rule won't write past end of allocated ruleset blob. Remove this code and zero the entire blob beforehand. Reported-by: syzbot+cfc0247ac173f597aaaa@syzkaller.appspotmail.com Reported-by: Andy Nguyen <theflow@google.com> Fixes: 9fa492cdc160c ("[NETFILTER]: x_tables: simplify compat API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-10netfilter: conntrack: Fix gre tunneling over ipv6Ludovic Senecaux
[ Upstream commit 8b2030b4305951f44afef80225f1475618e25a73 ] This fix permits gre connections to be tracked within ip6tables rules Signed-off-by: Ludovic Senecaux <linuxludo@free.fr> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30netfilter: x_tables: Use correct memory barriers.Mark Tomlinson
[ Upstream commit 175e476b8cdf2a4de7432583b49c871345e4f8a1 ] When a new table value was assigned, it was followed by a write memory barrier. This ensured that all writes before this point would complete before any writes after this point. However, to determine whether the rules are unused, the sequence counter is read. To ensure that all writes have been done before these reads, a full memory barrier is needed, not just a write memory barrier. The same argument applies when incrementing the counter, before the rules are read. Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic reported in cc00bcaa5899 (which is still present), while still maintaining the same speed of replacing tables. The smb_mb() barriers potentially slow the packet path, however testing has shown no measurable change in performance on a 4-core MIPS64 platform. Fixes: 7f5c6d4f665b ("netfilter: get rid of atomic ops in fast path") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30Revert "netfilter: x_tables: Switch synchronization to RCU"Mark Tomlinson
[ Upstream commit d3d40f237480abf3268956daf18cdc56edd32834 ] This reverts commit cc00bcaa589914096edef7fb87ca5cee4a166b5c. This (and the preceding) patch basically re-implemented the RCU mechanisms of patch 784544739a25. That patch was replaced because of the performance problems that it created when replacing tables. Now, we have the same issue: the call to synchronize_rcu() makes replacing tables slower by as much as an order of magnitude. Prior to using RCU a script calling "iptables" approx. 200 times was taking 1.16s. With RCU this increased to 11.59s. Revert these patches and fix the issue in a different way. Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30netfilter: ctnetlink: fix dump of the expect mask attributeFlorian Westphal
[ Upstream commit b58f33d49e426dc66e98ed73afb5d97b15a25f2d ] Before this change, the mask is never included in the netlink message, so "conntrack -E expect" always prints 0.0.0.0. In older kernels the l3num callback struct was passed as argument, based on tuple->src.l3num. After the l3num indirection got removed, the call chain is based on m.src.l3num, but this value is 0xffff. Init l3num to the correct value. Fixes: f957be9d349a3 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17netfilter: x_tables: gpf inside xt_find_revision()Vasily Averin
commit 8e24edddad152b998b37a7f583175137ed2e04a5 upstream. nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload and cause host to crash: general protection fault: 0000 [#1] Modules linked in: ... [last unloaded: xt_cluster] CPU: 0 PID: 542455 Comm: iptables RIP: 0010:[<ffffffff8ffbd518>] [<ffffffff8ffbd518>] strcmp+0x18/0x40 RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111 R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100 (VvS: %R15 -- &xt_match, %RDI -- &xt_match.name, xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list) Call Trace: [<ffffffff902ccf44>] match_revfn+0x54/0xc0 [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0 [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0 [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables] [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70 [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100 [<ffffffff903039b5>] raw_getsockopt+0x25/0x50 [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20 [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0 [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a Fixes: 656caff20e1 ("netfilter 04/09: x_tables: fix match/target revision lookup") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17netfilter: nf_nat: undo erroneous tcp edemux lookupFlorian Westphal
commit 03a3ca37e4c6478e3a84f04c8429dd5889e107fd upstream. Under extremely rare conditions TCP early demux will retrieve the wrong socket. 1. local machine establishes a connection to a remote server, S, on port p. This gives: laddr:lport -> S:p ... both in tcp and conntrack. 2. local machine establishes a connection to host H, on port p2. 2a. TCP stack choses same laddr:lport, so we have laddr:lport -> H:p2 from TCP point of view. 2b). There is a destination NAT rewrite in place, translating H:p2 to S:p. This results in following conntrack entries: I) laddr:lport -> S:p (origin) S:p -> laddr:lport (reply) II) laddr:lport -> H:p2 (origin) S:p -> laddr:lport2 (reply) NAT engine has rewritten laddr:lport to laddr:lport2 to map the reply packet to the correct origin. When server sends SYN/ACK to laddr:lport2, the PREROUTING hook will undo-the SNAT transformation, rewriting IP header to S:p -> laddr:lport This causes TCP early demux to associate the skb with the TCP socket of the first connection. The INPUT hook will then reverse the DNAT transformation, rewriting the IP header to H:p2 -> laddr:lport. Because packet ends up with the wrong socket, the new connection never completes: originator stays in SYN_SENT and conntrack entry remains in SYN_RECV until timeout, and responder retransmits SYN/ACK until it gives up. To resolve this, orphan the skb after the input rewrite: Because the source IP address changed, the socket must be incorrect. We can't move the DNAT undo to prerouting due to backwards compatibility, doing so will make iptables/nftables rules to no longer match the way they did. After orphan, the packet will be handed to the next protocol layer (tcp, udp, ...) and that will repeat the socket lookup just like as if early demux was disabled. Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.") Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17netfilter: conntrack: skip identical origin tuple in same zone onlyFlorian Westphal
[ Upstream commit 07998281c268592963e1cd623fe6ab0270b65ae4 ] The origin skip check needs to re-test the zone. Else, we might skip a colliding tuple in the reply direction. This only occurs when using 'directional zones' where origin tuples reside in different zones but the reply tuples share the same zone. This causes the new conntrack entry to be dropped at confirmation time because NAT clash resolution was elided. Fixes: 4e35c1cb9460240 ("netfilter: nf_nat: skip nat clash resolution for same-origin entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-17netfilter: flowtable: fix tcp and udp header checksum updateSven Auhagen
[ Upstream commit 8d6bca156e47d68551750a384b3ff49384c67be3 ] When updating the tcp or udp header checksum on port nat the function inet_proto_csum_replace2 with the last parameter pseudohdr as true. This leads to an error in the case that GRO is used and packets are split up in GSO. The tcp or udp checksum of all packets is incorrect. The error is probably masked due to the fact the most network driver implement tcp/udp checksum offloading. It also only happens when GRO is applied and not on single packets. The error is most visible when using a pppoe connection which is not triggering the tcp/udp checksum offload. Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>