aboutsummaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2022-05-25Merge tag 'v5.10.116' into v5.10/standard/baseBruce Ashfield
This is the 5.10.116 stable release # gpg: Signature made Sun 15 May 2022 02:00:18 PM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-25Merge tag 'v5.10.115' into v5.10/standard/baseBruce Ashfield
This is the 5.10.115 stable release # gpg: Signature made Thu 12 May 2022 06:27:51 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-15Bluetooth: Fix the creation of hdev->nameItay Iellin
commit 103a2f3255a95991252f8f13375c3a96a75011cd upstream. Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin <ieitayie@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-15arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODELMike Rapoport
commit 5e545df3292fbd3d5963c68980f1527ead2a2b3f upstream. ARM is the only architecture that defines CONFIG_ARCH_HAS_HOLES_MEMORYMODEL which in turn enables memmap_valid_within() function that is intended to verify existence of struct page associated with a pfn when there are holes in the memory map. However, the ARCH_HAS_HOLES_MEMORYMODEL also enables HAVE_ARCH_PFN_VALID and arch-specific pfn_valid() implementation that also deals with the holes in the memory map. The only two users of memmap_valid_within() call this function after a call to pfn_valid() so the memmap_valid_within() check becomes redundant. Remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL and memmap_valid_within() and rely entirely on ARM's implementation of pfn_valid() that is now enabled unconditionally. Link: https://lkml.kernel.org/r/20201101170454.9567-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Meelis Roos <mroos@linux.ee> Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: kernel test robot <lkp@intel.com> Fixes: 8dd559d53b3b ("arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAM") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-15regulator: consumer: Add missing stubs to regulator/consumer.hDmitry Osipenko
commit 51dfb6ca3728bd0a0a3c23776a12d2a15a1d2457 upstream. Add missing stubs to regulator/consumer.h in order to fix COMPILE_TEST of the kernel. In particular this should fix compile-testing of OPP core because of a missing stub for regulator_sync_voltage(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Link: https://lore.kernel.org/r/20210120205844.12658-1-digetx@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Bjørn Mork <bjorn@mork.no> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12regulator: consumer: Add missing stubs to regulator/consumer.hDmitry Osipenko
commit 51dfb6ca3728bd0a0a3c23776a12d2a15a1d2457 upstream Add missing stubs to regulator/consumer.h in order to fix COMPILE_TEST of the kernel. In particular this should fix compile-testing of OPP core because of a missing stub for regulator_sync_voltage(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Link: https://lore.kernel.org/r/20210120205844.12658-1-digetx@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Backport from mainline to fix build failure when CONFIG_REGULATOR=n which is default in linux-yocto. 5.10.114 in mainline stable tree does not handle this case. Link: https://lore.kernel.org/all/62796b82.1c69fb81.1bf47.0bcf@mx.google.com/ Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
2022-05-12net: stmmac: disable Split Header (SPH) for Intel platformsTan Tee Min
commit 47f753c1108e287edb3e27fad8a7511a9d55578e upstream. Based on DesignWare Ethernet QoS datasheet, we are seeing the limitation of Split Header (SPH) feature is not supported for Ipv4 fragmented packet. This SPH limitation will cause ping failure when the packets size exceed the MTU size. For example, the issue happens once the basic ping packet size is larger than the configured MTU size and the data is lost inside the fragmented packet, replaced by zeros/corrupted values, and leads to ping fail. So, disable the Split Header for Intel platforms. v2: Add fixes tag in commit message. Fixes: 67afd6d1cfdf("net: stmmac: Add Split Header support and enable it in XGMAC cores") Cc: <stable@vger.kernel.org> # 5.10.x Suggested-by: Ong, Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09Merge tag 'v5.10.114' into v5.10/standard/baseBruce Ashfield
This is the 5.10.114 stable release # gpg: Signature made Mon 09 May 2022 03:11:36 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-09tcp: make sure treq->af_specific is initializedEric Dumazet
[ Upstream commit ba5a4fdd63ae0c575707030db0b634b160baddd7 ] syzbot complained about a recent change in TCP stack, hitting a NULL pointer [1] tcp request sockets have an af_specific pointer, which was used before the blamed change only for SYNACK generation in non SYNCOOKIE mode. tcp requests sockets momentarily created when third packet coming from client in SYNCOOKIE mode were not using treq->af_specific. Make sure this field is populated, in the same way normal TCP requests sockets do in tcp_conn_request(). [1] TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters. general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534 Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48 RSP: 0018:ffffc90000de0588 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100 RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008 RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0 R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267 tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207 cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline] tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486 tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725 ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422 ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519 process_backlog+0x3a0/0x7c0 net/core/dev.c:5847 __napi_poll+0xb3/0x6e0 net/core/dev.c:6413 napi_poll net/core/dev.c:6480 [inline] net_rx_action+0x8ec/0xc60 net/core/dev.c:6567 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097 Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWATEric Dumazet
[ Upstream commit 4bfe744ff1644fbc0a991a2677dc874475dd6776 ] I had this bug sitting for too long in my pile, it is time to fix it. Thanks to Doug Porter for reminding me of it! We had various attempts in the past, including commit 0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"), but the issue is that TCP stack currently only generates EPOLLOUT from input path, when tp->snd_una has advanced and skb(s) cleaned from rtx queue. If a flow has a big RTT, and/or receives SACKs, it is possible that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0 and no more data can be sent until tp->snd_una finally advances. What is needed is to also check if POLLOUT needs to be generated whenever tp->snd_nxt is advanced, from output path. This bug triggers more often after an idle period, as we do not receive ACK for at least one RTT. tcp_notsent_lowat could be a fraction of what CWND and pacing rate would allow to send during this RTT. In a followup patch, I will remove the bogus call to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED) from tcp_check_space(). Fact that we have decided to generate an EPOLLOUT does not mean the application has immediately refilled the transmit queue. This optimistic call might have been the reason the bug seemed not too serious. Tested: 200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2] $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat $ cat bench_rr.sh SUM=0 for i in {1..10} do V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"` echo $V SUM=$(($SUM + $V)) done echo SUM=$SUM Before patch: $ bench_rr.sh 130000000 80000000 140000000 140000000 140000000 140000000 130000000 40000000 90000000 110000000 SUM=1140000000 After patch: $ bench_rr.sh 430000000 590000000 530000000 450000000 450000000 350000000 450000000 490000000 480000000 460000000 SUM=4680000000 # This is 410 % of the value before patch. Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Doug Porter <dsp@fb.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md modePeilin Ye
[ Upstream commit 31c417c948d7f6909cb63f0ac3298f3c38f8ce20 ] As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the following sequence of events: 1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link add ... external". "ip" ignores "[o]seq" if "external" is specified, so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e. it uses lockless TX); 2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g. bpf_skb_set_tunnel_key() in an eBPF program attached to this device; 3. gre_fb_xmit() or __gre6_xmit() processes these skb's: gre_build_header(skb, tun_hlen, flags, protocol, tunnel_id_to_key32(tun_info->key.tun_id), (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0); ^^^^^^^^^^^^^^^^^ Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by making o_seqno atomic_t. As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless xmit"), making o_seqno atomic_t increases "chance for packets being out of order at receiver" when NETIF_F_LLTX is on. Maybe a better fix would be: 1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if they want the kernel to allow sequencing of outgoing packets; 2. Reject all outgoing TUNNEL_SEQ packets if the device was not created with "oseq". Unfortunately, that would break userspace. We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us do it in separate patches to keep this fix minimal. Suggested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tcp: ensure to use the most recently sent skb when filling the rate samplePengcheng Yang
[ Upstream commit b253a0680ceadc5d7b4acca7aa2d870326cad8ad ] If an ACK (s)acks multiple skbs, we favor the information from the most recently sent skb by choosing the skb with the highest prior_delivered count. But in the interval between receiving ACKs, we send multiple skbs with the same prior_delivered, because the tp->delivered only changes when we receive an ACK. We used RACK's solution, copying tcp_rack_sent_after() as tcp_skb_sent_after() helper to determine "which packet was sent last?". Later, we will use tcp_skb_sent_after() instead in RACK. Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1650422081-22153-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual ModeGeert Uytterhoeven
[ Upstream commit 7e842d70fe599bc13594b650b2144c4b6e6d6bf1 ] HyperFlash devices fail to probe: rpc-if-hyperflash rpc-if-hyperflash: probing of hyperbus device failed In HyperFlash or Octal-SPI Flash mode, the Transfer Data Enable bits (SPIDE) in the Manual Mode Enable Setting Register (SMENR) are derived from half of the transfer size, cfr. the rpcif_bits_set() helper function. However, rpcif_reg_{read,write}() does not take the bus size into account, and does not double all Manual Mode Data Register access sizes when communicating with a HyperFlash or Octal-SPI Flash device. Fix this, and avoid the back-and-forth conversion between transfer size and Transfer Data Enable bits, by explicitly storing the transfer size in struct rpcif, and using that value to determine access size in rpcif_reg_{read,write}(). Enforce that the "high" Manual Mode Read/Write Data Registers (SM[RW]DR1) are only used for 8-byte data accesses. While at it, forbid writing to the Manual Mode Read Data Registers, as they are read-only. Fixes: fff53a551db50f5e ("memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/cde9bfacf704c81865f57b15d1b48a4793da4286.1649681476.git.geert+renesas@glider.be Link: https://lore.kernel.org/r/20220420070526.9367-1-krzysztof.kozlowski@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09mtd: fix 'part' field data corruption in mtd_infoOleksandr Ocheretnyi
[ Upstream commit 37c5f9e80e015d0df17d0c377c18523002986851 ] Commit 46b5889cc2c5 ("mtd: implement proper partition handling") started using "mtd_get_master_ofs()" in mtd callbacks to determine memory offsets by means of 'part' field from mtd_info, what previously was smashed accessing 'master' field in the mtd_set_dev_defaults() method. That provides wrong offset what causes hardware access errors. Just make 'part', 'master' as separate fields, rather than using union type to avoid 'part' data corruption when mtd_set_dev_defaults() is called. Fixes: 46b5889cc2c5 ("mtd: implement proper partition handling") Signed-off-by: Oleksandr Ocheretnyi <oocheret@cisco.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220417184649.449289-1-oocheret@cisco.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09hex2bin: make the function hex_to_bin constant-timeMikulas Patocka
commit e5be15767e7e284351853cbaba80cde8620341fb upstream. The function hex2bin is used to load cryptographic keys into device mapper targets dm-crypt and dm-integrity. It should take constant time independent on the processed data, so that concurrently running unprivileged code can't infer any information about the keys via microarchitectural convert channels. This patch changes the function hex_to_bin so that it contains no branches and no memory accesses. Note that this shouldn't cause performance degradation because the size of the new function is the same as the size of the old function (on x86-64) - and the new function causes no branch misprediction penalties. I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no branches in the generated code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-03Merge tag 'v5.10.113' into v5.10/standard/baseBruce Ashfield
This is the 5.10.113 stable release # gpg: Signature made Wed 27 Apr 2022 07:54:06 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-04-27oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanupNico Pache
commit e4a38402c36e42df28eb1a5394be87e6571fb48a upstream. The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death. A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death: CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory) If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely. Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup. Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer Based on a patch by Michal Hocko. Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Nico Pache <npache@redhat.com> Co-developed-by: Joel Savitz <jsavitz@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Savitz <jsavitz@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27mm, hugetlb: allow for "high" userspace addressesChristophe Leroy
commit 5f24d5a579d1eace79d505b148808a850b417d4c upstream. This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit f6795053dac8 was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit f6795053dac8. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit f6795053dac8 (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> [5.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27ipv6: make ip6_rt_gc_expire an atomic_tEric Dumazet
[ Upstream commit 9cb7c013420f98fa6fd12fc6a5dc055170c108db ] Reads and Writes to ip6_rt_gc_expire always have been racy, as syzbot reported lately [1] There is a possible risk of under-flow, leading to unexpected high value passed to fib6_run_gc(), although I have not observed this in the field. Hosts hitting ip6_dst_gc() very hard are under pretty bad state anyway. [1] BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1: ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311 dst_alloc+0x9b/0x160 net/core/dst.c:86 ip6_dst_alloc net/ipv6/route.c:344 [inline] icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0: ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311 dst_alloc+0x9b/0x160 net/core/dst.c:86 ip6_dst_alloc net/ipv6/route.c:344 [inline] icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807 mld_send_cr net/ipv6/mcast.c:2119 [inline] mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 value changed: 0x00000bb3 -> 0x00000ba9 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d3bad-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: mld mld_ifc_work Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27esp: limit skb_page_frag_refill use to a single pageSabrina Dubroca
[ Upstream commit 5bd8baab087dff657e05387aee802e70304cc813 ] Commit ebe48d368e97 ("esp: Fix possible buffer overflow in ESP transformation") tried to fix skb_page_frag_refill usage in ESP by capping allocsize to 32k, but that doesn't completely solve the issue, as skb_page_frag_refill may return a single page. If that happens, we will write out of bounds, despite the check introduced in the previous patch. This patch forces COW in cases where we would end up calling skb_page_frag_refill with a size larger than a page (first in esp_output_head with tailen, then in esp_output_tail with skb->data_len). Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overeadKees Cook
commit 2618a0dae09ef37728dab89ff60418cbe25ae6bd upstream. With GCC 12, -Wstringop-overread was warning about an implicit cast from char[6] to char[8]. However, the extra 2 bytes are always thrown away, alignment doesn't matter, and the risk of hitting the edge of unallocated memory has been accepted, so this prototype can just be converted to a regular char *. Silences: net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread] 4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’ 375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2], | ^~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Khem Raj <raj.khem@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20Merge tag 'v5.10.112' into v5.10/standard/baseBruce Ashfield
This is the 5.10.112 stable release # gpg: Signature made Wed 20 Apr 2022 03:23:39 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-04-20Merge tag 'v5.10.111' into v5.10/standard/baseBruce Ashfield
This is the 5.10.111 stable release # gpg: Signature made Wed 13 Apr 2022 03:01:25 PM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-04-20ax25: fix reference count leaks of ax25_devDuoming Zhou
commit 87563a043cef044fed5db7967a75741cc16ad2b1 upstream. The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") introduces refcount into ax25_dev, but there are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(), ax25_rt_add(), ax25_rt_del() and ax25_rt_opt(). This patch uses ax25_dev_put() and adjusts the position of ax25_addr_ax25dev() to fix reference cout leaks of ax25_dev. Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> [OP: backport to 5.10: adjust context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20ax25: add refcount in ax25_dev to avoid UAF bugsDuoming Zhou
commit d01ffb9eee4af165d83b08dd73ebdf9fe94a519b upstream. If we dereference ax25_dev after we call kfree(ax25_dev) in ax25_dev_device_down(), it will lead to concurrency UAF bugs. There are eight syscall functions suffer from UAF bugs, include ax25_bind(), ax25_release(), ax25_connect(), ax25_ioctl(), ax25_getname(), ax25_sendmsg(), ax25_getsockopt() and ax25_info_show(). One of the concurrency UAF can be shown as below: (USE) | (FREE) | ax25_device_event | ax25_dev_device_down ax25_bind | ... ... | kfree(ax25_dev) ax25_fillin_cb() | ... ax25_fillin_cb_from_dev() | ... | The root cause of UAF bugs is that kfree(ax25_dev) in ax25_dev_device_down() is not protected by any locks. When ax25_dev, which there are still pointers point to, is released, the concurrency UAF bug will happen. This patch introduces refcount into ax25_dev in order to guarantee that there are no pointers point to it when ax25_dev is released. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> [OP: backport to 5.10: adjusted context] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entrySteve Capper
[ Upstream commit 697a1d44af8ba0477ee729e632f4ade37999249a ] tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when updating the mmu_gather structure. Unfortunately on arm64 there are two additional huge page sizes that need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced due to the fact that the tlb structure hasn't been correctly updated by the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry. This patch adds inequality logic to the generic implementation of tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are effectively covered on arm64. Also, as well as ptes, pmds and puds; p4ds are now considered too. Reported-by: David Hildenbrand <david@redhat.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-mm/811c5c8e-b3a2-85d2-049c-717f17c3a03a@redhat.com/ Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220330112543.863-1-steve.capper@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20scsi: iscsi: Fix conn cleanup and stop race during iscsid restartMike Christie
[ Upstream commit 7c6e99c18167ed89729bf167ccb4a7e3ab3115ba ] If iscsid is doing a stop_conn at the same time the kernel is starting error recovery we can hit a race that allows the cleanup work to run on a valid connection. In the race, iscsi_if_stop_conn sees the cleanup bit set, but it calls flush_work on the clean_work before iscsi_conn_error_event has queued it. The flush then returns before the queueing and so the cleanup_work can run later and disconnect/stop a conn while it's in a connected state. The patch: Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") added the late stop_conn call bug originally, and the patch: Commit 23d6fefbb3f6 ("scsi: iscsi: Fix in-kernel conn failure handling") attempted to fix it but only fixed the normal EH case and left the above race for the iscsid restart case. For the normal EH case we don't hit the race because we only signal userspace to start recovery after we have done the queueing, so the flush will always catch the queued work or see it completed. For iscsid restart cases like boot, we can hit the race because iscsid will call down to the kernel before the kernel has signaled any error, so both code paths can be running at the same time. This adds a lock around the setting of the cleanup bit and queueing so they happen together. Link: https://lore.kernel.org/r/20220408001314.5014-6-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Tested-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20scsi: iscsi: Fix in-kernel conn failure handlingMike Christie
[ Upstream commit 23d6fefbb3f6b1cc29794427588b470ed06ff64e ] Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") has the following regressions/bugs that this patch fixes: 1. It can return cmds to upper layers like dm-multipath where that can retry them. After they are successful the fs/app can send new I/O to the same sectors, but we've left the cmds running in FW or in the net layer. We need to be calling ep_disconnect if userspace is not up. This patch only fixes the issue for offload drivers. iscsi_tcp will be fixed in separate commit because it doesn't have a ep_disconnect call. 2. The drivers that implement ep_disconnect expect that it's called before conn_stop. Besides crashes, if the cleanup_task callout is called before ep_disconnect it might free up driver/card resources for session1 then they could be allocated for session2. But because the driver's ep_disconnect is not called it has not cleaned up the firmware so the card is still using the resources for the original cmd. 3. The stop_conn_work_fn can run after userspace has done its recovery and we are happily using the session. We will then end up with various bugs depending on what is going on at the time. We may also run stop_conn_work_fn late after userspace has called stop_conn and ep_disconnect and is now going to call start/bind conn. If stop_conn_work_fn runs after bind but before start, we would leave the conn in a unbound but sort of started state where IO might be allowed even though the drivers have been set in a state where they no longer expect I/O. 4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in kernel stop_conn function is breaking userspace. We should have been doing this for the caller. Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20scsi: iscsi: Rel ref after iscsi_lookup_endpoint()Mike Christie
[ Upstream commit 9e5fe1700896c85040943fdc0d3fee0dd3e0d36f ] Subsequent commits allow the kernel to do ep_disconnect. In that case we will have to get a proper refcount on the ep so one thread does not delete it from under another. Link: https://lore.kernel.org/r/20210525181821.7617-7-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20scsi: iscsi: Stop queueing during ep_disconnectMike Christie
[ Upstream commit 891e2639deae721dc43764a44fa255890dc34313 ] During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new I/O but every driver except cxgbi and iscsi_tcp can still get I/O from __iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before ep_disconnect. This could happen if we were terminating the session, and the logout timed out before it was even sent to libiscsi. Fix the issue by adding a helper which reverses the bind_conn call that allows new I/O to be queued. Drivers implementing ep_disconnect can use this to make sure new I/O is not queued to them when handling the disconnect. Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20net/sched: flower: fix parsing of ethertype following VLAN headerVlad Buslov
[ Upstream commit 2105f700b53c24aa48b65c15652acc386044d26a ] A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to match the L2 ethertype following the first VLAN header, as confirmed by linked discussion with the maintainer. However, such rule also matches packets that have additional second VLAN header, even though filter has both eth_type and vlan_ethtype set to "ipv4". Looking at the code this seems to be mostly an artifact of the way flower uses flow dissector. First, even though looking at the uAPI eth_type and vlan_ethtype appear like a distinct fields, in flower they are all mapped to the same key->basic.n_proto. Second, flow dissector skips following VLAN header as no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the value of n_proto to last parsed header. With these, such filters ignore any headers present between first VLAN header and first "non magic" header (ipv4 in this case) that doesn't result FLOW_DISSECT_RET_PROTO_AGAIN. Fix the issue by extending flow dissector VLAN key structure with new 'vlan_eth_type' field that matches first ethertype following previously parsed VLAN header. Modify flower classifier to set the new flow_dissector_key_vlan->vlan_eth_type with value obtained from TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs. Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/ Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support") Fixes: d64efd0926ba ("net/sched: flower: Add supprt for matching on QinQ vlan headers") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20SUNRPC: Fix the svc_deferred_event trace classChuck Lever
[ Upstream commit 4d5004451ab2218eab94a30e1841462c9316ba19 ] Fix a NULL deref crash that occurs when an svc_rqst is deferred while the sunrpc tracing subsystem is enabled. svc_revisit() sets dr->xprt to NULL, so it can't be relied upon in the tracepoint to provide the remote's address. Unfortunately we can't revert the "svc_deferred_class" hunk in commit ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events") because there is now a specific check of event format specifiers for unsafe dereferences. The warning that check emits is: event svc_defer_recv has unsafe dereference of argument 1 A "%pISpc" format specifier with a "struct sockaddr *" is indeed flagged by this check. Instead, take the brute-force approach used by the svcrdma_qp_error tracepoint. Convert the dr::addr field into a presentation address in the TP_fast_assign() arm of the trace event, and store that as a string. This fix can be backported to -stable kernels. In the meantime, commit c6ced22997ad ("tracing: Update print fmt check to handle new __get_sockaddr() macro") is now in v5.18, so this wonky fix can be replaced with __sockaddr() and friends properly during the v5.19 merge window. Fixes: ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warningWaiman Long
commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream. The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code: static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; : It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is #ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense. Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access. Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long <longman@redhat.com> Reported-by: Justin Forbes <jforbes@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13gpio: Restrict usage of GPIO chip irq members before initializationShreeya Patel
commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream. GPIO chip irq members are exposed before they could be completely initialized and this leads to race conditions. One such issue was observed for the gc->irq.domain variable which was accessed through the I2C interface in gpiochip_to_irq() before it could be initialized by gpiochip_add_irqchip(). This resulted in Kernel NULL pointer dereference. Following are the logs for reference :- kernel: Call Trace: kernel: gpiod_to_irq+0x53/0x70 kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0 To avoid such scenarios, restrict usage of GPIO chip irq members before they are completely initialized. Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com> Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13NFS: nfsiod should not block forever in mempool_alloc()Trond Myklebust
[ Upstream commit 515dcdcd48736576c6f5c197814da6f81c60a21e ] The concern is that since nfsiod is sometimes required to kick off a commit, it can get locked up waiting forever in mempool_alloc() instead of failing gracefully and leaving the commit until later. Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY, then fall back to a non-blocking attempt to allocate from the memory pool. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13NFS: swap IO handling is slightly different for O_DIRECT IONeilBrown
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ] 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13can: isotp: set default value for N_As to 50 micro secondsOliver Hartkopp
[ Upstream commit 530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3 ] The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s). Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative. With this N_As/frame_txtime value the 'burst mode' is disabled by default. As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg}Luiz Augusto von Dentz
[ Upstream commit 9b392e0e0b6d026da5a62bb79a08f32e27af858e ] This fixes attemting to print hdev->name directly which causes them to print an error: kernel: read_version:367: (efault): sock 000000006a3008f2 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ipv4: Invalidate neighbour for broadcast address upon address additionIdo Schimmel
[ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ] In case user space sends a packet destined to a broadcast address when a matching broadcast route is not configured, the kernel will create a unicast neighbour entry that will never be resolved [1]. When the broadcast route is configured, the unicast neighbour entry will not be invalidated and continue to linger, resulting in packets being dropped. Solve this by invalidating unresolved neighbour entries for broadcast addresses after routes for these addresses are internally configured by the kernel. This allows the kernel to create a broadcast neighbour entry following the next route lookup. Another possible solution that is more generic but also more complex is to have the ARP code register a listener to the FIB notification chain and invalidate matching neighbour entries upon the addition of broadcast routes. It is also possible to wave off the issue as a user space problem, but it seems a bit excessive to expect user space to be that intimately familiar with the inner workings of the FIB/neighbour kernel code. [1] https://lore.kernel.org/netdev/55a04a8f-56f3-f73c-2aea-2195923f09d1@huawei.com/ Reported-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13ipv6: make mc_forwarding atomicEric Dumazet
[ Upstream commit 145c7a793838add5e004e7d49a67654dc7eba147 ] This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13bpf: Make dst_port field in struct bpf_sock 16-bit wideJakub Sitnicki
[ Upstream commit 4421a582718ab81608d8486734c18083b822390d ] Menglong Dong reports that the documentation for the dst_port field in struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the field is a zero-padded 16-bit integer in network byte order. The value appears to the BPF user as if laid out in memory as so: offsetof(struct bpf_sock, dst_port) + 0 <port MSB> + 8 <port LSB> +16 0x00 +24 0x00 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if the offset into the field is 0. 32-bit wide loads from dst_port are especially confusing. The loaded value, after converting to host byte order with bpf_ntohl(dst_port), contains the port number in the upper 16-bits. Remove the confusion by splitting the field into two 16-bit fields. For backward compatibility, allow 32-bit wide loads from offsetof(struct bpf_sock, dst_port). While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port. Reported-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-11Merge tag 'v5.10.110' into v5.10/standard/baseBruce Ashfield
This is the 5.10.110 stable release # gpg: Signature made Fri 08 Apr 2022 08:41:16 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-04-08coredump: Use the vma snapshot in fill_files_noteEric W. Biederman
commit 390031c942116d4733310f0684beb8db19885fe6 upstream. Matthew Wilcox reported that there is a missing mmap_lock in file_files_note that could possibly lead to a user after free. Solve this by using the existing vma snapshot for consistency and to avoid the need to take the mmap_lock anywhere in the coredump code except for dump_vma_snapshot. Update the dump_vma_snapshot to capture vm_pgoff and vm_file that are neeeded by fill_files_note. Add free_vma_snapshot to free the captured values of vm_file. Reported-by: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20220131153740.2396974-1-willy@infradead.org Cc: stable@vger.kernel.org Fixes: a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08coredump: Snapshot the vmas in do_coredumpEric W. Biederman
commit 95c5436a4883841588dae86fb0b9325f47ba5ad3 upstream. Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the individual coredump routines into do_coredump itself. This makes the code less error prone and easier to maintain. Make the vma snapshot available to the coredump routines in struct coredump_params. This makes it easier to change and update what is captures in the vma snapshot and will be needed for fixing fill_file_notes. Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08bpf: Fix comment for helper bpf_current_task_under_cgroup()Hengqi Chen
commit 58617014405ad5c9f94f464444f4972dabb71ca7 upstream. Fix the descriptions of the return values of helper bpf_current_task_under_cgroup(). Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)") Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08bpf: Adjust BPF stack helper functions to accommodate skip > 0Namhyung Kim
commit ee2a098851bfbe8bcdd964c0121f4246f00ff41e upstream. Let's say that the caller has storage for num_elem stack frames. Then, the BPF stack helper functions walk the stack for only num_elem frames. This means that if skip > 0, one keeps only 'num_elem - skip' frames. This is because it sets init_nr in the perf_callchain_entry to the end of the buffer to save num_elem entries only. I believe it was because the perf callchain code unwound the stack frames until it reached the global max size (sysctl_perf_event_max_stack). However it now has perf_callchain_entry_ctx.max_stack to limit the iteration locally. This simplifies the code to handle init_nr in the BPF callstack entries and removes the confusion with the perf_event's __PERF_SAMPLE_CALLCHAIN_EARLY which sets init_nr to 0. Also change the comment on bpf_get_stack() in the header file to be more explicit what the return value means. Fixes: c195651e565a ("bpf: add bpf_get_stack helper") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com Link: https://lore.kernel.org/bpf/20220314182042.71025-1-namhyung@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
2022-04-08rxrpc: Fix call timer start racing with call destructionDavid Howells
commit 4a7f62f91933c8ae5308f9127fd8ea48188b6bc3 upstream. The rxrpc_call struct has a timer used to handle various timed events relating to a call. This timer can get started from the packet input routines that are run in softirq mode with just the RCU read lock held. Unfortunately, because only the RCU read lock is held - and neither ref or other lock is taken - the call can start getting destroyed at the same time a packet comes in addressed to that call. This causes the timer - which was already stopped - to get restarted. Later, the timer dispatch code may then oops if the timer got deallocated first. Fix this by trying to take a ref on the rxrpc_call struct and, if successful, passing that ref along to the timer. If the timer was already running, the ref is discarded. The timer completion routine can then pass the ref along to the call's work item when it queues it. If the timer or work item where already queued/running, the extra ref is discarded. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08serial: 8250: fix XOFF/XON sending when DMA is usedIlpo Järvinen
[ Upstream commit f58c252e30cf74f68b0054293adc03b5923b9f0e ] When 8250 UART is using DMA, x_char (XON/XOFF) is never sent to the wire. After this change, x_char is injected correctly. Create uart_xchar_out() helper for sending the x_char out and accounting related to it. It seems that almost every driver does these same steps with x_char. Except for 8250, however, almost all currently lack .serial_out so they cannot immediately take advantage of this new helper. The downside of this patch is that it might reintroduce the problems some devices faced with mixed DMA/non-DMA transfer which caused revert f967fc8f165f (Revert "serial: 8250_dma: don't bother DMA with small transfers"). However, the impact should be limited to cases with XON/XOFF (that didn't work with DMA capable devices to begin with so this problem is not very likely to cause a major issue, if any at all). Fixes: 9ee4b83e51f74 ("serial: 8250: Add support for dmaengine") Reported-by: Gilles Buloz <gilles.buloz@kontron.com> Tested-by: Gilles Buloz <gilles.buloz@kontron.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220314091432.4288-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08PCI: Reduce warnings on possible RW1C corruptionMark Tomlinson
[ Upstream commit 92c45b63ce22c8898aa41806e8d6692bcd577510 ] For hardware that only supports 32-bit writes to PCI there is the possibility of clearing RW1C (write-one-to-clear) bits. A rate-limited messages was introduced by fb2659230120, but rate-limiting is not the best choice here. Some devices may not show the warnings they should if another device has just produced a bunch of warnings. Also, the number of messages can be a nuisance on devices which are otherwise working fine. Change the ratelimit to a single warning per bus. This ensures no bus is 'starved' of emitting a warning and also that there isn't a continuous stream of warnings. It would be preferable to have a warning per device, but the pci_dev structure is not available here, and a lookup from devfn would be far too slow. Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Fixes: fb2659230120 ("PCI: Warn on possible RW1C corruption for sub-32 bit config writes") Link: https://lore.kernel.org/r/20200806041455.11070-1-mark.tomlinson@alliedtelesis.co.nz Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not ↵Christophe JAILLET
defined [ Upstream commit 043cfff99a18933fda2fb2e163daee73cc07910b ] Remove an extra ";" which breaks compilation. Fixes: 53bf2b0e4e4c ("firmware: ti_sci: Add support for getting resource with subtype") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/r/e6c3cb793e1a6a2a0ae2528d5a5650dfe6a4b6ff.1640276505.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sasha Levin <sashal@kernel.org>