summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2020-12-12Merge tag 'for-linus-5.10c-rc8-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "A short series fixing a regression introduced in 5.9 for running as Xen dom0 on a system with NVMe backed storage" * tag 'for-linus-5.10c-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: don't use page->lru for ZONE_DEVICE memory xen: add helpers for caching grant mapping pages
2020-12-11bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpersAndrii Nakryiko
Remove bpf_ prefix, which causes these helpers to be reported in verifier dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets fix it as long as it is still possible before UAPI freezes on these helpers. Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11elfcore: fix building with clangArnd Bergmann
kernel/elfcore.c only contains weak symbols, which triggers a bug with clang in combination with recordmcount: Cannot find symbol for section 2: .text. kernel/elfcore.o: failed Move the empty stubs into linux/elfcore.h as inline functions. As only two architectures use these, just use the architecture specific Kconfig symbols to key off the declaration. Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11kbuild: avoid static_assert for genksymsArnd Bergmann
genksyms does not know or care about the _Static_assert() built-in, and sometimes falls back to ignoring the later symbols, which causes undefined behavior such as WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned. ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation Redefine static_assert for genksyms to avoid that. Link: https://lkml.kernel.org/r/20201203230955.1482058-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Kees Cook <keescook@chromium.org> Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-10Merge tag 'fixes-v5.10a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull namespaced fscaps fix from James Morris: "Fix namespaced fscaps when !CONFIG_SECURITY (Serge Hallyn)" * tag 'fixes-v5.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: [SECURITY] fix namespaced fscaps when !CONFIG_SECURITY
2020-12-10Merge tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Here are a handful more bugfixes for 5.10. Unfortunately, we found some problems with the new READ_PLUS operation that aren't easy to fix. We've decided to disable this codepath through a Kconfig option for now, but a series of patches going into 5.11 will clean up the code and fix the issues at the same time. This seemed like the best way to go about it. Summary: - Fix array overflow when flexfiles mirroring is enabled - Fix rpcrdma_inline_fixup() crash with new LISTXATTRS - Fix 5 second delay when doing inter-server copy - Disable READ_PLUS by default" * tag 'nfs-for-5.10-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Disable READ_PLUS by default NFSv4.2: Fix 5 seconds delay when doing inter server copy NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) IPsec compat fixes, from Dmitry Safonov. 2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai. 3) Fix polling in xsk sockets by using sk_poll_wait() instead of datagram_poll() which keys off of sk_wmem_alloc and such which xsk sockets do not update. From Xuan Zhuo. 4) Missing init of rekey_data in cfgh80211, from Sara Sharon. 5) Fix destroy of timer before init, from Davide Caratti. 6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd Bergmann. 7) Missing error return in rtm_to_fib_config() switch case, from Zhang Changzhong. 8) Fix some src/dest address handling in vrf and add a testcase. From Stephen Suryaputra. 9) Fix multicast handling in Seville switches driven by mscc-ocelot driver. From Vladimir Oltean. 10) Fix proto value passed to skb delivery demux in udp, from Xin Long. 11) HW pkt counters not reported correctly in enetc driver, from Claudiu Manoil. 12) Fix deadlock in bridge, from Joseph Huang. 13) Missing of_node_pur() in dpaa2 driver, fromn Christophe JAILLET. 14) Fix pid fetching in bpftool when there are a lot of results, from Andrii Nakryiko. 15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso. 16) Various stymmac fixes, from Fugang Duan. 17) Fix null deref in tipc, from Cengiz Can. 18) When mss is biog, coose more resonable rcvq_space in tcp, fromn Eric Dumazet. 19) Revert a geneve change that likely isnt necessary, from Jakub Kicinski. 20) Avoid premature rx buffer reuse in various Intel driversm from Björn Töpel. 21) retain EcT bits during TIS reflection in tcp, from Wei Wang. 22) Fix Tso deferral wrt. cwnd limiting in tcp, from Neal Cardwell. 23) MPLS_OPT_LSE_LABEL attribute is 342 ot 8 bits, from Guillaume Nault 24) Fix propagation of 32-bit signed bounds in bpf verifier and add test cases, from Alexei Starovoitov. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) selftests: fix poll error in udpgro.sh selftests/bpf: Fix "dubious pointer arithmetic" test selftests/bpf: Fix array access with signed variable test selftests/bpf: Add test for signed 32-bit bound check bug bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds. MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower net/mlx4_en: Handle TX error CQE net/mlx4_en: Avoid scheduling restart task if it is already running tcp: fix cwnd-limited bug for TSO deferral where we send nothing net: flow_offload: Fix memory leak for indirect flow block tcp: Retain ECT bits for tos reflection ethtool: fix stack overflow in ethnl_parse_bitset() e1000e: fix S0ix flow to allow S0i3.2 subset entry ice: avoid premature Rx buffer reuse ixgbe: avoid premature Rx buffer reuse i40e: avoid premature Rx buffer reuse igb: avoid transmit queue timeout in xdp path igb: use xdp_do_flush igb: skb add metasize for xdp ...
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-12-10 The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 12 day(s) which contain a total of 21 files changed, 163 insertions(+), 88 deletions(-). The main changes are: 1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei. 2) Fix ring_buffer__poll() return value, from Andrii. 3) Fix race in lwt_bpf, from Cong. 4) Fix test_offload, from Toke. 5) Various xsk fixes. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Switch to RCU in x_tables to fix possible NULL pointer dereference, from Subash Abhinov Kasiviswanathan. 2) Fix netlink dump of dynset timeouts later than 23 days. 3) Add comment for the indirect serialization of the nft commit mutex with rtnl_mutex. 4) Remove bogus check for confirmed conntrack when matching on the conntrack ID, from Brett Mastbergen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09xdp: Remove the xdp_attachment_flags_ok() callbackToke Høiland-Jørgensen
Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device"), the XDP program attachment info is now maintained in the core code. This interacts badly with the xdp_attachment_flags_ok() check that prevents unloading an XDP program with different load flags than it was loaded with. In practice, two kinds of failures are seen: - An XDP program loaded without specifying a mode (and which then ends up in driver mode) cannot be unloaded if the program mode is specified on unload. - The dev_xdp_uninstall() hook always calls the driver callback with the mode set to the type of the program but an empty flags argument, which means the flags_ok() check prevents the program from being removed, leading to bpf prog reference leaks. The original reason this check was added was to avoid ambiguity when multiple programs were loaded. With the way the checks are done in the core now, this is quite simple to enforce in the core code, so let's add a check there and get rid of the xdp_attachment_flags_ok() callback entirely. Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
2020-12-09xen: don't use page->lru for ZONE_DEVICE memoryJuergen Gross
Commit 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory") introduced usage of ZONE_DEVICE memory for foreign memory mappings. Unfortunately this collides with using page->lru for Xen backend private page caches. Fix that by using page->zone_device_data instead. Cc: <stable@vger.kernel.org> # 5.9 Fixes: 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovksy@oracle.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2020-12-09xen: add helpers for caching grant mapping pagesJuergen Gross
Instead of having similar helpers in multiple backend drivers use common helpers for caching pages allocated via gnttab_alloc_pages(). Make use of those helpers in blkback and scsiback. Cc: <stable@vger.kernel.org> # 5.9 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovksy@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2020-12-08net: stmmac: overwrite the dma_cap.addr64 according to HW designFugang Duan
The current IP register MAC_HW_Feature1[ADDR64] only defines 32/40/64 bit width, but some SOCs support others like i.MX8MP support 34 bits but it maps to 40 bits width in MAC_HW_Feature1[ADDR64]. So overwrite dma_cap.addr64 according to HW real design. Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip") Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08netfilter: nft_dynset: fix timeouts later than 23 daysPablo Neira Ayuso
Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by 8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23 days"), otherwise ruleset listing breaks. Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08bonding: fix feature flag setting at init timeJarod Wilson
Don't try to adjust XFRM support flags if the bond device isn't yet registered. Bad things can currently happen when netdev_change_features() is called without having wanted_features fully filled in yet. This code runs both on post-module-load mode changes, as well as at module init time, and when run at module init time, it is before register_netdevice() has been called and filled in wanted_features. The empty wanted_features led to features also getting emptied out, which was definitely not the intended behavior, so prevent that from happening. Originally, I'd hoped to stop adjusting wanted_features at all in the bonding driver, as it's documented as being something only the network core should touch, but we actually do need to do this to properly update both the features and wanted_features fields when changing the bond type, or we get to a situation where ethtool sees: esp-hw-offload: off [requested on] I do think we should be using netdev_update_features instead of netdev_change_features here though, so we only send notifiers when the features actually changed. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") Reported-by: Ivan Vecera <ivecera@redhat.com> Suggested-by: Ivan Vecera <ivecera@redhat.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20201205172229.576587-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-08netfilter: x_tables: Switch synchronization to RCUSubash Abhinov Kasiviswanathan
When running concurrent iptables rules replacement with data, the per CPU sequence count is checked after the assignment of the new information. The sequence count is used to synchronize with the packet path without the use of any explicit locking. If there are any packets in the packet path using the table information, the sequence count is incremented to an odd value and is incremented to an even after the packet process completion. The new table value assignment is followed by a write memory barrier so every CPU should see the latest value. If the packet path has started with the old table information, the sequence counter will be odd and the iptables replacement will wait till the sequence count is even prior to freeing the old table info. However, this assumes that the new table information assignment and the memory barrier is actually executed prior to the counter check in the replacement thread. If CPU decides to execute the assignment later as there is no user of the table information prior to the sequence check, the packet path in another CPU may use the old table information. The replacement thread would then free the table information under it leading to a use after free in the packet processing context- Unable to handle kernel NULL pointer dereference at virtual address 000000000000008e pc : ip6t_do_table+0x5d0/0x89c lr : ip6t_do_table+0x5b8/0x89c ip6t_do_table+0x5d0/0x89c ip6table_filter_hook+0x24/0x30 nf_hook_slow+0x84/0x120 ip6_input+0x74/0xe0 ip6_rcv_finish+0x7c/0x128 ipv6_rcv+0xac/0xe4 __netif_receive_skb+0x84/0x17c process_backlog+0x15c/0x1b8 napi_poll+0x88/0x284 net_rx_action+0xbc/0x23c __do_softirq+0x20c/0x48c This could be fixed by forcing instruction order after the new table information assignment or by switching to RCU for the synchronization. Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore") Reported-by: Sean Tranchetti <stranche@codeaurora.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-06Merge tag 'tty-5.10-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty fixes from Greg KH: "Here are two tty core fixes for 5.10-rc7. They resolve some reported locking issues in the tty core. While they have not been in a released linux-next yet, they have passed all of the 0-day bot testing as well as the submitter's testing" * tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: Fix ->session locking tty: Fix ->pgrp locking in tiocspgrp()
2020-12-06Merge tag 'irq-urgent-2020-12-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of updates for the interrupt subsystem: - Make multiqueue devices which use the managed interrupt affinity infrastructure work on PowerPC/Pseries. PowerPC does not use the generic infrastructure for setting up PCI/MSI interrupts and the multiqueue changes failed to update the legacy PCI/MSI infrastructure. Make this work by passing the affinity setup information down to the mapping and allocation functions. - Move Jason Cooper from MAINTAINERS to CREDITS as his mail is bouncing and he's not reachable. We hope all is well with him and say thanks for his work over the years" * tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc/pseries: Pass MSI affinity to irq_create_mapping() genirq/irqdomain: Add an irq_create_mapping_affinity() function MAINTAINERS: Move Jason Cooper to CREDITS
2020-12-06mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPINGMinchan Kim
While I was doing zram testing, I found sometimes decompression failed since the compression buffer was corrupted. With investigation, I found below commit calls cond_resched unconditionally so it could make a problem in atomic context if the task is reschedule. BUG: sleeping function called from invalid context at mm/vmalloc.c:108 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog 3 locks held by memhog/946: #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160 #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160 #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0 CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 Call Trace: unmap_kernel_range_noflush+0x2eb/0x350 unmap_kernel_range+0x14/0x30 zs_unmap_object+0xd5/0xe0 zram_bvec_rw.isra.0+0x38c/0x8e0 zram_rw_page+0x90/0x101 bdev_write_page+0x92/0xe0 __swap_writepage+0x94/0x4a0 pageout+0xe3/0x3a0 shrink_page_list+0xb94/0xd60 shrink_inactive_list+0x158/0x460 We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which contains the offending calling code) from zsmalloc. Even though this option showed some amount improvement(e.g., 30%) in some arm32 platforms, it has been headache to maintain since it have abused APIs[1](e.g., unmap_kernel_range in atomic context). Since we are approaching to deprecate 32bit machines and already made the config option available for only builtin build since v5.8, lastly it has been not default option in zsmalloc, it's time to drop the option for better maintenance. [1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range") Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Harish Sriram <harish@linux.ibm.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-05net: mscc: ocelot: fix dropping of unknown IPv4 multicast on SevilleVladimir Oltean
The current assumption is that the felix DSA driver has flooding knobs per traffic class, while ocelot switchdev has a single flooding knob. This was correct for felix VSC9959 and ocelot VSC7514, but with the introduction of seville VSC9953, we see a switch driven by felix.c which has a single flooding knob. So it is clear that we must do what should have been done from the beginning, which is not to overwrite the configuration done by ocelot.c in felix, but instead to teach the common ocelot library about the differences in our switches, and set up the flooding PGIDs centrally. The effect that the bogus iteration through FELIX_NUM_TC has upon seville is quite dramatic. ANA_FLOODING is located at 0x00b548, and ANA_FLOODING_IPMC is located at 0x00b54c. So the bogus iteration will actually overwrite ANA_FLOODING_IPMC when attempting to write ANA_FLOODING[1]. There is no ANA_FLOODING[1] in sevile, just ANA_FLOODING. And when ANA_FLOODING_IPMC is overwritten with a bogus value, the effect is that ANA_FLOODING_IPMC gets the value of 0x0003CF7D: MC6_DATA = 61, MC6_CTRL = 61, MC4_DATA = 60, MC4_CTRL = 0. Because MC4_CTRL is zero, this means that IPv4 multicast control packets are not flooded, but dropped. An invalid configuration, and this is how the issue was actually spotted. Reported-by: Eldar Gasanov <eldargasanov2@gmail.com> Reported-by: Maxim Kochetkov <fido_max@inbox.ru> Tested-by: Eldar Gasanov <eldargasanov2@gmail.com> Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 3c7b51bd39b2 ("net: dsa: felix: allow flooding for all traffic classes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201204175416.1445937-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-05Merge tag 'for-5.10/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull fix for device mapper fixes from Mike Snitzer: "Apologies for the glaring bug I introduced with my previous pull request! Fix incorrect branching at top of blk_max_size_offset()" * tag 'for-5.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: block: fix incorrect branching in blk_max_size_offset()
2020-12-04[SECURITY] fix namespaced fscaps when !CONFIG_SECURITYSerge Hallyn
Namespaced file capabilities were introduced in 8db6c34f1dbc . When userspace reads an xattr for a namespaced capability, a virtualized representation of it is returned if the caller is in a user namespace owned by the capability's owning rootid. The function which performs this virtualization was not hooked up if CONFIG_SECURITY=n. Therefore in that case the original xattr was shown instead of the virtualized one. To test this using libcap-bin (*1), $ v=$(mktemp) $ unshare -Ur setcap cap_sys_admin-eip $v $ unshare -Ur setcap -v cap_sys_admin-eip $v /tmp/tmp.lSiIFRvt8Y: OK "setcap -v" verifies the values instead of setting them, and will check whether the rootid value is set. Therefore, with this bug un-fixed, and with CONFIG_SECURITY=n, setcap -v will fail: $ v=$(mktemp) $ unshare -Ur setcap cap_sys_admin=eip $v $ unshare -Ur setcap -v cap_sys_admin=eip $v nsowner[got=1000, want=0],/tmp/tmp.HHDiOOl9fY differs in [] Fix this bug by calling cap_inode_getsecurity() in security_inode_getsecurity() instead of returning -EOPNOTSUPP, when CONFIG_SECURITY=n. *1 - note, if libcap is too old for getcap to have the '-n' option, then use verify-caps instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209689 Cc: Hervé Guillemet <herve@guillemet.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Serge Hallyn <shallyn@cisco.com> Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2020-12-04block: fix incorrect branching in blk_max_size_offset()Mike Snitzer
If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that override will be incorrectly ignored. Old blk_max_size_offset() branching, prior to commit 3ee16db390b4, must be used only if passed 'chunk_sectors' override is zero. Fixes: 3ee16db390b4 ("dm: fix IO splitting") Cc: stable@vger.kernel.org # 5.9 Reported-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04Merge tag 'for-5.10/dm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM's bio splitting changes that were made during v5.9. This restores splitting in terms of varied per-target ti->max_io_len rather than use block core's single stacked 'chunk_sectors' limit. - Like DM crypt, update DM integrity to not use crypto drivers that have CRYPTO_ALG_ALLOCATES_MEMORY set. - Fix DM writecache target's argument parsing and status display. - Remove needless BUG() from dm writecache's persistent_memory_claim() - Remove old gcc workaround in DM cache target's block_div() for ARM link errors now that gcc >= 4.9 is required. - Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range. - Remove old, and now frowned upon, BUG_ON(in_interrupt()) in dm_table_event(). - Remove invalid sparse annotations from dm_prepare_ioctl() and dm_unprepare_ioctl(). * tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: remove invalid sparse __acquires and __releases annotations dm: fix double RCU unlock in dm_dax_zero_page_range() error path dm: fix IO splitting dm writecache: remove BUG() and fail gracefully instead dm table: Remove BUG_ON(in_interrupt()) dm: fix bug with RCU locking in dm_blk_report_zones Revert "dm cache: fix arm link errors with inline" dm writecache: fix the maximum number of arguments dm writecache: advance the number of arguments when reporting max_age dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
2020-12-04dm: fix IO splittingMike Snitzer
Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple regressions: 1) Using lcm_not_zero() when stacking chunk_sectors was a bug because chunk_sectors must reflect the most limited of all devices in the IO stack. 2) DM targets that set max_io_len but that do _not_ provide an .iterate_devices method no longer had there IO split properly. And commit 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per target) IO splitting. The implication being the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide performance limitations of a slower device (e.g. one that requires 4K IO splitting). Coming full circle: Fix all these issues by discontinuing stacking chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add optional chunk_sectors override argument to blk_max_size_offset() and update DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call. Passing in an optional chunk_sectors override to blk_max_size_offset() allows for code reuse of block's centralized calculation for max IO size based on provided offset and split boundary. Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") Cc: stable@vger.kernel.org Reported-by: John Dorminy <jdorminy@redhat.com> Reported-by: Bruce Johnston <bjohnsto@redhat.com> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk>
2020-12-04tty: Fix ->session lockingJann Horn
Currently, locking of ->session is very inconsistent; most places protect it using the legacy tty mutex, but disassociate_ctty(), __do_SAK(), tiocspgrp() and tiocgsid() don't. Two of the writers hold the ctrl_lock (because they already need it for ->pgrp), but __proc_set_tty() doesn't do that yet. On a PREEMPT=y system, an unprivileged user can theoretically abuse this broken locking to read 4 bytes of freed memory via TIOCGSID if tiocgsid() is preempted long enough at the right point. (Other things might also go wrong, especially if root-only ioctls are involved; I'm not sure about that.) Change the locking on ->session such that: - tty_lock() is held by all writers: By making disassociate_ctty() hold it. This should be fine because the same lock can already be taken through the call to tty_vhangup_session(). The tricky part is that we need to shorten the area covered by siglock to be able to take tty_lock() without ugly retry logic; as far as I can tell, this should be fine, since nothing in the signal_struct is touched in the `if (tty)` branch. - ctrl_lock is held by all writers: By changing __proc_set_tty() to hold the lock a little longer. - All readers that aren't holding tty_lock() hold ctrl_lock: By adding locking to tiocgsid() and __do_SAK(), and expanding the area covered by ctrl_lock in tiocspgrp(). Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-03Merge tag 'net-5.10-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can. Current release - regressions: - mt76: usb: fix crash on device removal Current release - always broken: - xsk: Fix umem cleanup from wrong context in socket destruct Previous release - regressions: - net: ip6_gre: set dev->hard_header_len when using header_ops - ipv4: Fix TOS mask in inet_rtm_getroute() - net, xsk: Avoid taking multiple skbuff references Previous release - always broken: - net/x25: prevent a couple of overflows - netfilter: ipset: prevent uninit-value in hash_ip6_add - geneve: pull IP header before ECN decapsulation - mpls: ensure LSE is pullable in TC and openvswitch paths - vxlan: respect needed_headroom of lower device - batman-adv: Consider fragmentation for needed packet headroom - can: drivers: don't count arbitration loss as an error - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal - inet_ecn: Fix endianness of checksum update when setting ECT(1) - ibmvnic: fix various corner cases around reset handling - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering - net/mlx5: Enforce HW TX csum offload with kTLS" * tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled net/mlx5: Fix wrong address reclaim when command interface is down net/sched: act_mpls: ensure LSE is pullable before reading it net: openvswitch: ensure LSE is pullable before reading it net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl net: mvpp2: Fix error return code in mvpp2_open() chelsio/chtls: fix a double free in chtls_setkey() rtw88: debug: Fix uninitialized memory in debugfs code vxlan: fix error return code in __vxlan_dev_create() net: pasemi: fix error return code in pasemi_mac_open() cxgb3: fix error return code in t3_sge_alloc_qset() net/x25: prevent a couple of overflows dpaa_eth: copy timestamp fields to new skb in A-050385 workaround net: ip6_gre: set dev->hard_header_len when using header_ops mt76: usb: fix crash on device removal iwlwifi: pcie: add some missing entries for AX210 iwlwifi: pcie: invert values of NO_160 device config entries iwlwifi: pcie: add one missing entry for AX210 ...
2020-12-03net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steeringYevgeny Kliteynik
STEs format for Connect-X5 and Connect-X6DX different. Currently, on Connext-X6DX the SW steering would break at some point when building STEs w/o giving a proper error message. Fix this by checking the STE format of the current device when initializing domain: add mlx5_ifc definitions for Connect-X6DX SW steering, read FW capability to get the current format version, and check this version when domain is being created. Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03uapi: fix statx attribute value overlap for DAX & MOUNT_ROOTEric Sandeen
STATX_ATTR_MOUNT_ROOT and STATX_ATTR_DAX got merged with the same value, so one of them needs fixing. Move STATX_ATTR_DAX. While we're in here, clarify the value-matching scheme for some of the attributes, and explain why the value for DAX does not match. Fixes: 80340fe3605c ("statx: add mount_root") Fixes: 712b2698e4c0 ("fs/stat: Define DAX statx attribute") Link: https://lore.kernel.org/linux-fsdevel/7027520f-7c79-087e-1d00-743bdefa1a1e@redhat.com/ Link: https://lore.kernel.org/lkml/20201202214629.1563760-1-ira.weiny@intel.com/ Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: <stable@vger.kernel.org> # 5.8 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-01inet_ecn: Fix endianness of checksum update when setting ECT(1)Toke Høiland-Jørgensen
When adding support for propagating ECT(1) marking in IP headers it seems I suffered from endianness-confusion in the checksum update calculation: In fact the ECN field is in the *lower* bits of the first 16-bit word of the IP header when calculating in network byte order. This means that the addition performed to update the checksum field was wrong; let's fix that. Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040") Reported-by: Jonathan Morton <chromatix99@gmail.com> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20201130183705.17540-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01Merge tag 'trace-v5.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Use correct timestamp variable for ring buffer write stamp update - Fix up before stamp and write stamp when crossing ring buffer sub buffers - Keep a zero delta in ring buffer in slow path if cmpxchg fails - Fix trace_printk static buffer for archs that care - Fix ftrace record accounting for ftrace ops with trampolines - Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency - Remove WARN_ON in hwlat tracer that triggers on something that is OK - Make "my_tramp" trampoline in ftrace direct sample code global - Fixes in the bootconfig tool for better alignment management * tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Always check to put back before stamp when crossing pages ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency ftrace: Fix updating FTRACE_FL_TRAMP tracing: Fix alignment of static buffer tracing: Remove WARN_ON in start_thread() samples/ftrace: Mark my_tramp[12]? global ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next() ring-buffer: Update write stamp with the correct ts docs: bootconfig: Update file format on initrd image tools/bootconfig: Align the bootconfig applied initrd image size to 4 tools/bootconfig: Fix to check the write failure correctly tools/bootconfig: Fix errno reference after printf()
2020-11-30pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabledTrond Myklebust
If the flexfiles mirroring is enabled, then the read code expects to be able to set pgio->pg_mirror_idx to point to the data server that is being used for this particular read. However it does not change the pg_mirror_count because we only need to send a single read. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-30genirq/irqdomain: Add an irq_create_mapping_affinity() functionLaurent Vivier
There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can be passed down to irq_domain_alloc_descs(). irq_create_mapping() is re-implemented as a wrapper around irq_create_mapping_affinity(). No functional change. Fixes: e75eafb9b039 ("genirq/msi: Switch to new irq spreading infrastructure") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kurz <groug@kaod.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201126082852.1178497-2-lvivier@redhat.com
2020-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported by syzbot. 2) Remove spurious reports on nf_tables when lockdep gets disabled, from Florian Westphal. 3) Fix memleak in the error path of error path of ip_vs_control_net_init(), from Wang Hai. 4) Fix missing control data in flow dissector, otherwise IP address matching in hardware offload infra does not work. 5) Fix hardware offload match on prefix IP address when userspace does not send a bitwise expression to represent the prefix. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nftables_offload: build mask based from the matching bytes netfilter: nftables_offload: set address type in control dissector ipvs: fix possible memory leak in ip_vs_control_net_init netfilter: nf_tables: avoid false-postive lockdep splat netfilter: ipset: prevent uninit-value in hash_ip6_add ==================== Link: https://lore.kernel.org/r/20201127190313.24947-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2020-11-28 1) Do not reference the skb for xsk's generic TX side since when looped back into RX it might crash in generic XDP, from Björn Töpel. 2) Fix umem cleanup on a partially set up xsk socket when being destroyed, from Magnus Karlsson. 3) Fix an incorrect netdev reference count when failing xsk_bind() operation, from Marek Majtyka. 4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(), from Zhen Lei. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add MAINTAINERS entry for BPF LSM bpftool: Fix error return value in build_btf_type_table net, xsk: Avoid taking multiple skbuff references xsk: Fix incorrect netdev reference count xsk: Fix umem cleanup bug at socket destruct MAINTAINERS: Update XDP and AF_XDP entries ==================== Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27Merge tag 'asm-generic-fixes-5.10-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fix from Arnd Bergmann: "Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic. This is a single bugfix for a bug that Stefan Agner found on 32-bit Arm, but that exists on several other architectures" * tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
2020-11-27Merge tag 'arm-soc-fixes-v5.10-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Another set of patches for devicetree files and Arm SoC specific drivers: - A fix for OP-TEE shared memory on non-SMP systems - multiple code fixes for the OMAP platform, including one regression for the CPSW network driver and a few runtime warning fixes - Some DT patches for the Rockchip RK3399 platform, in particular fixing the MMC device ordering that recently became nondeterministic with async probe. - Multiple DT fixes for the Tegra platform, including a regression fix for suspend/resume on TX2 - A regression fix for a user-triggered fault in the NXP dpio driver - A regression fix for a bug caused by an earlier bug fix in the xilinx firmware driver - Two more DTC warning fixes - Sylvain Lemieux steps down as maintainer for the NXP LPC32xx platform" * tag 'arm-soc-fixes-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits) arm64: tegra: Fix Tegra234 VDK node names arm64: tegra: Wrong AON HSP reg property size arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1 arm64: tegra: Correct the UART for Jetson Xavier NX arm64: tegra: Disable the ACONNECT for Jetson TX2 optee: add writeback to valid memory type firmware: xilinx: Use hash-table for api feature check firmware: xilinx: Fix SD DLL node reset issue soc: fsl: dpio: Get the cpumask through cpumask_of(cpu) ARM: dts: dra76x: m_can: fix order of clocks bus: ti-sysc: suppress err msg for timers used as clockevent/source MAINTAINERS: Remove myself as LPC32xx maintainers arm64: dts: qcom: clear the warnings caused by empty dma-ranges arm64: dts: broadcom: clear the warnings caused by empty dma-ranges ARM: dts: am437x-l4: fix compatible for cpsw switch dt node arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc. arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards. arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance arm64: dts: rockchip: fix NanoPi R2S GMAC clock name ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled() ...
2020-11-27Merge tag 'net-5.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc6, including fixes from the WiFi driver, and CAN subtrees. Current release - regressions: - gro_cells: reduce number of synchronize_net() calls - ch_ktls: release a lock before jumping to an error path Current release - always broken: - tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header Previous release - regressions: - net/tls: fix missing received data after fast remote close - vsock/virtio: discard packets only when socket is really closed - sock: set sk_err to ee_errno on dequeue from errq - cxgb4: fix the panic caused by non smac rewrite Previous release - always broken: - tcp: fix corner cases around setting ECN with BPF selection of congestion control - tcp: fix race condition when creating child sockets from syncookies on loopback interface - usbnet: ipheth: fix connectivity with iOS 14 - tun: honor IOCB_NOWAIT flag - net/packet: fix packet receive on L3 devices without visible hard header - devlink: Make sure devlink instance and port are in same net namespace - net: openvswitch: fix TTL decrement action netlink message format - bonding: wait for sysfs kobject destruction before freeing struct slave - net: stmmac: fix upstream patch applied to the wrong context - bnxt_en: fix return value and unwind in probe error paths Misc: - devlink: add extra layer of categorization to the reload stats uAPI before it's released" * tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) sock: set sk_err to ee_errno on dequeue from errq mptcp: fix NULL ptr dereference on bad MPJ net: openvswitch: fix TTL decrement action netlink message format can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0 can: m_can: fix nominal bitiming tseg2 min for version >= 3.1 can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given can: gs_usb: fix endianess problem with candleLight firmware ch_ktls: lock is not freed net/tls: Protect from calling tls_dev_del for TLS RX twice devlink: Make sure devlink instance and port are in same net namespace devlink: Hold rtnl lock while reading netdev attributes ptp: clockmatrix: bug fix for idtcm_strverscmp enetc: Let the hardware auto-advance the taprio base-time of 0 gro_cells: reduce number of synchronize_net() calls net: stmmac: fix incorrect merge of patch upstream ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init Documentation: netdev-FAQ: suggest how to post co-dependent series ibmvnic: enhance resetting status check during module exit ...
2020-11-27net: openvswitch: fix TTL decrement action netlink message formatEelco Chaudron
Currently, the openvswitch module is not accepting the correctly formated netlink message for the TTL decrement action. For both setting and getting the dec_ttl action, the actions should be nested in the OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi. When the original patch was sent, it was tested with a private OVS userspace implementation. This implementation was unfortunately not upstreamed and reviewed, hence an erroneous version of this patch was sent out. Leaving the patch as-is would cause problems as the kernel module could interpret additional attributes as actions and vice-versa, due to the actions not being encapsulated/nested within the actual attribute, but being concatinated after it. Fixes: 744676e77720 ("openvswitch: add TTL decrement action") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27Merge tag 'writeback_for_v5.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull writeback fix from Jan Kara: "A fix of possible missing string termination in writeback tracepoints" * tag 'writeback_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: trace: fix potenial dangerous pointer
2020-11-27Merge tag 'omap-for-v5.10/fixes-rc5-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Fixes for omaps for various issues noticed during the -rc cycle: - Earlier omap4 cpuidle fix was incomplete and needs to use a configured idle state instead - Fix am4 cpsw driver compatible to avoid invalid resource error for the legacy driver - Two kconfig fixes for genpd support that we added for for v5.10 for proper location of the option and adding missing option - Fix ti-sysc reset status checking on enabling modules to ignore quirky modules with reset status only usable when the quirk is activated during reset. Also fix bogus resetdone warning for cpsw and modules with no sysst register reset status bit - Suppress a ti-sysc warning for timers reserved as system timers - Fix the ordering of clocks for dra7 m_can * tag 'omap-for-v5.10/fixes-rc5-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: dra76x: m_can: fix order of clocks bus: ti-sysc: suppress err msg for timers used as clockevent/source ARM: dts: am437x-l4: fix compatible for cpsw switch dt node ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled() bus: ti-sysc: Fix bogus resetdone warning on enable for cpsw bus: ti-sysc: Fix reset status check for modules with quirks ARM: OMAP2+: Fix missing select PM_GENERIC_DOMAINS_OF ARM: OMAP2+: Fix location for select PM_GENERIC_DOMAINS Link: https://lore.kernel.org/r/pull-1606460270-864284@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-11-27netfilter: nftables_offload: build mask based from the matching bytesPablo Neira Ayuso
Userspace might match on prefix bytes of header fields if they are on the byte boundary, this requires that the mask is adjusted accordingly. Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not allowed for this type of selector. The bitwise expression might be optimized out by userspace, hence the kernel needs to infer the prefix from the number of payload bytes to match on. This patch adds nft_payload_offload_mask() to calculate the bitmask to match on the prefix. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27netfilter: nftables_offload: set address type in control dissectorPablo Neira Ayuso
This patch adds nft_flow_rule_set_addr_type() to set the address type from the nft_payload expression accordingly. If the address type is not set in the control dissector then a rule that matches either on source or destination IP address does not work. After this patch, nft hardware offload generates the flow dissector configuration as tc-flower does to match on an IP address. This patch has been also tested functionally to make sure packets are filtered out by the NIC. This is also getting the code aligned with the existing netfilter flow offload infrastructure which is also setting the control dissector. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-26mm: memcg: relayout structure mem_cgroup to avoid cache interferenceFeng Tang
0day reported one -22.7% regression for will-it-scale page_fault2 case [1] on a 4 sockets 144 CPU platform, and bisected to it to be caused by Waiman's optimization (commit bd0b230fe1) of saving one 'struct page_counter' space for 'struct mem_cgroup'. Initially we thought it was due to the cache alignment change introduced by the patch, but further debug shows that it is due to some hot data members ('vmstats_local', 'vmstats_percpu', 'vmstats') sit in 2 adjacent cacheline (2N and 2N+1 cacheline), and when adjacent cache line prefetch is enabled, it triggers an "extended level" of cache false sharing for 2 adjacent cache lines. So exchange the 2 member blocks, while keeping mostly the original cache alignment, which can restore and even enhance the performance, and save 64 bytes of space for 'struct mem_cgroup' (from 2880 to 2816, with 0day's default RHEL-8.3 kernel config) [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Fixes: bd0b230fe145 ("mm/memcg: unify swap and memsw page counters") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-25net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-25trace: fix potenial dangerous pointerHui Su
The bdi_dev_name() returns a char [64], and the __entry->name is a char [32]. It maybe dangerous to TP_printk("%s", __entry->name) after the strncpy(). CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201124165205.GA23937@rlk Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-11-24net, xsk: Avoid taking multiple skbuff referencesBjörn Töpel
Commit 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY") addressed the problem that packets were discarded from the Tx AF_XDP ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was bumping the skbuff reference count, so that the buffer would not be freed by dev_direct_xmit(). A reference count larger than one means that the skbuff is "shared", which is not the case. If the "shared" skbuff is sent to the generic XDP receive path, netif_receive_generic_xdp(), and pskb_expand_head() is entered the BUG_ON(skb_shared(skb)) will trigger. This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(), where a user can select the skbuff free policy. This allows AF_XDP to avoid bumping the reference count, but still keep the NETDEV_TX_BUSY behavior. Fixes: 642e450b6b59 ("xsk: Do not discard packet when NETDEV_TX_BUSY") Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201123175600.146255-1-bjorn.topel@gmail.com
2020-11-24devlink: Fix reload stats structureMoshe Shemesh
Fix reload stats structure exposed to the user. Change stats structure hierarchy to have the reload action as a parent of the stat entry and then stat entry includes value per limit. This will also help to avoid string concatenation on iproute2 output. Reload stats structure before this fix: "stats": { "reload": { "driver_reinit": 2, "fw_activate": 1, "fw_activate_no_reset": 0 } } After this fix: "stats": { "reload": { "driver_reinit": { "unspecified": 2 }, "fw_activate": { "unspecified": 1, "no_reset": 0 } } Fixes: a254c264267e ("devlink: Add reload stats") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24firmware: xilinx: Use hash-table for api feature checkAmit Sunil Dhamne
Currently array of fix length PM_API_MAX is used to cache the pm_api version (valid or invalid). However ATF based PM APIs values are much higher then PM_API_MAX. So to include ATF based PM APIs also, use hash-table to store the pm_api version status. Signed-off-by: Amit Sunil Dhamne <amit.sunil.dhamne@xilinx.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ravi Patel <ravi.patel@xilinx.com> Signed-off-by: Rajan Vaja <rajan.vaja@xilinx.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Michal Simek <michal.simek@xilinx.com> Fixes: f3217d6f2f7a ("firmware: xilinx: fix out-of-bounds access") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1606197161-25976-1-git-send-email-rajan.vaja@xilinx.com Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2020-11-23net/packet: fix packet receive on L3 devices without visible hard headerEyal Birger
In the patchset merged by commit b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which did not have header_ops were given one for the purpose of protocol parsing on af_packet transmit path. That change made af_packet receive path regard these devices as having a visible L3 header and therefore aligned incoming skb->data to point to the skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not reset their mac_header prior to ingress and therefore their incoming packets became malformed. Ideally these devices would reset their mac headers, or af_packet would be able to rely on dev->hard_header_len being 0 for such cases, but it seems this is not the case. Fix by changing af_packet RX ll visibility criteria to include the existence of a '.create()' header operation, which is used when creating a device hard header - via dev_hard_header() - by upper layers, and does not exist in these L3 devices. As this predicate may be useful in other situations, add it as a common dev_has_header() helper in netdevice.h. Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>