summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2019-11-09Linux 5.2.22v5.2.22Paul Gortmaker
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09nbd: verify socket is supported during setupMike Christie
commit cf1b2326b734896734c6e167e41766f9cee7686a upstream. nbd requires socket families to support the shutdown method so the nbd recv workqueue can be woken up from its sock_recvmsg call. If the socket does not support the callout we will leave recv works running or get hangs later when the device or module is removed. This adds a check during socket connection/reconnection to make sure the socket being passed in supports the needed callout. Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09USB: usblp: fix use-after-free on disconnectJohan Hovold
commit 7a759197974894213621aa65f0571b51904733d6 upstream. A recent commit addressing a runtime PM use-count regression, introduced a use-after-free by not making sure we held a reference to the struct usb_interface for the lifetime of the driver data. Fixes: 9a31535859bf ("USB: usblp: fix runtime PM after driver unbind") Cc: stable <stable@vger.kernel.org> Reported-by: syzbot+cd24df4d075c319ebfc5@syzkaller.appspotmail.com Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20191015175522.18490-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09USB: legousbtower: fix a signedness bug in tower_probe()Dan Carpenter
commit fd47a417e75e2506eb3672ae569b1c87e3774155 upstream. The problem is that sizeof() is unsigned long so negative error codes are type promoted to high positive values and the condition becomes false. Fixes: 1d427be4a39d ("USB: legousbtower: fix slab info leak at probe") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20191011141115.GA4521@mwanda Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09CIFS: Fix use after free of file info structuresPavel Shilovsky
commit 1a67c415965752879e2e9fad407bc44fc7f25f23 upstream. Currently the code assumes that if a file info entry belongs to lists of open file handles of an inode and a tcon then it has non-zero reference. The recent changes broke that assumption when putting the last reference of the file info. There may be a situation when a file is being deleted but nothing prevents another thread to reference it again and start using it. This happens because we do not hold the inode list lock while checking the number of references of the file info structure. Fix this by doing the proper locking when doing the check. Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo") Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic") Cc: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09scsi: mpt3sas_ctl: fix double-fetch bug in _ctl_ioctl_main()Gen Zhang
commit f9e3ebeea4521652318af903cddeaf033527e93e upstream. In _ctl_ioctl_main(), 'ioctl_header' is fetched the first time from userspace. 'ioctl_header.ioc_number' is then checked. The legal result is saved to 'ioc'. Then, in condition MPT3COMMAND, the whole struct is fetched again from the userspace. Then _ctl_do_mpt_command() is called, 'ioc' and 'karg' as inputs. However, a malicious user can change the 'ioc_number' between the two fetches, which will cause a potential security issues. Moreover, a malicious user can provide a valid 'ioc_number' to pass the check in first fetch, and then modify it in the second fetch. To fix this, we need to recheck the 'ioc_number' in the second fetch. Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Acked-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09wcd9335: fix a incorrect use of kstrndup()Gen Zhang
commit a54988113985ca22e414e132054f234fc8a92604 upstream. In wcd9335_codec_enable_dec(), 'widget_name' is allocated by kstrndup(). However, according to doc: "Note: Use kmemdup_nul() instead if the size is known exactly." So we should use kmemdup_nul() here instead of kstrndup(). Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09clk-sunxi: fix a missing-check bug in sunxi_divs_clk_setup()Gen Zhang
commit fcdf445ff42f036d22178b49cf64e92d527c1330 upstream. In sunxi_divs_clk_setup(), 'derived_name' is allocated by kstrndup(). It returns NULL when fails. 'derived_name' should be checked. Signed-off-by: Gen Zhang <blackgod016574@gmail.com> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09cfg80211: wext: avoid copying malformed SSIDsWill Deacon
commit 4ac2813cc867ae563a1ba5a9414bfb554e5796fa upstream. Ensure the SSID element is bounds-checked prior to invoking memcpy() with its length field, when copying to userspace. Cc: <stable@vger.kernel.org> Cc: Kees Cook <keescook@chromium.org> Reported-by: Nicolas Waisman <nico@semmle.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20191004095132.15777-2-will@kernel.org [adjust commit log a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09rtlwifi: Fix potential overflow on P2P codeLaura Abbott
commit 8c55dedb795be8ec0cf488f98c03a1c2176f7fb1 upstream. Nicolas Waisman noticed that even though noa_len is checked for a compatible length it's still possible to overrun the buffers of p2pinfo since there's no check on the upper bound of noa_num. Bound noa_num against P2P_MAX_NOA_NUM. Reported-by: Nicolas Waisman <nico@semmle.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09io_uring: only flush workqueues on fileset removalJens Axboe
commit 8a99734081775c012a4a6c442fdef0379fe52bdf upstream. We should not remove the workqueue, we just need to ensure that the workqueues are synced. The workqueues are torn down on ctx removal. Cc: stable@vger.kernel.org Fixes: 6b06314c47e1 ("io_uring: add file set registration") Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: use 5.3.x-stable version for this v5.2.x codebase.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09x86/asm: Fix MWAITX C-state hint valueJanakarajan Natarajan
commit 454de1e7d970d6bc567686052329e4814842867c upstream. As per "AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions", MWAITX EAX[7:4]+1 specifies the optional hint of the optimized C-state. For C0 state, EAX[7:4] should be set to 0xf. Currently, a value of 0xf is set for EAX[3:0] instead of EAX[7:4]. Fix this by changing MWAITX_DISABLE_CSTATES from 0xf to 0xf0. This hasn't had any implications so far because setting reserved bits in EAX is simply ignored by the CPU. [ bp: Fixup comment in delay_mwaitx() and massage. ] Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20191007190011.4859-1-Janakarajan.Natarajan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09mtd: rawnand: au1550nd: Fix au_read_buf16() prototypePaul Burton
commit df8fed831cbcdce7b283b2d9c1aadadcf8940d05 upstream. Commit 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks") modified the prototype of the struct nand_chip read_buf function pointer. In the au1550nd driver we have 2 implementations of read_buf. The previously mentioned commit modified the au_read_buf() implementation to match the function pointer, but not au_read_buf16(). This results in a compiler warning for MIPS db1xxx_defconfig builds: drivers/mtd/nand/raw/au1550nd.c:443:57: warning: pointer type mismatch in conditional expression Fix this by updating the prototype of au_read_buf16() to take a struct nand_chip pointer as its first argument, as is expected after commit 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks"). Note that this shouldn't have caused any functional issues at runtime, since the offset of the struct mtd_info within struct nand_chip is 0 making mtd_to_nand() effectively a type-cast. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 7e534323c416 ("mtd: rawnand: Pass a nand_chip object to chip->read_xxx() hooks") Cc: stable@vger.kernel.org # v4.20+ Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09hwmon: Fix HWMON_P_MIN_ALARM maskNuno Sá
commit 30945d31e5761436d9eba6b8cff468a5f7c9c266 upstream. Both HWMON_P_MIN_ALARM and HWMON_P_MAX_ALARM were using BIT(hwmon_power_max_alarm). Fixes: aa7f29b07c870 ("hwmon: Add support for power min, lcrit, min_alarm and lcrit_alarm") CC: <stable@vger.kernel.org> Signed-off-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20190924124945.491326-2-nuno.sa@analog.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09tracing: Get trace_array reference for available_tracers filesSteven Rostedt (VMware)
commit 194c2c74f5532e62c218adeb8e2b683119503907 upstream. As instances may have different tracers available, we need to look at the trace_array descriptor that shows the list of the available tracers for the instance. But there's a race between opening the file and an admin deleting the instance. The trace_array_get() needs to be called before accessing the trace_array. Cc: stable@vger.kernel.org Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09ftrace: Get a reference counter for the trace_array on filter filesSteven Rostedt (VMware)
commit 9ef16693aff8137faa21d16ffe65bb9832d24d71 upstream. The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for an instance now. They need to take a reference to the instance otherwise there could be a race between accessing the files and deleting the instance. It wasn't until the :mod: caching where these file operations started referencing the trace_array directly. Cc: stable@vger.kernel.org Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09tracing/hwlat: Don't ignore outer-loop duration when calculating max_latencySrivatsa S. Bhat (VMware)
commit fc64e4ad80d4b72efce116f87b3174f0b7196f8e upstream. max_latency is intended to record the maximum ever observed hardware latency, which may occur in either part of the loop (inner/outer). So we need to also consider the outer-loop sample when updating max_latency. Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09tracing/hwlat: Report total time spent in all NMIs during the sampleSrivatsa S. Bhat (VMware)
commit 98dc19c11470ee6048aba723d77079ad2cda8a52 upstream. nmi_total_ts is supposed to record the total time spent in *all* NMIs that occur on the given CPU during the (active portion of the) sampling window. However, the code seems to be overwriting this variable for each NMI, thereby only recording the time spent in the most recent NMI. Fix it by accumulating the duration instead. Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09arm64/sve: Fix wrong free for task->thread.sve_stateMasayoshi Mizuma
commit 4585fc59c0e813188d6a4c5de1f6976fce461fc2 upstream. The system which has SVE feature crashed because of the memory pointed by task->thread.sve_state was destroyed by someone. That is because sve_state is freed while the forking the child process. The child process has the pointer of sve_state which is same as the parent's because the child's task_struct is copied from the parent's one. If the copy_process() fails as an error on somewhere, for example, copy_creds(), then the sve_state is freed even if the parent is alive. The flow is as follows. copy_process p = dup_task_struct => arch_dup_task_struct *dst = *src; // copy the entire region. : retval = copy_creds if (retval < 0) goto bad_fork_free; : bad_fork_free: ... delayed_free_task(p); => free_task => arch_release_task_struct => fpsimd_release_task => __sve_free => kfree(task->thread.sve_state); // free the parent's sve_state Move child's sve_state = NULL and clearing TIF_SVE flag to arch_dup_task_struct() so that the child doesn't free the parent's one. There is no need to wait until copy_process() to clear TIF_SVE for dst, because the thread flags for dst are initialized already by copying the src task_struct. This change simplifies the code, so get rid of comments that are no longer needed. As a note, arm64 used to have thread_info on the stack. So it would not be possible to clear TIF_SVE until the stack is initialized. From commit c02433dd6de3 ("arm64: split thread_info from task stack"), the thread_info is part of the task, so it should be valid to modify the flag from arch_dup_task_struct(). Cc: stable@vger.kernel.org # 4.15.x- Fixes: bc0ee4760364 ("arm64/sve: Core task context handling") Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reported-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Suggested-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Tested-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09media: stkwebcam: fix runtime PM after driver unbindJohan Hovold
commit 30045f2174aab7fb4db7a9cf902d0aa6c75856a7 upstream. Since commit c2b71462d294 ("USB: core: Fix bug caused by duplicate interface PM usage counter") USB drivers must always balance their runtime PM gets and puts, including when the driver has already been unbound from the interface. Leaving the interface with a positive PM usage counter would prevent a later bound driver from suspending the device. Note that runtime PM has never actually been enabled for this driver since the support_autosuspend flag in its usb_driver struct is not set. Fixes: c2b71462d294 ("USB: core: Fix bug caused by duplicate interface PM usage counter") Cc: stable <stable@vger.kernel.org> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20191001084908.2003-5-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09drm/i915: Mark contents as dirty on a write faultChris Wilson
commit b925708f28c2b7a3a362d709bd7f77bc75c1daac upstream. Since dropping the set-to-gtt-domain in commit a679f58d0510 ("drm/i915: Flush pages on acquisition"), we no longer mark the contents as dirty on a write fault. This has the issue of us then not marking the pages as dirty on releasing the buffer, which means the contents are not written out to the swap device (should we ever pick that buffer as a victim). Notably, this is visible in the dumb buffer interface used for cursors. Having updated the cursor contents via mmap, and swapped away, if the shrinker should evict the old cursor, upon next reuse, the cursor would be invisible. E.g. echo 80 > /proc/sys/kernel/sysrq ; echo f > /proc/sysrq-trigger Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111541 Fixes: a679f58d0510 ("drm/i915: Flush pages on acquisition") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190920121821.7223-1-chris@chris-wilson.co.uk (cherry picked from commit 5028851cdfdf78dc22eacbc44a0ab0b3f599ee4a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09drm/i915: Whitelist COMMON_SLICE_CHICKEN2Kenneth Graunke
commit 282b7fd5f5ab4eba499e1162c1e2802c6d0bb82e upstream. This allows userspace to use "legacy" mode for push constants, where they are committed at 3DPRIMITIVE or flush time, rather than being committed at 3DSTATE_BINDING_TABLE_POINTERS_XS time. Gen6-8 and Gen11 both use the "legacy" behavior - only Gen9 works in the "new" way. Conflating push constants with binding tables is painful for userspace, we would like to be able to avoid doing so. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190911014801.26821-1-kenneth@whitecape.org (cherry picked from commit 0606259e3b3a1220a0f04a92a1654a3f674f47ee) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09Fix the locking in dcache_readdir() and friendsAl Viro
commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999 upstream. There are two problems in dcache_readdir() - one is that lockless traversal of the list needs non-trivial cooperation of d_alloc() (at least a switch to list_add_rcu(), and probably more than just that) and another is that it assumes that no removal will happen without the directory locked exclusive. Said assumption had always been there, never had been stated explicitly and is violated by several places in the kernel (devpts and selinuxfs). * replacement of next_positive() with different calling conventions: it returns struct list_head * instead of struct dentry *; the latter is passed in and out by reference, grabbing the result and dropping the original value. * scan is under ->d_lock. If we run out of timeslice, cursor is moved after the last position we'd reached and we reschedule; then the scan continues from that place. To avoid livelocks between multiple lseek() (with cursors getting moved past each other, never reaching the real entries) we always skip the cursors, need_resched() or not. * returned list_head * is either ->d_child of dentry we'd found or ->d_subdirs of parent (if we got to the end of the list). * dcache_readdir() and dcache_dir_lseek() switched to new helper. dcache_readdir() always holds a reference to dentry passed to dir_emit() now. Cursor is moved to just before the entry where dir_emit() has failed or into the very end of the list, if we'd run out. * move_cursor() eliminated - it had sucky calling conventions and after fixing that it became simply list_move() (in lseek and scan_positives) or list_move_tail() (in readdir). All operations with the list are under ->d_lock now, and we do not depend upon having all file removals done with parent locked exclusive anymore. Cc: stable@vger.kernel.org Reported-by: "zhengbin (A)" <zhengbin13@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09arm64: topology: Use PPTT to determine if PE is a threadJeremy Linton
commit 98dc19902a0b2e5348e43d6a2c39a0a7d0fc639e upstream. ACPI 6.3 adds a thread flag to represent if a CPU/PE is actually a thread. Given that the MPIDR_MT bit may not represent this information consistently on homogeneous machines we should prefer the PPTT flag if its available. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Robert Richter <rrichter@marvell.com> [will: made acpi_cpu_is_threaded() return 'bool'] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09ACPI/PPTT: Add support for ACPI 6.3 thread flagJeremy Linton
commit bbd1b70639f785a970d998f35155c713f975e3ac upstream. ACPI 6.3 adds a flag to the CPU node to indicate whether the given PE is a thread. Add a function to return that information for a given linux logical CPU. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Robert Richter <rrichter@marvell.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09RDMA/vmw_pvrdma: Free SRQ only onceAdit Ranadive
commit 18545e8b6871d21aa3386dc42867138da9948a33 upstream. An extra kfree cleanup was missed since these are now deallocated by core. Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com Cc: <stable@vger.kernel.org> Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09MIPS: elf_hwcap: Export userspace ASEsJiaxun Yang
commit 38dffe1e4dde1d3174fdce09d67370412843ebb5 upstream. A Golang developer reported MIPS hwcap isn't reflecting instructions that the processor actually supported so programs can't apply optimized code at runtime. Thus we export the ASEs that can be used in userspace programs. Reported-by: Meng Zhuo <mengzhuo1203@gmail.com> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-mips@vger.kernel.org Cc: Paul Burton <paul.burton@mips.com> Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Paul Burton <paul.burton@mips.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09MIPS: Disable Loongson MMI instructions for kernel buildPaul Burton
commit 2f2b4fd674cadd8c6b40eb629e140a14db4068fd upstream. GCC 9.x automatically enables support for Loongson MMI instructions when using some -march= flags, and then errors out when -msoft-float is specified with: cc1: error: ‘-mloongson-mmi’ must be used with ‘-mhard-float’ The kernel shouldn't be using these MMI instructions anyway, just as it doesn't use floating point instructions. Explicitly disable them in order to fix the build with GCC 9.x. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 3702bba5eb4f ("MIPS: Loongson: Add GCC 4.4 support for Loongson2E") Fixes: 6f7a251a259e ("MIPS: Loongson: Add basic Loongson 2F support") Fixes: 5188129b8c9f ("MIPS: Loongson-3: Improve -march option and move it to Platform") Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: stable@vger.kernel.org # v2.6.32+ Cc: linux-mips@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09NFS: Fix O_DIRECT accounting of number of bytes read/writtenTrond Myklebust
commit 031d73ed768a40684f3ca21992265ffdb6a270bf upstream. When a series of O_DIRECT reads or writes are truncated, either due to eof or due to an error, then we should return the number of contiguous bytes that were received/sent starting at the offset specified by the application. Currently, we are failing to correctly check contiguity, and so we're failing the generic/465 in xfstests when the race between the read and write RPCs causes the file to get extended while the 2 reads are outstanding. If the first read RPC call wins the race and returns with eof set, we should treat the second read RPC as being truncated. Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Fixes: 1ccbad9f9f9bd ("nfs: fix DIO good bytes calculation") Cc: stable@vger.kernel.org # 4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09btrfs: fix uninitialized ret in ref-verifyJosef Bacik
commit c5f4987e86f6692fdb12533ea1fc7a7bb98e555a upstream. Coverity caught a case where we could return with a uninitialized value in ret in process_leaf. This is actually pretty likely because we could very easily run into a block group item key and have a garbage value in ret and think there was an errror. Fix this by initializing ret to 0. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09btrfs: fix incorrect updating of log root treeJosef Bacik
commit 4203e968947071586a98b5314fd7ffdea3b4f971 upstream. We've historically had reports of being unable to mount file systems because the tree log root couldn't be read. Usually this is the "parent transid failure", but could be any of the related errors, including "fsid mismatch" or "bad tree block", depending on which block got allocated. The modification of the individual log root items are serialized on the per-log root root_mutex. This means that any modification to the per-subvol log root_item is completely protected. However we update the root item in the log root tree outside of the log root tree log_mutex. We do this in order to allow multiple subvolumes to be updated in each log transaction. This is problematic however because when we are writing the log root tree out we update the super block with the _current_ log root node information. Since these two operations happen independently of each other, you can end up updating the log root tree in between writing out the dirty blocks and setting the super block to point at the current root. This means we'll point at the new root node that hasn't been written out, instead of the one we should be pointing at. Thus whatever garbage or old block we end up pointing at complains when we mount the file system later and try to replay the log. Fix this by copying the log's root item into a local root item copy. Then once we're safely under the log_root_tree->log_mutex we update the root item in the log_root_tree. This way we do not modify the log_root_tree while we're committing it, fixing the problem. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Chris Mason <clm@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09Btrfs: fix memory leak due to concurrent append writes with fiemapFilipe Manana
commit c67d970f0ea8dcc423e112137d34334fa0abb8ec upstream. When we have a buffered write that starts at an offset greater than or equals to the file's size happening concurrently with a full ranged fiemap, we can end up leaking an extent state structure. Suppose we have a file with a size of 1Mb, and before the buffered write and fiemap are performed, it has a single extent state in its io tree representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set. The following sequence diagram shows how the memory leak happens if a fiemap a buffered write, starting at offset 1Mb and with a length of 4Kb, are performed concurrently. CPU 1 CPU 2 extent_fiemap() --> it's a full ranged fiemap range from 0 to LLONG_MAX - 1 (9223372036854775807) --> locks range in the inode's io tree --> after this we have 2 extent states in the io tree: --> 1 for range [0, 1Mb[ with the bits EXTENT_LOCKED and EXTENT_DELALLOC_BITS set --> 1 for the range [1Mb, LLONG_MAX[ with the EXTENT_LOCKED bit set --> start buffered write at offset 1Mb with a length of 4Kb btrfs_file_write_iter() btrfs_buffered_write() --> cached_state is NULL lock_and_cleanup_extent_if_need() --> returns 0 and does not lock range because it starts at current i_size / eof --> cached_state remains NULL btrfs_dirty_pages() btrfs_set_extent_delalloc() (...) __set_extent_bit() --> splits extent state for range [1Mb, LLONG_MAX[ and now we have 2 extent states: --> one for the range [1Mb, 1Mb + 4Kb[ with EXTENT_LOCKED set --> another one for the range [1Mb + 4Kb, LLONG_MAX[ with EXTENT_LOCKED set as well --> sets EXTENT_DELALLOC on the extent state for the range [1Mb, 1Mb + 4Kb[ --> caches extent state [1Mb, 1Mb + 4Kb[ into @cached_state because it has the bit EXTENT_LOCKED set --> btrfs_buffered_write() ends up with a non-NULL cached_state and never calls anything to release its reference on it, resulting in a memory leak Fix this by calling free_extent_state() on cached_state if the range was not locked by lock_and_cleanup_extent_if_need(). The same issue can happen if anything else other than fiemap locks a range that covers eof and beyond. This could be triggered, sporadically, by test case generic/561 from the fstests suite, which makes duperemove run concurrently with fsstress, and duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set the leak is reported in dmesg/syslog when removing the btrfs module with a message like the following: [77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1 Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak. CC: stable@vger.kernel.org # 4.16+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09btrfs: fix balance convert to single on 32-bit host CPUsZygo Blaxell
commit 7a54789074a54f64addf5b49bf1994f478337a83 upstream. Currently, the command: btrfs balance start -dconvert=single,soft . on a Raspberry Pi produces the following kernel message: BTRFS error (device mmcblk0p2): balance: invalid convert data profile single This fails because we use is_power_of_2(unsigned long) to validate the new data profile, the constant for 'single' profile uses bit 48, and there are only 32 bits in a long on ARM. Fix by open-coding the check using u64 variables. Tested by completing the original balance command on several Raspberry Pis. Fixes: 818255feece6 ("btrfs: use common helper instead of open coding a bit test") CC: stable@vger.kernel.org # 4.20+ Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09btrfs: allocate new inode in NOFS contextJosef Bacik
commit 11a19a90870ea5496a8ded69b86f5b476b6d3355 upstream. A user reported a lockdep splat ====================================================== WARNING: possible circular locking dependency detected 5.2.11-gentoo #2 Not tainted ------------------------------------------------------ kswapd0/711 is trying to acquire lock: 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500 but task is already holding lock: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}: kmem_cache_alloc+0x1f/0x1c0 btrfs_alloc_inode+0x1f/0x260 alloc_inode+0x16/0xa0 new_inode+0xe/0xb0 btrfs_new_inode+0x70/0x610 btrfs_symlink+0xd0/0x420 vfs_symlink+0x9c/0x100 do_symlinkat+0x66/0xe0 do_syscall_64+0x55/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (sb_internal){.+.+}: __sb_start_write+0xf6/0x150 start_transaction+0x3a8/0x500 btrfs_commit_inode_delayed_inode+0x59/0x110 btrfs_evict_inode+0x19e/0x4c0 evict+0xbc/0x1f0 inode_lru_isolate+0x113/0x190 __list_lru_walk_one.isra.4+0x5c/0x100 list_lru_walk_one+0x32/0x50 prune_icache_sb+0x36/0x80 super_cache_scan+0x14a/0x1d0 do_shrink_slab+0x131/0x320 shrink_node+0xf7/0x380 balance_pgdat+0x2d5/0x640 kswapd+0x2ba/0x5e0 kthread+0x147/0x160 ret_from_fork+0x24/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(sb_internal); lock(fs_reclaim); lock(sb_internal); *** DEADLOCK *** 3 locks held by kswapd0/711: #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380 #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0 stack backtrace: CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2 Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017 Call Trace: dump_stack+0x85/0xc7 print_circular_bug.cold.40+0x1d9/0x235 __lock_acquire+0x18b1/0x1f00 lock_acquire+0xa6/0x170 ? start_transaction+0x3a8/0x500 __sb_start_write+0xf6/0x150 ? start_transaction+0x3a8/0x500 start_transaction+0x3a8/0x500 btrfs_commit_inode_delayed_inode+0x59/0x110 btrfs_evict_inode+0x19e/0x4c0 ? var_wake_function+0x20/0x20 evict+0xbc/0x1f0 inode_lru_isolate+0x113/0x190 ? discard_new_inode+0xc0/0xc0 __list_lru_walk_one.isra.4+0x5c/0x100 ? discard_new_inode+0xc0/0xc0 list_lru_walk_one+0x32/0x50 prune_icache_sb+0x36/0x80 super_cache_scan+0x14a/0x1d0 do_shrink_slab+0x131/0x320 shrink_node+0xf7/0x380 balance_pgdat+0x2d5/0x640 kswapd+0x2ba/0x5e0 ? __wake_up_common_lock+0x90/0x90 kthread+0x147/0x160 ? balance_pgdat+0x640/0x640 ? __kthread_create_on_node+0x160/0x160 ret_from_fork+0x24/0x30 This is because btrfs_new_inode() calls new_inode() under the transaction. We could probably move the new_inode() outside of this but for now just wrap it in memalloc_nofs_save(). Reported-by: Zdenek Sojka <zsojka@seznam.cz> Fixes: 712e36c5f2a7 ("btrfs: use GFP_KERNEL in btrfs_alloc_inode") CC: stable@vger.kernel.org # 4.16+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09btrfs: relocation: fix use-after-free on dead relocation rootsQu Wenruo
commit 1fac4a54374f7ef385938f3c6cf7649c0fe4f6cd upstream. [BUG] One user reported a reproducible KASAN report about use-after-free: BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265 BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0 ================================================================== BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs] Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579 CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P OE 5.2.11-arch1-1-kasan #4 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] Call Trace: dump_stack+0x7b/0xba print_address_description+0x6c/0x22e ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs] __kasan_report.cold+0x1b/0x3b ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs] kasan_report+0x12/0x17 __asan_report_store8_noabort+0x17/0x20 btrfs_init_reloc_root+0x2cd/0x340 [btrfs] record_root_in_trans+0x2a0/0x370 [btrfs] btrfs_record_root_in_trans+0xf4/0x140 [btrfs] start_transaction+0x1ab/0xe90 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs] ? lock_repin_lock+0x400/0x400 ? __kmem_cache_shutdown.cold+0x140/0x1ad ? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs] finish_ordered_fn+0x15/0x20 [btrfs] normal_work_helper+0x1bd/0xca0 [btrfs] ? process_one_work+0x819/0x1720 ? kasan_check_read+0x11/0x20 btrfs_endio_write_helper+0x12/0x20 [btrfs] process_one_work+0x8c9/0x1720 ? pwq_dec_nr_in_flight+0x2f0/0x2f0 ? worker_thread+0x1d9/0x1030 worker_thread+0x98/0x1030 kthread+0x2bb/0x3b0 ? process_one_work+0x1720/0x1720 ? kthread_park+0x120/0x120 ret_from_fork+0x35/0x40 Allocated by task 369692: __kasan_kmalloc.part.0+0x44/0xc0 __kasan_kmalloc.constprop.0+0xba/0xc0 kasan_kmalloc+0x9/0x10 kmem_cache_alloc_trace+0x138/0x260 btrfs_read_tree_root+0x92/0x360 [btrfs] btrfs_read_fs_root+0x10/0xb0 [btrfs] create_reloc_root+0x47d/0xa10 [btrfs] btrfs_init_reloc_root+0x1e2/0x340 [btrfs] record_root_in_trans+0x2a0/0x370 [btrfs] btrfs_record_root_in_trans+0xf4/0x140 [btrfs] start_transaction+0x1ab/0xe90 [btrfs] btrfs_start_transaction+0x1e/0x20 [btrfs] __btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs] btrfs_prealloc_file_range+0x13/0x20 [btrfs] prealloc_file_extent_cluster+0x29f/0x570 [btrfs] relocate_file_extent_cluster+0x193/0xc30 [btrfs] relocate_data_extent+0x1f8/0x490 [btrfs] relocate_block_group+0x600/0x1060 [btrfs] btrfs_relocate_block_group+0x3a0/0xa00 [btrfs] btrfs_relocate_chunk+0x9e/0x180 [btrfs] btrfs_balance+0x14e4/0x2fc0 [btrfs] btrfs_ioctl_balance+0x47f/0x640 [btrfs] btrfs_ioctl+0x119d/0x8380 [btrfs] do_vfs_ioctl+0x9f5/0x1060 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x73/0xb0 do_syscall_64+0xa5/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 369692: __kasan_slab_free+0x14f/0x210 kasan_slab_free+0xe/0x10 kfree+0xd8/0x270 btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs] clean_dirty_subvols+0x227/0x340 [btrfs] relocate_block_group+0x972/0x1060 [btrfs] btrfs_relocate_block_group+0x3a0/0xa00 [btrfs] btrfs_relocate_chunk+0x9e/0x180 [btrfs] btrfs_balance+0x14e4/0x2fc0 [btrfs] btrfs_ioctl_balance+0x47f/0x640 [btrfs] btrfs_ioctl+0x119d/0x8380 [btrfs] do_vfs_ioctl+0x9f5/0x1060 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x73/0xb0 do_syscall_64+0xa5/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88856f671100 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1552 bytes inside of 4096-byte region [ffff88856f671100, ffff88856f672100) The buggy address belongs to the page: page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0 flags: 0x2ffff0000010200(slab|head) raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600 raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== BTRFS info (device sdi1): 1 enospc errors during balance BTRFS info (device sdi1): balance: ended with status: -28 [CAUSE] The problem happens when finish_ordered_io() get called with balance still running, while the reloc root of that subvolume is already dead. (Tree is swap already done, but tree not yet deleted for possible qgroup usage.) That means root->reloc_root still exists, but that reloc_root can be under btrfs_drop_snapshot(), thus we shouldn't access it. The following race could cause the use-after-free problem: CPU1 | CPU2 -------------------------------------------------------------------------- | relocate_block_group() | |- unset_reloc_control(rc) | |- btrfs_commit_transaction() btrfs_finish_ordered_io() | |- clean_dirty_subvols() |- btrfs_join_transaction() | | |- record_root_in_trans() | | |- btrfs_init_reloc_root() | | |- if (root->reloc_root) | | | | |- root->reloc_root = NULL | | |- btrfs_drop_snapshot(reloc_root); |- reloc_root->last_trans| = trans->transid | ^^^^^^^^^^^^^^^^^^^^^^ Use after free [FIX] Fix it by the following modifications: - Test if the root has dead reloc tree before accessing root->reloc_root If the root has BTRFS_ROOT_DEAD_RELOC_TREE, then we don't need to create or update root->reloc_tree - Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag until we have fully dropped reloc tree To co-operate with above modification, so as long as BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create reloc tree at record_root_in_trans(). Reported-by: Cebtenzzre <cebtenzzre@gmail.com> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") CC: stable@vger.kernel.org # 5.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09firmware: google: increment VPD key_len properlyBrian Norris
commit 442f1e746e8187b9deb1590176f6b0ff19686b11 upstream. Commit 4b708b7b1a2c ("firmware: google: check if size is valid when decoding VPD data") adds length checks, but the new vpd_decode_entry() function botched the logic -- it adds the key length twice, instead of adding the key and value lengths separately. On my local system, this means vpd.c's vpd_section_create_attribs() hits an error case after the first attribute it parses, since it's no longer looking at the correct offset. With this patch, I'm back to seeing all the correct attributes in /sys/firmware/vpd/... Fixes: 4b708b7b1a2c ("firmware: google: check if size is valid when decoding VPD data") Cc: <stable@vger.kernel.org> Cc: Hung-Te Lin <hungte@chromium.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Link: https://lore.kernel.org/r/20190930214522.240680-1-briannorris@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09IB/core: Fix wrong iterating on portsMohamad Heib
commit 1cbe866cbcb53338de33cf67262e73f9315a9725 upstream. rdma_for_each_port is already incrementing the iterator's value it receives therefore, after the first iteration the iterator is increased by 2 which eventually causing wrong queries and possible traces. Fix the above by removing the old redundant incrementation that was used before rdma_for_each_port() macro. Cc: <stable@vger.kernel.org> Fixes: ea1075edcbab ("RDMA: Add and use rdma_for_each_port") Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org Signed-off-by: Mohamad Heib <mohamadh@mellanox.com> Reviewed-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()Dan Carpenter
commit 518a86713078168acd67cf50bc0b45d54b4cce6c upstream. The "mode" and "level" variables are enums and in this context GCC will treat them as unsigned ints so the error handling is never triggered. I also removed the bogus initializer because it isn't required any more and it's sort of confusing. [akpm@linux-foundation.org: reduce implicit and explicit typecasting] [akpm@linux-foundation.org: fix return value, add comment, per Matthew] Link: http://lkml.kernel.org/r/20190925110449.GO3264@mwanda Fixes: 3cadfa2b9497 ("mm/vmpressure.c: convert to use match_string() helper") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Enrico Weigelt <info@metux.net> Cc: Kate Stewart <kstewart@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09mm/page_alloc.c: fix a crash in free_pages_prepare()Qian Cai
commit 234fdce892f905cbc2674349a9eb4873e288e5b3 upstream. On architectures like s390, arch_free_page() could mark the page unused (set_page_unused()) and any access later would trigger a kernel panic. Fix it by moving arch_free_page() after all possible accessing calls. Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) Krnl PSW : 0404e00180000000 0000000026c2b96e (__free_pages_ok+0x34e/0x5d8) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000088d43af7 0000000000484000 000000000000007c 000000000000000f 000003d080012100 000003d080013fc0 0000000000000000 0000000000100000 00000000275cca48 0000000000000100 0000000000000008 000003d080010000 00000000000001d0 000003d000000000 0000000026c2b78a 000000002717fdb0 Krnl Code: 0000000026c2b95c: ec1100b30659 risbgn %r1,%r1,0,179,6 0000000026c2b962: e32014000036 pfd 2,1024(%r1) #0000000026c2b968: d7ff10001000 xc 0(256,%r1),0(%r1) >0000000026c2b96e: 41101100 la %r1,256(%r1) 0000000026c2b972: a737fff8 brctg %r3,26c2b962 0000000026c2b976: d7ff10001000 xc 0(256,%r1),0(%r1) 0000000026c2b97c: e31003400004 lg %r1,832 0000000026c2b982: ebff1430016a asi 5168(%r1),-1 Call Trace: __free_pages_ok+0x16a/0x5d8) memblock_free_all+0x206/0x290 mem_init+0x58/0x120 start_kernel+0x2b0/0x570 startup_continue+0x6a/0xc0 INFO: lockdep is turned off. Last Breaking-Event-Address: __free_pages_ok+0x372/0x5d8 Kernel panic - not syncing: Fatal exception: panic_on_oops 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 26A2379C In the past, only kernel_poison_pages() would trigger this but it needs "page_poison=on" kernel cmdline, and I suspect nobody tested that on s390. Recently, kernel_init_free_pages() (commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")) was added and could trigger this as well. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/1569613623-16820-1-git-send-email-cai@lca.pw Fixes: 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate option") Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: <stable@vger.kernel.org> [5.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09mm/z3fold.c: claim page in the beginning of freeVitaly Wool
commit 5b6807de11445c05b537df8324f5d7ab1c2782f9 upstream. There's a really hard to reproduce race in z3fold between z3fold_free() and z3fold_reclaim_page(). z3fold_reclaim_page() can claim the page after z3fold_free() has checked if the page was claimed and z3fold_free() will then schedule this page for compaction which may in turn lead to random page faults (since that page would have been reclaimed by then). Fix that by claiming page in the beginning of z3fold_free() and not forgetting to clear the claim in the end. [vitalywool@gmail.com: v2] Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reported-by: Markus Linnala <markus.linnala@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Markus Linnala <markus.linnala@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09kernel/sysctl.c: do not override max_threads provided by userspaceMichal Hocko
commit b0f53dbc4bc4c371f38b14c391095a3bb8a0bb40 upstream. Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") because the patch is causing a regression to any workload which needs to override the auto-tuning of the limit provided by kernel. set_max_threads is implementing a boot time guesstimate to provide a sensible limit of the concurrently running threads so that runaways will not deplete all the memory. This is a good thing in general but there are workloads which might need to increase this limit for an application to run (reportedly WebSpher MQ is affected) and that is simply not possible after the mentioned change. It is also very dubious to override an admin decision by an estimation that doesn't have any direct relation to correctness of the kernel operation. Fix this by dropping set_max_threads from sysctl_max_threads so any value is accepted as long as it fits into MAX_THREADS which is important to check because allowing more threads could break internal robust futex restriction. While at it, do not use MIN_THREADS as the lower boundary because it is also only a heuristic for automatic estimation and admin might have a good reason to stop new threads to be created even when below this limit. This became more severe when we switched x86 from 4k to 8k kernel stacks. Starting since 6538b8ea886e ("x86_64: expand kernel stack to 16K") (3.16) we use THREAD_SIZE_ORDER = 2 and that halved the auto-tuned value. In the particular case 3.12 kernel.threads-max = 515561 4.4 kernel.threads-max = 200000 Neither of the two values is really insane on 32GB machine. I am not sure we want/need to tune the max_thread value further. If anything the tuning should be removed altogether if proven not useful in general. But we definitely need a way to override this auto-tuning. Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits") Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panicDave Wysochanski
commit cb248819d209d113e45fed459773991518e8e80b upstream. Commit 487317c99477 ("cifs: add spinlock for the openFileList to cifsInodeInfo") added cifsInodeInfo->open_file_lock spin_lock to protect the openFileList, but missed a few places where cifs_inode->openFileList was enumerated. Change these remaining tcon->open_file_lock to cifsInodeInfo->open_file_lock to avoid panic in is_size_safe_to_change. [17313.245641] RIP: 0010:is_size_safe_to_change+0x57/0xb0 [cifs] [17313.245645] Code: 68 40 48 89 ef e8 19 67 b7 f1 48 8b 43 40 48 8d 4b 40 48 8d 50 f0 48 39 c1 75 0f eb 47 48 8b 42 10 48 8d 50 f0 48 39 c1 74 3a <8b> 80 88 00 00 00 83 c0 01 a8 02 74 e6 48 89 ef c6 07 00 0f 1f 40 [17313.245649] RSP: 0018:ffff94ae1baefa30 EFLAGS: 00010202 [17313.245654] RAX: dead000000000100 RBX: ffff88dc72243300 RCX: ffff88dc72243340 [17313.245657] RDX: dead0000000000f0 RSI: 00000000098f7940 RDI: ffff88dd3102f040 [17313.245659] RBP: ffff88dd3102f040 R08: 0000000000000000 R09: ffff94ae1baefc40 [17313.245661] R10: ffffcdc8bb1c4e80 R11: ffffcdc8b50adb08 R12: 00000000098f7940 [17313.245663] R13: ffff88dc72243300 R14: ffff88dbc8f19600 R15: ffff88dc72243428 [17313.245667] FS: 00007fb145485700(0000) GS:ffff88dd3e000000(0000) knlGS:0000000000000000 [17313.245670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [17313.245672] CR2: 0000026bb46c6000 CR3: 0000004edb110003 CR4: 00000000007606e0 [17313.245753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17313.245756] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [17313.245759] PKRU: 55555554 [17313.245761] Call Trace: [17313.245803] cifs_fattr_to_inode+0x16b/0x580 [cifs] [17313.245838] cifs_get_inode_info+0x35c/0xa60 [cifs] [17313.245852] ? kmem_cache_alloc_trace+0x151/0x1d0 [17313.245885] cifs_open+0x38f/0x990 [cifs] [17313.245921] ? cifs_revalidate_dentry_attr+0x3e/0x350 [cifs] [17313.245953] ? cifsFileInfo_get+0x30/0x30 [cifs] [17313.245960] ? do_dentry_open+0x132/0x330 [17313.245963] do_dentry_open+0x132/0x330 [17313.245969] path_openat+0x573/0x14d0 [17313.245974] do_filp_open+0x93/0x100 [17313.245979] ? __check_object_size+0xa3/0x181 [17313.245986] ? audit_alloc_name+0x7e/0xd0 [17313.245992] do_sys_open+0x184/0x220 [17313.245999] do_syscall_64+0x5b/0x1b0 Fixes: 487317c99477 ("cifs: add spinlock for the openFileList to cifsInodeInfo") CC: Stable <stable@vger.kernel.org> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09CIFS: Force reval dentry if LOOKUP_REVAL flag is setPavel Shilovsky
commit 0b3d0ef9840f7be202393ca9116b857f6f793715 upstream. Mark inode for force revalidation if LOOKUP_REVAL flag is set. This tells the client to actually send a QueryInfo request to the server to obtain the latest metadata in case a directory or a file were changed remotely. Only do that if the client doesn't have a lease for the file to avoid unneeded round trips to the server. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09CIFS: Force revalidate inode when dentry is stalePavel Shilovsky
commit c82e5ac7fe3570a269c0929bf7899f62048e7dbc upstream. Currently the client indicates that a dentry is stale when inode numbers or type types between a local inode and a remote file don't match. If this is the case attributes is not being copied from remote to local, so, it is already known that the local copy has stale metadata. That's why the inode needs to be marked for revalidation in order to tell the VFS to lookup the dentry again before openning a file. This prevents unexpected stale errors to be returned to the user space when openning a file. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09CIFS: Gracefully handle QueryInfo errors during openPavel Shilovsky
commit 30573a82fb179420b8aac30a3a3595aa96a93156 upstream. Currently if the client identifies problems when processing metadata returned in CREATE response, the open handle is being leaked. This causes multiple problems like a file missing a lease break by that client which causes high latencies to other clients accessing the file. Another side-effect of this is that the file can't be deleted. Fix this by closing the file after the client hits an error after the file was opened and the open descriptor wasn't returned to the user space. Also convert -ESTALE to -EOPENSTALE to allow the VFS to revalidate a dentry and retry the open. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09selinux: fix context string corruption in convert_context()Ondrej Mosnacek
commit 2a5243937c700ffe6a28e6557a4562a9ab0a17a4 upstream. string_to_context_struct() may garble the context string, so we need to copy back the contents again from the old context struct to avoid storing the corrupted context. Since string_to_context_struct() tokenizes (and therefore truncates) the context string and we are later potentially copying it with kstrdup(), this may eventually cause pieces of uninitialized kernel memory to be disclosed to userspace (when copying to userspace based on the stored length and not the null character). How to reproduce on Fedora and similar: # dnf install -y memcached # systemctl start memcached # semodule -d memcached # load_policy # load_policy # systemctl stop memcached # ausearch -m AVC type=AVC msg=audit(1570090572.648:313): avc: denied { signal } for pid=1 comm="systemd" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=process permissive=0 trawcon=73797374656D5F75007400000000000070BE6E847296FFFF726F6D000096FFFF76 Cc: stable@vger.kernel.org Reported-by: Milos Malik <mmalik@redhat.com> Fixes: ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09blk-wbt: fix performance regression in wbt scale_up/scale_downHarshad Shirwadkar
commit b84477d3ebb96294f87dc3161e53fa8fe22d9bfd upstream. scale_up wakes up waiters after scaling up. But after scaling max, it should not wake up more waiters as waiters will not have anything to do. This patch fixes this by making scale_up (and also scale_down) return when threshold is reached. This bug causes increased fdatasync latency when fdatasync and dd conv=sync are performed in parallel on 4.19 compared to 4.14. This bug was introduced during refactoring of blk-wbt code. Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt") Cc: stable@vger.kernel.org Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09perf inject jit: Fix JIT_CODE_MOVE filenameSteve MacLean
commit b59711e9b0d22fd47abfa00602fd8c365cdd3ab7 upstream. During perf inject --jit, JIT_CODE_MOVE records were injecting MMAP records with an incorrect filename. Specifically it was missing the ".so" suffix. Further the JIT_CODE_LOAD record were silently truncating the jr->load.code_index field to 32 bits before generating the filename. Make both records emit the same filename based on the full 64 bit code_index field. Fixes: 9b07e27f88b9 ("perf inject: Add jitdump mmap injection support") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Brian Robbins <brianrob@microsoft.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com> Cc: John Keeping <john@metanate.com> Cc: John Salem <josalem@microsoft.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tom McDonald <thomas.mcdonald@microsoft.com> Link: http://lore.kernel.org/lkml/BN8PR21MB1362FF8F127B31DBF4121528F7800@BN8PR21MB1362.namprd21.prod.outlook.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09perf llvm: Don't access out-of-scope arrayIan Rogers
commit 7d4c85b7035eb2f9ab217ce649dcd1bfaf0cacd3 upstream. The 'test_dir' variable is assigned to the 'release' array which is out-of-scope 3 lines later. Extend the scope of the 'release' array so that an out-of-scope array isn't accessed. Bug detected by clang's address sanitizer. Fixes: 07bc5c699a3d ("perf tools: Make fetch_kernel_version() publicly available") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/20190926220018.25402-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-11-09efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specifiedArd Biesheuvel
commit c05f8f92b701576b615f30aac31fabdc0648649b upstream. The kernel command line option efivar_ssdt= allows the name to be specified of an EFI variable containing an ACPI SSDT table that should be loaded into memory by the OS, and treated as if it was provided by the firmware. Currently, that code will always iterate over the EFI variables and compare each name with the provided name, even if the command line option wasn't set to begin with. So bail early when no variable name was provided. This works around a boot regression on the 2012 Mac Pro, as reported by Scott. Tested-by: Scott Talbert <swt@techie.net> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Ben Dooks <ben.dooks@codethink.co.uk> Cc: Dave Young <dyoung@redhat.com> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Octavian Purdila <octavian.purdila@intel.com> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux-integrity@vger.kernel.org Fixes: 475fb4e8b2f4 ("efi / ACPI: load SSTDs from EFI variables") Link: https://lkml.kernel.org/r/20191002165904.8819-3-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>