aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2017-09-14Merge branch 'standard/base' into standard/arm-versatile-926ejsstandard/arm-versatile-926ejsBruce Ashfield
2017-09-14Bluetooth: Properly check L2CAP config option output buffer lengthstandard/tiny/common-pcstandard/tiny/basestandard/qemuppcstandard/qemuarm64standard/preempt-rt/intelstandard/preempt-rt/basestandard/edgerouterstandard/beaglebonestandard/baseBen Seri
Validate the output buffer length for L2CAP config requests and responses to avoid overflowing the stack buffer used for building the option blocks. Cc: stable@vger.kernel.org Signed-off-by: Ben Seri <ben@armis.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-09-14Merge branch 'standard/base' into standard/arm-versatile-926ejsBruce Ashfield
2017-09-14udp: consistently apply ufo or fragmentationWillem de Bruijn
[ Upstream commit 85f1bd9a7b5a79d5baa8bf44af19658f7bf77bfa ] When iteratively building a UDP datagram with MSG_MORE and that datagram exceeds MTU, consistently choose UFO or fragmentation. Once skb_is_gso, always apply ufo. Conversely, once a datagram is split across multiple skbs, do not consider ufo. Sendpage already maintains the first invariant, only add the second. IPv6 does not have a sendpage implementation to modify. A gso skb must have a partial checksum, do not follow sk_no_check_tx in udp_send_skb. Found by syzkaller. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-24Merge branch 'standard/base' into standard/arm-versatile-926ejsBruce Ashfield
2017-08-24aufs: sync to 4.10-latestBruce Ashfield
Syncing to the latest aufs4 upstream 4.10 branch to fix compile (and runtime) issues with aufs. Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18Merge branch 'standard/base' into standard/arm-versatile-926ejsBruce Ashfield
2017-08-18drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()Vladis Dronov
commit: ee9c4e681ec4f58e42a83cb0c22a0289ade1aacf upstream The 'req->mip_levels' parameter in vmw_gb_surface_define_ioctl() is a user-controlled 'uint32_t' value which is used as a loop count limit. This can lead to a kernel lockup and DoS. Add check for 'req->mip_levels'. References: https://bugzilla.redhat.com/show_bug.cgi?id=1437431 Cc: <stable@vger.kernel.org> Signed-off-by: Vladis Dronov <vdronov@redhat.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18ACPICA: Namespace: fix operand cache leakSeunghun Han
commit 3b2d69114fefa474fca542e51119036dceb4aa6f upstream ACPICA commit a23325b2e583556eae88ed3f764e457786bf4df6 I found some ACPI operand cache leaks in ACPI early abort cases. Boot log of ACPI operand cache leak is as follows: >[ 0.174332] ACPI: Added _OSI(Module Device) >[ 0.175504] ACPI: Added _OSI(Processor Device) >[ 0.176010] ACPI: Added _OSI(3.0 _SCP Extensions) >[ 0.177032] ACPI: Added _OSI(Processor Aggregator Device) >[ 0.178284] ACPI: SCI (IRQ16705) allocation failed >[ 0.179352] ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20160930/evevent-131) >[ 0.180008] ACPI: Unable to start the ACPI Interpreter >[ 0.181125] ACPI Error: Could not remove SCI handler (20160930/evmisc-281) >[ 0.184068] kmem_cache_destroy Acpi-Operand: Slab cache still has objects >[ 0.185358] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3 #2 >[ 0.186820] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006 >[ 0.188000] Call Trace: >[ 0.188000] ? dump_stack+0x5c/0x7d >[ 0.188000] ? kmem_cache_destroy+0x224/0x230 >[ 0.188000] ? acpi_sleep_proc_init+0x22/0x22 >[ 0.188000] ? acpi_os_delete_cache+0xa/0xd >[ 0.188000] ? acpi_ut_delete_caches+0x3f/0x7b >[ 0.188000] ? acpi_terminate+0x5/0xf >[ 0.188000] ? acpi_init+0x288/0x32e >[ 0.188000] ? __class_create+0x4c/0x80 >[ 0.188000] ? video_setup+0x7a/0x7a >[ 0.188000] ? do_one_initcall+0x4e/0x1b0 >[ 0.188000] ? kernel_init_freeable+0x194/0x21a >[ 0.188000] ? rest_init+0x80/0x80 >[ 0.188000] ? kernel_init+0xa/0x100 >[ 0.188000] ? ret_from_fork+0x25/0x30 When early abort is occurred due to invalid ACPI information, Linux kernel terminates ACPI by calling acpi_terminate() function. The function calls acpi_ns_terminate() function to delete namespace data and ACPI operand cache (acpi_gbl_module_code_list). But the deletion code in acpi_ns_terminate() function is wrapped in ACPI_EXEC_APP definition, therefore the code is only executed when the definition exists. If the define doesn't exist, ACPI operand cache (acpi_gbl_module_code_list) is leaked, and stack dump is shown in kernel log. This causes a security threat because the old kernel (<= 4.9) shows memory locations of kernel functions in stack dump, therefore kernel ASLR can be neutralized. To fix ACPI operand leak for enhancing security, I made a patch which removes the ACPI_EXEC_APP define in acpi_ns_terminate() function for executing the deletion code unconditionally. Link: https://github.com/acpica/acpica/commit/a23325b2 Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18char: lp: fix possible integer overflow in lp_setup()Willy Tarreau
commit 3e21f4af170bebf47c187c1ff8bf155583c9f3b1 upstream The lp_setup() code doesn't apply any bounds checking when passing "lp=none", and only in this case, resulting in an overflow of the parport_nr[] array. All versions in Git history are affected. Reported-By: Roee Hay <roee.hay@hcl.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18dccp/tcp: do not inherit mc_list from parentEric Dumazet
commit 657831ffc38e30092a2d5f03d385d710eb88b09a upstream syzkaller found a way to trigger double frees from ip_mc_drop_socket() It turns out that leave a copy of parent mc_list at accept() time, which is very bad. Very similar to commit 8b485ce69876 ("tcp: do not inherit fastopen_req from parent") Initial report from Pray3r, completed by Andrey one. Thanks a lot to them ! Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Pray3r <pray3r.z@gmail.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18nfsd: encoders mustn't use unitialized values in error casesJ. Bruce Fields
commit: f961e3f2acae94b727380c0b74e2d3954d0edf79 upstream In error cases, lgp->lg_layout_type may be out of bounds; so we shouldn't be using it until after the check of nfserr. This was seen to crash nfsd threads when the server receives a LAYOUTGET request with a large layout type. GETDEVICEINFO has the same problem. Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18nfsd: fix undefined behavior in nfsd4_layout_verifyAri Kauppi
commit: b550a32e60a4941994b437a8d662432a486235a5 upstream UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34 shift exponent 128 is too large for 32-bit type 'int' Depending on compiler+architecture, this may cause the check for layout_type to succeed for overly large values (which seems to be the case with amd64). The large value will be later used in de-referencing nfsd4_layout_ops for function pointers. Reported-by: Jani Tuovila <tuovila@synopsys.com> Signed-off-by: Ari Kauppi <ari@synopsys.com> [colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32] Cc: stable@vger.kernel.org Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18xen-blkback: don't leak stack data via response ringJan Beulich
commit: 089bc0143f489bd3a4578bdff5f4ca68fb26f341 upstream Rather than constructing a local structure instance on the stack, fill the fields directly on the shared ring, just like other backends do. Build on the fact that all response structure flavors are actually identical (the old code did make this assumption too). This is XSA-216. Cc: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()Arend van Spriel
commit 8f44c9a41386729fea410e688959ddaa9d51be7c upstream The lower level nl80211 code in cfg80211 ensures that "len" is between 25 and NL80211_ATTR_FRAME (2304). We subtract DOT11_MGMT_HDR_LEN (24) from "len" so thats's max of 2280. However, the action_frame->data[] buffer is only BRCMF_FIL_ACTION_FRAME_SIZE (1800) bytes long so this memcpy() can overflow. memcpy(action_frame->data, &buf[DOT11_MGMT_HDR_LEN], le16_to_cpu(action_frame->len)); Cc: stable@vger.kernel.org # 3.9.x Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.") Reported-by: "freenerguo(郭大兴)" <freenerguo@tencent.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18Merge branch 'standard/base' into standard/arm-versatile-926ejsBruce Ashfield
2017-08-18ipx: call ipxitf_put() in ioctl error pathDan Carpenter
upstream ee0d8d8482345ff97a75a7d747efc309f13b0d80 commit We should call ipxitf_put() if the copy_to_user() fails. Reported-by: 李强 <liqiang6-s@360.cn> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18mm: fix new crash in unmapped_area_topdown()Hugh Dickins
commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89 upstream Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the end of unmapped_area_topdown(). Linus points out how MAP_FIXED (which does not have to respect our stack guard gap intentions) could result in gap_end below gap_start there. Fix that, and the similar case in its alternative, unmapped_area(). Cc: stable@vger.kernel.org Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Reported-by: Dave Jones <davej@codemonkey.org.uk> Debugged-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-08-18mm: larger stack guard gap, between vmasHugh Dickins
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: backport to 4.11: adjust context] [wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2017-06-08Merge branch 'standard/base' into standard/arm-versatile-926ejsBruce Ashfield
2017-06-08Merge tag 'v4.10.17' into standard/baseBruce Ashfield
This is the 4.10.17 stable release
2017-06-08Merge branch 'standard/base' into standard/arm-versatile-926ejsBruce Ashfield
2017-06-08Merge tag 'v4.10.16' into standard/baseBruce Ashfield
This is the 4.10.16 stable release
2017-05-20Linux 4.10.17v4.10.17Greg Kroah-Hartman
2017-05-20pstore: Shut down worker when unregisteringKees Cook
commit 6330d5534786d5315d56d558aa6d20740f97d80a upstream. When built as a module and running with update_ms >= 0, pstore will Oops during module unload since the work timer is still running. This makes sure the worker is stopped before unloading. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20pstore: Fix flags to enable dumps on powerpcAnkit Kumar
commit 041939c1ec54208b42f5cd819209173d52a29d34 upstream. After commit c950fd6f201a kernel registers pstore write based on flag set. Pstore write for powerpc is broken as flags(PSTORE_FLAGS_DMESG) is not set for powerpc architecture. On panic, kernel doesn't write message to /fs/pstore/dmesg*(Entry doesn't gets created at all). This patch enables pstore write for powerpc architecture by setting PSTORE_FLAGS_DMESG flag. Fixes: c950fd6f201a ("pstore: Split pstore fragile flags") Signed-off-by: Ankit Kumar <ankit@linux.vnet.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20libnvdimm, pfn: fix 'npfns' vs section alignmentDan Williams
commit d5483feda85a8f39ee2e940e279547c686aac30c upstream. Fix failures to create namespaces due to the vmem_altmap not advertising enough free space to store the memmap. WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0 [..] Call Trace: dump_stack+0x63/0x83 __warn+0xcb/0xf0 warn_slowpath_null+0x1d/0x20 arch_add_memory+0xde/0xf0 devm_memremap_pages+0x244/0x440 pmem_attach_disk+0x37e/0x490 [nd_pmem] nd_pmem_probe+0x7e/0xa0 [nd_pmem] nvdimm_bus_probe+0x71/0x120 [libnvdimm] driver_probe_device+0x2bb/0x460 bind_store+0x114/0x160 drv_attr_store+0x25/0x30 In commit 658922e57b84 "libnvdimm, pfn: fix memmap reservation sizing" we arranged for the capacity to be allocated, but failed to also update the 'npfns' parameter. This leads to cases where there is enough capacity reserved to hold all the allocated sections, but vmemmap_populate_hugepages() still encounters -ENOMEM from altmap_alloc_block_buf(). This fix is a stop-gap until we can teach the core memory hotplug implementation to permit sub-section hotplug. Fixes: 658922e57b84 ("libnvdimm, pfn: fix memmap reservation sizing") Reported-by: Anisha Allada <anisha.allada@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20libnvdimm: fix nvdimm_bus_lock() vs device_lock() orderingDan Williams
commit 452bae0aede774f87bf56c28b6dd50b72c78986c upstream. A debug patch to turn the standard device_lock() into something that lockdep can analyze yielded the following: ====================================================== [ INFO: possible circular locking dependency detected ] 4.11.0-rc4+ #106 Tainted: G O ------------------------------------------------------- lt-libndctl/1898 is trying to acquire lock: (&dev->nvdimm_mutex/3){+.+.+.}, at: [<ffffffffc023c948>] nd_attach_ndns+0x178/0x1b0 [libnvdimm] but task is already holding lock: (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffc022e0b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}: lock_acquire+0xf6/0x1f0 __mutex_lock+0x88/0x980 mutex_lock_nested+0x1b/0x20 nvdimm_bus_lock+0x21/0x30 [libnvdimm] nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm] nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm] nd_pmem_probe+0x14/0x180 [nd_pmem] nvdimm_bus_probe+0xa9/0x260 [libnvdimm] -> #0 (&dev->nvdimm_mutex/3){+.+.+.}: __lock_acquire+0x1107/0x1280 lock_acquire+0xf6/0x1f0 __mutex_lock+0x88/0x980 mutex_lock_nested+0x1b/0x20 nd_attach_ndns+0x178/0x1b0 [libnvdimm] nd_namespace_store+0x308/0x3c0 [libnvdimm] namespace_store+0x87/0x220 [libnvdimm] In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'. Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect nd_{attach,detach}_ndns() operations. Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices") Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notifyToshi Kani
commit b2518c78ce76896f0f8f7940bf02104b227e1709 upstream. The following BUG was observed when nd_pmem_notify() was called for a BTT device. The use of a pmem_device pointer is not valid with BTT. BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: nd_pmem_notify+0x30/0xf0 [nd_pmem] Call Trace: nd_device_notify+0x40/0x50 child_notify+0x10/0x20 device_for_each_child+0x50/0x90 nd_region_notify+0x20/0x30 nd_device_notify+0x40/0x50 nvdimm_region_notify+0x27/0x30 acpi_nfit_scrub+0x341/0x590 [nfit] process_one_work+0x197/0x450 worker_thread+0x4e/0x4a0 kthread+0x109/0x140 Fix nd_pmem_notify() by setting nd_region and badblocks pointers properly for BTT. Cc: Vishal Verma <vishal.l.verma@intel.com> Fixes: 719994660c24 ("libnvdimm: async notification support") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20libnvdimm, region: fix flush hint detection crashDan Williams
commit bc042fdfbb92b5b13421316b4548e2d6e98eed37 upstream. In the case where a dimm does not have any associated flush hints the ndrd->flush_wpq array may be uninitialized leading to crashes with the following signature: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: region_visible+0x10f/0x160 [libnvdimm] Call Trace: internal_create_group+0xbe/0x2f0 sysfs_create_groups+0x40/0x80 device_add+0x2d8/0x650 nd_async_device_register+0x12/0x40 [libnvdimm] async_run_entry_fn+0x39/0x170 process_one_work+0x212/0x6c0 ? process_one_work+0x197/0x6c0 worker_thread+0x4e/0x4a0 kthread+0x10c/0x140 ? process_one_work+0x6c0/0x6c0 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x31/0x40 Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Fixes: f284a4f23752 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20ipmi: Fix kernel panic at ipmi_ssif_thread()Joeseph Chang
commit 6de65fcfdb51835789b245203d1bfc8d14cb1e06 upstream. msg_written_handler() may set ssif_info->multi_data to NULL when using ipmitool to write fru. Before setting ssif_info->multi_data to NULL, add new local pointer "data_to_send" and store correct i2c data pointer to it to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos. Signed-off-by: Joeseph Chang <joechang@codeaurora.org> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20Bluetooth: hci_intel: add missing tty-device sanity checkJohan Hovold
commit dcb9cfaa5ea9aa0ec08aeb92582ccfe3e4c719a9 upstream. Make sure to check the tty-device pointer before looking up the sibling platform device to avoid dereferencing a NULL-pointer when the tty is one end of a Unix98 pty. Fixes: 74cdad37cd24 ("Bluetooth: hci_intel: Add runtime PM support") Fixes: 1ab1f239bf17 ("Bluetooth: hci_intel: Add support for platform driver") Cc: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20Bluetooth: hci_bcm: add missing tty-device sanity checkJohan Hovold
commit 95065a61e9bf25fb85295127fba893200c2bbbd8 upstream. Make sure to check the tty-device pointer before looking up the sibling platform device to avoid dereferencing a NULL-pointer when the tty is one end of a Unix98 pty. Fixes: 0395ffc1ee05 ("Bluetooth: hci_bcm: Add PM for BCM devices") Cc: Frederic Danis <frederic.danis@linux.intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20Bluetooth: Fix user channel for 32bit userspace on 64bit kernelSzymon Janc
commit ab89f0bdd63a3721f7cd3f064f39fc4ac7ca14d4 upstream. Running 32bit userspace on 64bit kernel results in MSG_CMSG_COMPAT being defined as 0x80000000. This results in sendmsg failure if used from 32bit userspace running on 64bit kernel. Fix this by accounting for MSG_CMSG_COMPAT in flags check in hci_sock_sendmsg. Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Signed-off-by: Marko Kiiskila <marko@runtime.io> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20tty: pty: Fix ldisc flush after userspace become aware of the data alreadyWang YanQing
commit 77dae6134440420bac334581a3ccee94cee1c054 upstream. While using emacs, cat or others' commands in konsole with recent kernels, I have met many times that CTRL-C freeze konsole. After konsole freeze I can't type anything, then I have to open a new one, it is very annoying. See bug report: https://bugs.kde.org/show_bug.cgi?id=175283 The platform in that bug report is Solaris, but now the pty in linux has the same problem or the same behavior as Solaris :) It has high possibility to trigger the problem follow steps below: Note: In my test, BigFile is a text file whose size is bigger than 1G 1:open konsole 1:cat BigFile 2:CTRL-C After some digging, I find out the reason is that commit 1d1d14da12e7 ("pty: Fix buffer flush deadlock") changes the behavior of pty_flush_buffer. Thread A Thread B -------- -------- 1:n_tty_poll return POLLIN 2:CTRL-C trigger pty_flush_buffer tty_buffer_flush n_tty_flush_buffer 3:attempt to check count of chars: ioctl(fd, TIOCINQ, &available) available is equal to 0 4:read(fd, buffer, avaiable) return 0 5:konsole close fd Yes, I know we could use the same patch included in the BUG report as a workaround for linux platform too. But I think the data in ldisc is belong to application of another side, we shouldn't clear it when we want to flush write buffer of this side in pty_flush_buffer. So I think it is better to disable ldisc flush in pty_flush_buffer, because its new hehavior bring no benefit except that it mess up the behavior between POLLIN, and TIOCINQ or FIONREAD. Also I find no flush_buffer function in others' tty driver has the same behavior as current pty_flush_buffer. Fixes: 1d1d14da12e7 ("pty: Fix buffer flush deadlock") Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20serial: omap: suspend device on probe errorsJohan Hovold
commit 77e6fe7fd2b7cba0bf2f2dc8cde51d7b9a35bf74 upstream. Make sure to actually suspend the device before returning after a failed (or deferred) probe. Note that autosuspend must be disabled before runtime pm is disabled in order to balance the usage count due to a negative autosuspend delay as well as to make the final put suspend the device synchronously. Fixes: 388bc2622680 ("omap-serial: Fix the error handling in the omap_serial probe") Cc: Shubhrajyoti D <shubhrajyoti@ti.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20serial: omap: fix runtime-pm handling on unbindJohan Hovold
commit 099bd73dc17ed77aa8c98323e043613b6e8f54fc upstream. An unbalanced and misplaced synchronous put was used to suspend the device on driver unbind, something which with a likewise misplaced pm_runtime_disable leads to external aborts when an open port is being removed. Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa024010 ... [<c046e760>] (serial_omap_set_mctrl) from [<c046a064>] (uart_update_mctrl+0x50/0x60) [<c046a064>] (uart_update_mctrl) from [<c046a400>] (uart_shutdown+0xbc/0x138) [<c046a400>] (uart_shutdown) from [<c046bd2c>] (uart_hangup+0x94/0x190) [<c046bd2c>] (uart_hangup) from [<c045b760>] (__tty_hangup+0x404/0x41c) [<c045b760>] (__tty_hangup) from [<c045b794>] (tty_vhangup+0x1c/0x20) [<c045b794>] (tty_vhangup) from [<c046ccc8>] (uart_remove_one_port+0xec/0x260) [<c046ccc8>] (uart_remove_one_port) from [<c046ef4c>] (serial_omap_remove+0x40/0x60) [<c046ef4c>] (serial_omap_remove) from [<c04845e8>] (platform_drv_remove+0x34/0x4c) Fix this up by resuming the device before deregistering the port and by suspending and disabling runtime pm only after the port has been removed. Also make sure to disable autosuspend before disabling runtime pm so that the usage count is balanced and device actually suspended before returning. Note that due to a negative autosuspend delay being set in probe, the unbalanced put would actually suspend the device on first driver unbind, while rebinding and again unbinding would result in a negative power.usage_count. Fixes: 7e9c8e7dbf3b ("serial: omap: make sure to suspend device before remove") Cc: Felipe Balbi <balbi@kernel.org> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20serial: samsung: Use right device for DMA-mapping callsMarek Szyprowski
commit 768d64f491a530062ddad50e016fb27125f8bd7c upstream. Driver should provide its own struct device for all DMA-mapping calls instead of extracting device pointer from DMA engine channel. Although this is harmless from the driver operation perspective on ARM architecture, it is always good to use the DMA mapping API in a proper way. This patch fixes following DMA API debug warning: WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1241 check_sync+0x520/0x9f4 samsung-uart 12c20000.serial: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000006df0f580] [size=64 bytes] Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00137-g07ca963 #51 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24) [<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0) [<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180) [<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50) [<c01395a4>] (warn_slowpath_fmt) from [<c0729058>] (check_sync+0x520/0x9f4) [<c0729058>] (check_sync) from [<c072967c>] (debug_dma_sync_single_for_device+0x88/0xc8) [<c072967c>] (debug_dma_sync_single_for_device) from [<c0803c10>] (s3c24xx_serial_start_tx_dma+0x100/0x2f8) [<c0803c10>] (s3c24xx_serial_start_tx_dma) from [<c0804338>] (s3c24xx_serial_tx_chars+0x198/0x33c) Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com> Fixes: 62c37eedb74c8 ("serial: samsung: add dma reqest/release functions") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20fscrypt: fix context consistency check when key(s) unavailableEric Biggers
commit 272f98f6846277378e1758a49a49d7bf39343c02 upstream. To mitigate some types of offline attacks, filesystem encryption is designed to enforce that all files in an encrypted directory tree use the same encryption policy (i.e. the same encryption context excluding the nonce). However, the fscrypt_has_permitted_context() function which enforces this relies on comparing struct fscrypt_info's, which are only available when we have the encryption keys. This can cause two incorrect behaviors: 1. If we have the parent directory's key but not the child's key, or vice versa, then fscrypt_has_permitted_context() returned false, causing applications to see EPERM or ENOKEY. This is incorrect if the encryption contexts are in fact consistent. Although we'd normally have either both keys or neither key in that case since the master_key_descriptors would be the same, this is not guaranteed because keys can be added or removed from keyrings at any time. 2. If we have neither the parent's key nor the child's key, then fscrypt_has_permitted_context() returned true, causing applications to see no error (or else an error for some other reason). This is incorrect if the encryption contexts are in fact inconsistent, since in that case we should deny access. To fix this, retrieve and compare the fscrypt_contexts if we are unable to set up both fscrypt_infos. While this slightly hurts performance when accessing an encrypted directory tree without the key, this isn't a case we really need to be optimizing for; access *with* the key is much more important. Furthermore, the performance hit is barely noticeable given that we are already retrieving the fscrypt_context and doing two keyring searches in fscrypt_get_encryption_info(). If we ever actually wanted to optimize this case we might start by caching the fscrypt_contexts. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20f2fs: fix fs corruption due to zero inode pageJaegeuk Kim
commit 9bb02c3627f46e50246bf7ab957b56ffbef623cb upstream. This patch fixes the following scenario. - f2fs_create/f2fs_mkdir - write_checkpoint - f2fs_mark_inode_dirty_sync - block_operations - f2fs_lock_all - f2fs_sync_inode_meta - f2fs_unlock_all - sync_inode_metadata - f2fs_lock_op - f2fs_write_inode - update_inode_page - get_node_page return -ENOENT - new_inode_page - fill_node_footer - f2fs_mark_inode_dirty_sync - ... - f2fs_unlock_op - f2fs_inode_synced - f2fs_lock_all - do_checkpoint In this checkpoint, we can get an inode page which contains zeros having valid node footer only. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20mm: fix data corruption due to stale mmap readsJan Kara
commit cd656375f94632d7b5af57bf67b7b5c0270c591c upstream. Currently, we didn't invalidate page tables during invalidate_inode_pages2() for DAX. That could result in e.g. 2MiB zero page being mapped into page tables while there were already underlying blocks allocated and thus data seen through mmap were different from data seen by read(2). The following sequence reproduces the problem: - open an mmap over a 2MiB hole - read from a 2MiB hole, faulting in a 2MiB zero page - write to the hole with write(3p). The write succeeds but we incorrectly leave the 2MiB zero page mapping intact. - via the mmap, read the data that was just written. Since the zero page mapping is still intact we read back zeroes instead of the new data. Fix the problem by unconditionally calling invalidate_inode_pages2_range() in dax_iomap_actor() for new block allocations and by properly invalidating page tables in invalidate_inode_pages2_range() for DAX mappings. Fixes: c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55 Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20dax: prevent invalidation of mapped DAX entriesRoss Zwisler
commit 4636e70bb0a8b871998b6841a2e4b205cf2bc863 upstream. Patch series "mm,dax: Fix data corruption due to mmap inconsistency", v4. This series fixes data corruption that can happen for DAX mounts when page faults race with write(2) and as a result page tables get out of sync with block mappings in the filesystem and thus data seen through mmap is different from data seen through read(2). The series passes testing with t_mmap_stale test program from Ross and also other mmap related tests on DAX filesystem. This patch (of 4): dax_invalidate_mapping_entry() currently removes DAX exceptional entries only if they are clean and unlocked. This is done via: invalidate_mapping_pages() invalidate_exceptional_entry() dax_invalidate_mapping_entry() However, for page cache pages removed in invalidate_mapping_pages() there is an additional criteria which is that the page must not be mapped. This is noted in the comments above invalidate_mapping_pages() and is checked in invalidate_inode_page(). For DAX entries this means that we can can end up in a situation where a DAX exceptional entry, either a huge zero page or a regular DAX entry, could end up mapped but without an associated radix tree entry. This is inconsistent with the rest of the DAX code and with what happens in the page cache case. We aren't able to unmap the DAX exceptional entry because according to its comments invalidate_mapping_pages() isn't allowed to block, and unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem. Since we essentially never have unmapped DAX entries to evict from the radix tree, just remove dax_invalidate_mapping_entry(). Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate") Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20device-dax: fix sysfs attribute deadlockDan Williams
commit 565851c972b50612f3a4542e26879ffb3e906fc2 upstream. Usage of device_lock() for dax_region attributes is unnecessary and deadlock prone. It's unnecessary because the order of registration / un-registration guarantees that drvdata is always valid. It's deadlock prone because it sets up this situation: ndctl D 0 2170 2082 0x00000000 Call Trace: __schedule+0x31f/0x980 schedule+0x3d/0x90 schedule_preempt_disabled+0x15/0x20 __mutex_lock+0x402/0x980 ? __mutex_lock+0x158/0x980 ? align_show+0x2b/0x80 [dax] ? kernfs_seq_start+0x2f/0x90 mutex_lock_nested+0x1b/0x20 align_show+0x2b/0x80 [dax] dev_attr_show+0x20/0x50 ndctl D 0 2186 2079 0x00000000 Call Trace: __schedule+0x31f/0x980 schedule+0x3d/0x90 __kernfs_remove+0x1f6/0x340 ? kernfs_remove_by_name_ns+0x45/0xa0 ? remove_wait_queue+0x70/0x70 kernfs_remove_by_name_ns+0x45/0xa0 remove_files.isra.1+0x35/0x70 sysfs_remove_group+0x44/0x90 sysfs_remove_groups+0x2e/0x50 dax_region_unregister+0x25/0x40 [dax] devm_action_release+0xf/0x20 release_nodes+0x16d/0x2b0 devres_release_all+0x3c/0x60 device_release_driver_internal+0x17d/0x220 device_release_driver+0x12/0x20 unbind_store+0x112/0x160 ndctl/2170 is trying to acquire the device_lock() to read an attribute, and ndctl/2186 is holding the device_lock() while trying to drain all active attribute readers. Thanks to Yi Zhang for the reproduction script. Fixes: d7fe1a67f658 ("dax: add region 'id', 'size', and 'align' attributes") Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20device-dax: fix cdev leakDan Williams
commit ed01e50acdd3e4a640cf9ebd28a7e810c3ceca97 upstream. If device_add() fails, cleanup the cdev. Otherwise, we leak a kobj_map() with a stale device number. As Jason points out, there is a small possibility that userspace has opened and mapped the device in the time between cdev_add() and the device_add() failure. We need a new kill_dax_dev() helper to invalidate any established mappings. Fixes: ba09c01d2fa8 ("dax: convert to the cdev api") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20md/raid1: avoid reusing a resync bio after error handling.NeilBrown
commit 0c9d5b127f695818c2c5a3868c1f28ca2969e905 upstream. fix_sync_read_error() modifies a bio on a newly faulty device by setting bi_end_io to end_sync_write. This ensure that put_buf() will still call rdev_dec_pending() as required, but makes sure that subsequent code in fix_sync_read_error() doesn't try to read from the device. Unfortunately this interacts badly with sync_request_write() which assumes that any bio with bi_end_io set to non-NULL other than end_sync_read is safe to write to. As the device is now faulty it doesn't make sense to write. As the bio was recently used for a read, it is "dirty" and not suitable for immediate submission. In particular, ->bi_next might be non-NULL, which will cause generic_make_request() to complain. Break this interaction by refusing to write to devices which are marked as Faulty. Reported-and-tested-by: Michael Wang <yun.wang@profitbricks.com> Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20padata: free correct variableJason A. Donenfeld
commit 07a77929ba672d93642a56dc2255dd21e6e2290b upstream. The author meant to free the variable that was just allocated, instead of the one that failed to be allocated, but made a simple typo. This patch rectifies that. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20ovl: do not set overlay.opaque on non-dir createAmir Goldstein
commit 4a99f3c83dc493c8ea84693d78cd792839c8aa64 upstream. The optimization for opaque dir create was wrongly being applied also to non-dir create. Fixes: 97c684cc9110 ("ovl: create directories inside merged parent opaque") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20CIFS: add misssing SFM mapping for doublequoteBjörn Jacke
commit 85435d7a15294f9f7ef23469e6aaf7c5dfcc54f0 upstream. SFM is mapping doublequote to 0xF020 Without this patch creating files with doublequote fails to Windows/Mac Signed-off-by: Bjoern Jacke <bjacke@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20cifs: fix CIFS_IOC_GET_MNT_INFO oopsDavid Disseldorp
commit d8a6e505d6bba2250852fbc1c1c86fe68aaf9af3 upstream. An open directory may have a NULL private_data pointer prior to readdir. Fixes: 0de1f4c6f6c0 ("Add way to query server fs info for smb3") Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20CIFS: fix oplock break deadlocksRabin Vincent
commit 3998e6b87d4258a70df358296d6f1c7234012bfe upstream. When the final cifsFileInfo_put() is called from cifsiod and an oplock break work is queued, lockdep complains loudly: ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ #21 Not tainted --------------------------------------------- kworker/0:2/78 is trying to acquire lock: ("cifsiod"){++++.+}, at: flush_work+0x215/0x350 but task is already holding lock: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock("cifsiod"); lock("cifsiod"); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by kworker/0:2/78: #0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0 #1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0 stack backtrace: CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21 Workqueue: cifsiod cifs_writev_complete Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? match_held_lock+0x20/0x2b0 ? trace_hardirqs_off_caller+0x86/0x130 ? mark_lock+0xa6/0x920 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? flush_work+0x215/0x350 flush_work+0x236/0x350 ? flush_work+0x215/0x350 ? destroy_worker+0x170/0x170 __cancel_work_timer+0x17d/0x210 ? ___preempt_schedule+0x16/0x18 cancel_work_sync+0x10/0x20 cifsFileInfo_put+0x338/0x7f0 cifs_writedata_release+0x2a/0x40 ? cifs_writedata_release+0x2a/0x40 cifs_writev_complete+0x29d/0x850 ? preempt_count_sub+0x18/0xd0 process_one_work+0x304/0x8e0 worker_thread+0x9b/0x6a0 kthread+0x1b2/0x200 ? process_one_work+0x8e0/0x8e0 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x31/0x40 This is a real warning. Since the oplock is queued on the same workqueue this can deadlock if there is only one worker thread active for the workqueue (which will be the case during memory pressure when the rescuer thread is handling it). Furthermore, there is at least one other kind of hang possible due to the oplock break handling if there is only worker. (This can be reproduced without introducing memory pressure by having passing 1 for the max_active parameter of cifsiod.) cifs_oplock_break() can wait indefintely in the filemap_fdatawait() while the cifs_writev_complete() work is blocked: sysrq: SysRq : Show Blocked State task PC stack pid father kworker/0:1 D 0 16 2 0x00000000 Workqueue: cifsiod cifs_oplock_break Call Trace: __schedule+0x562/0xf40 ? mark_held_locks+0x4a/0xb0 schedule+0x57/0xe0 io_schedule+0x21/0x50 wait_on_page_bit+0x143/0x190 ? add_to_page_cache_lru+0x150/0x150 __filemap_fdatawait_range+0x134/0x190 ? do_writepages+0x51/0x70 filemap_fdatawait_range+0x14/0x30 filemap_fdatawait+0x3b/0x40 cifs_oplock_break+0x651/0x710 ? preempt_count_sub+0x18/0xd0 process_one_work+0x304/0x8e0 worker_thread+0x9b/0x6a0 kthread+0x1b2/0x200 ? process_one_work+0x8e0/0x8e0 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x31/0x40 dd D 0 683 171 0x00000000 Call Trace: __schedule+0x562/0xf40 ? mark_held_locks+0x29/0xb0 schedule+0x57/0xe0 io_schedule+0x21/0x50 wait_on_page_bit+0x143/0x190 ? add_to_page_cache_lru+0x150/0x150 __filemap_fdatawait_range+0x134/0x190 ? do_writepages+0x51/0x70 filemap_fdatawait_range+0x14/0x30 filemap_fdatawait+0x3b/0x40 filemap_write_and_wait+0x4e/0x70 cifs_flush+0x6a/0xb0 filp_close+0x52/0xa0 __close_fd+0xdc/0x150 SyS_close+0x33/0x60 entry_SYSCALL_64_fastpath+0x1f/0xbe Showing all locks held in the system: 2 locks held by kworker/0:1/16: #0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0 #1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0 Showing busy workqueues and worker pools: workqueue cifsiod: flags=0xc pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 in-flight: 16:cifs_oplock_break delayed: cifs_writev_complete, cifs_echo_request pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3 Fix these problems by creating a a new workqueue (with a rescuer) for the oplock break work. Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>