summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2018-07-17ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOSHans de Goede
commit 240630e61870e62e39a97225048f9945848fa5f5 upstream. There have been several reports of LPM related hard freezes about once a day on multiple Lenovo 50 series models. Strange enough these reports where not disk model specific as LPM issues usually are and some users with the exact same disk + laptop where seeing them while other users where not seeing these issues. It turns out that enabling LPM triggers a firmware bug somewhere, which has been fixed in later BIOS versions. This commit adds a new ahci_broken_lpm() function and a new ATA_FLAG_NO_LPM for dealing with this. The ahci_broken_lpm() function contains DMI match info for the 4 models which are known to be affected by this and the DMI BIOS date field for known good BIOS versions. If the BIOS date is older then the one in the table LPM will be disabled and a warning will be printed. Note the BIOS dates are for known good versions, some older versions may work too, but we don't know for sure, the table is using dates from BIOS versions for which users have confirmed that upgrading to that version makes the problem go away. Unfortunately I've been unable to get hold of the reporter who reported that BIOS version 2.35 fixed the problems on the W541 for him. I've been able to verify the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION from an older dmidecode, but I don't know the exact BIOS date as reported in the DMI. Lenovo keeps a changelog with dates in their release notes, but the dates there are the release dates not the build dates which are in DMI. So I've chosen to set the date to which we compare to one day past the release date of the 2.34 BIOS. I plan to fix this with a follow up commit once I've the necessary info. Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11sched, tracing: Fix trace_sched_pi_setprio() for deboostingSebastian Andrzej Siewior
commit 4ff648decf4712d39f184fc2df3163f43975575a upstream. Since the following commit: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()") the sched_pi_setprio trace point shows the "newprio" during a deboost: |futex sched_pi_setprio: comm=futex_requeue_p pid"34 oldprio˜ newprio=3D98 |futex sched_switch: prev_comm=futex_requeue_p prev_pid"34 prev_prio=120 This patch open codes __rt_effective_prio() in the tracepoint as the 'newprio' to get the old behaviour back / the correct priority: |futex sched_pi_setprio: comm=futex_requeue_p pid"20 oldprio˜ newprio=3D120 |futex sched_switch: prev_comm=futex_requeue_p prev_pid"20 prev_prio=120 Peter suggested to open code the new priority so people using tracehook could get the deadline data out. Reported-by: Mansky Christian <man@keba.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()") Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmjjzu@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11dax: change bdev_dax_supported() to support boolean returnsDave Jiang
commit 80660f20252d6f76c9f203874ad7c7a4a8508cf8 upstream. The function return values are confusing with the way the function is named. We expect a true or false return value but it actually returns 0/-errno. This makes the code very confusing. Changing the return values to return a bool where if DAX is supported then return true and no DAX support returns false. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11fs: allow per-device dax status checking for filesystemsDarrick J. Wong
commit ba23cba9b3bdc967aabdc6ff1e3e9b11ce05bb4f upstream. Change bdev_dax_supported so it takes a bdev parameter. This enables multi-device filesystems like xfs to check that a dax device can work for the particular filesystem. Once that's in place, actually fix all the parts of XFS where we need to be able to distinguish between datadev and rtdev. This patch fixes the problem where we screw up the dax support checking in xfs if the datadev and rtdev have different dax capabilities. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases] Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11mm: hwpoison: disable memory error handling on 1GB hugepageNaoya Horiguchi
commit 31286a8484a85e8b4e91ddb0f5415aee8a416827 upstream. Recently the following BUG was reported: Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000 Memory failure: 0x3c0000: recovery action for huge page: Recovered BUG: unable to handle kernel paging request at ffff8dfcc0003000 IP: gup_pgd_range+0x1f0/0xc20 PGD 17ae72067 P4D 17ae72067 PUD 0 Oops: 0000 [#1] SMP PTI ... CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014 You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on a 1GB hugepage. This happens because get_user_pages_fast() is not aware of a migration entry on pud that was created in the 1st madvise() event. I think that conversion to pud-aligned migration entry is working, but other MM code walking over page table isn't prepared for it. We need some time and effort to make all this work properly, so this patch avoids the reported bug by just disabling error handling for 1GB hugepage. [n-horiguchi@ah.jp.nec.com: v2] Link: http://lkml.kernel.org/r/1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com Link: http://lkml.kernel.org/r/1517207283-15769-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-08netfilter: nf_tables: bogus EBUSY in chain deletionsPablo Neira Ayuso
commit bb7b40aecbf778c0c83a5bd62b0f03ca9f49a618 upstream. When removing a rule that jumps to chain and such chain in the same batch, this bogusly hits EBUSY. Add activate and deactivate operations to expression that can be called from the preparation and the commit/abort phases. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-08acpi: Add helper for deactivating memory regionHeikki Krogerus
commit d2d2e3c46be5d6dd8001d0eebdf7cafb9bc7006b upstream. Sometimes memory resource may be overlapping with SystemMemory Operation Region by design, for example if the memory region is used as a mailbox for communication with a firmware in the system. One occasion of such mailboxes is USB Type-C Connector System Software Interface (UCSI). With regions like that, it is important that the driver is able to map the memory with the requirements it has. For example, the driver should be allowed to map the memory as non-cached memory. However, if the operation region has been accessed before the driver has mapped the memory, the memory has been marked as write-back by the time the driver is loaded. That means the driver will fail to map the memory if it expects non-cached memory. To work around the problem, introducing helper that the drivers can use to temporarily deactivate (unmap) SystemMemory Operation Regions that overlap with their IO memory. Fixes: 8243edf44152 ("usb: typec: ucsi: Add ACPI driver") Cc: stable@vger.kernel.org Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03block: Fix transfer when chunk sectors exceeds maxKeith Busch
commit 15bfd21fbc5d35834b9ea383dc458a1f0c9e3434 upstream. A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits. Reported-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Tested-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Cc: <stable@vger.kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03slub: fix failure when we delete and create a slab cacheMikulas Patocka
commit d50d82faa0c964e31f7a946ba8aba7c715ca7ab0 upstream. In kernel 4.17 I removed some code from dm-bufio that did slab cache merging (commit 21bb13276768: "dm bufio: remove code that merges slab caches") - both slab and slub support merging caches with identical attributes, so dm-bufio now just calls kmem_cache_create and relies on implicit merging. This uncovered a bug in the slub subsystem - if we delete a cache and immediatelly create another cache with the same attributes, it fails because of duplicate filename in /sys/kernel/slab/. The slub subsystem offloads freeing the cache to a workqueue - and if we create the new cache before the workqueue runs, it complains because of duplicate filename in sysfs. This patch fixes the bug by moving the call of kobject_del from sysfs_slab_remove_workfn to shutdown_cache. kobject_del must be called while we hold slab_mutex - so that the sysfs entry is deleted before a cache with the same attributes could be created. Running device-mapper-test-suite with: dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/ triggered: Buffer I/O error on dev dm-0, logical block 1572848, async page read device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5 device-mapper: thin: 253:1: aborting current metadata transaction sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144' CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25 Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017 Workqueue: dm-thin do_worker [dm_thin_pool] Call Trace: dump_stack+0x5a/0x73 sysfs_warn_dup+0x58/0x70 sysfs_create_dir_ns+0x77/0x80 kobject_add_internal+0xba/0x2e0 kobject_init_and_add+0x70/0xb0 sysfs_slab_add+0xb1/0x250 __kmem_cache_create+0x116/0x150 create_cache+0xd9/0x1f0 kmem_cache_create_usercopy+0x1c1/0x250 kmem_cache_create+0x18/0x20 dm_bufio_client_create+0x1ae/0x410 [dm_bufio] dm_block_manager_create+0x5e/0x90 [dm_persistent_data] __create_persistent_data_objects+0x38/0x940 [dm_thin_pool] dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool] metadata_operation_failed+0x59/0x100 [dm_thin_pool] alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool] process_cell+0x2a3/0x550 [dm_thin_pool] do_worker+0x28d/0x8f0 [dm_thin_pool] process_one_work+0x171/0x370 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ret_from_fork+0x35/0x40 kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory. kmem_cache_create(dm_bufio_buffer-16) failed with error -17 Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03IB/hfi1: Optimize kthread pointer locking when queuing CQ entriesSebastian Sanchez
commit af8aab71370a692eaf7e7969ba5b1a455ac20113 upstream. All threads queuing CQ entries on different CQs are unnecessarily synchronized by a spin lock to check if the CQ kthread worker hasn't been destroyed before queuing an CQ entry. The lock used in 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") is a device global lock and will have poor performance at scale as completions are entered from a large number of CPUs. Convert to use RCU where the read side of RCU is rvt_cq_enter() to determine that the worker is alive prior to triggering the completion event. Apply write side RCU semantics in rvt_driver_cq_init() and rvt_cq_exit(). Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") Cc: <stable@vger.kernel.org> # 4.14.x Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03IB/core: Make testing MR flags for writability a static inline functionJack Morgenstein
commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream. Make the MR writability flags check, which is performed in umem.c, a static inline function in file ib_verbs.h This allows the function to be used by low-level infiniband drivers. Cc: <stable@vger.kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03branch-check: fix long->int truncation when profiling branchesMikulas Patocka
commit 2026d35741f2c3ece73c11eb7e4a15d7c2df9ebe upstream. The function __builtin_expect returns long type (see the gcc documentation), and so do macros likely and unlikely. Unfortunatelly, when CONFIG_PROFILE_ANNOTATED_BRANCHES is selected, the macros likely and unlikely expand to __branch_check__ and __branch_check__ truncates the long type to int. This unintended truncation may cause bugs in various kernel code (we found a bug in dm-writecache because of it), so it's better to fix __branch_check__ to return long. Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1805300818140.24812@file01.intranet.prod.int.rdu2.redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 1f0d69a9fc815 ("tracing: profile likely and unlikely annotations") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlanWillem de Bruijn
[ Upstream commit fd3a88625844907151737fc3b4201676effa6d27 ] Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr to communicate packet metadata to userspace. For skbuffs with vlan, the first two return the packet as it may have existed on the wire, inserting the VLAN tag in the user buffer. Then virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes. Commit f09e2249c4f5 ("macvtap: restore vlan header on user read") added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix csum_start when VLAN tags are present") then fixed up csum_start. Virtio, packet and uml do not insert the vlan header in the user buffer. When introducing virtio_net_hdr_from_skb to deduplicate filling in the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was applied uniformly, breaking csum offset for packets with vlan on virtio and packet. Make insertion of VLAN_HLEN optional. Convert the callers to pass it when needed. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26udp: fix rx queue len reported by diag and proc interfacePaolo Abeni
[ Upstream commit 6c206b20092a3623184cff9470dba75d21507874 ] After commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()") the sk_rmem_alloc field does not measure exactly anymore the receive queue length, because we batch the rmem release. The issue is really apparent only after commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty"): the user space can easily check for an empty socket with not-0 queue length reported by the 'ss' tool or the procfs interface. We need to use a custom UDP helper to report the correct queue length, taking into account the forward allocation deficit. Reported-by: trevor.francis@46labs.com Fixes: 6b229cf77d68 ("UDP: add batching to udp_rmem_release()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21locking/percpu-rwsem: Annotate rwsem ownership transfer by setting ↵Waiman Long
RWSEM_OWNER_UNKNOWN [ Upstream commit 5a817641f68a6399a5fac8b7d2da67a73698ffed ] The filesystem freezing code needs to transfer ownership of a rwsem embedded in a percpu-rwsem from the task that does the freezing to another one that does the thawing by calling percpu_rwsem_release() after freezing and percpu_rwsem_acquire() before thawing. However, the new rwsem debug code runs afoul with this scheme by warning that the task that releases the rwsem isn't the one that acquires it, as reported by Amir Goldstein: DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79 Call Trace: percpu_up_write+0x1f/0x28 thaw_super_locked+0xdf/0x120 do_vfs_ioctl+0x270/0x5f1 ksys_ioctl+0x52/0x71 __x64_sys_ioctl+0x16/0x19 do_syscall_64+0x5d/0x167 entry_SYSCALL_64_after_hwframe+0x49/0xbe To work properly with the rwsem debug code, we need to annotate that the rwsem ownership is unknown during the tranfer period until a brave soul comes forward to acquire the ownership. During that period, optimistic spinning will be disabled. Reported-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Theodore Y. Ts'o <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/1526420991-21213-3-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21mtd: Fix comparison in map_word_andequal()Ben Hutchings
[ Upstream commit ea739a287f4f16d6250bea779a1026ead79695f2 ] Commit 9e343e87d2c4 ("mtd: cfi: convert inline functions to macros") changed map_word_andequal() into a macro, but also changed the right hand side of the comparison from val3 to val2. Change it back to use val3 on the right hand side. Thankfully this did not cause a regression because all callers currently pass the same argument for val2 and val3. Fixes: 9e343e87d2c4 ("mtd: cfi: convert inline functions to macros") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21KVM: Extend MAX_IRQ_ROUTES to 4096 for all archsWanpeng Li
[ Upstream commit ddc9cfb79c1096a0855839631c091aa7e9602052 ] Our virtual machines make use of device assignment by configuring 12 NVMe disks for high I/O performance. Each NVMe device has 129 MSI-X Table entries: Capabilities: [50] MSI-X: Enable+ Count=129 Masked-Vector table: BAR=0 offset=00002000 The windows virtual machines fail to boot since they will map the number of MSI-table entries that the NVMe hardware reported to the bus to msi routing table, this will exceed the 1024. This patch extends MAX_IRQ_ROUTES to 4096 for all archs, in the future this might be extended again if needed. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Tonny Lu <tonnylu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21sched/core: Introduce set_special_state()Peter Zijlstra
[ Upstream commit b5bf9a90bbebffba888c9144c5a8a10317b04064 ] Gaurav reported a perceived problem with TASK_PARKED, which turned out to be a broken wait-loop pattern in __kthread_parkme(), but the reported issue can (and does) in fact happen for states that do not do condition based sleeps. When the 'current->state = TASK_RUNNING' store of a previous (concurrent) try_to_wake_up() collides with the setting of a 'special' sleep state, we can loose the sleep state. Normal condition based wait-loops are immune to this problem, but for sleep states that are not condition based are subject to this problem. There already is a fix for TASK_DEAD. Abstract that and also apply it to TASK_STOPPED and TASK_TRACED, both of which are also without condition based wait-loop. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21<linux/stringhash.h>: fix end_name_hash() for 64bit longAmir Goldstein
[ Upstream commit 19b9ad67310ed2f685062a00aec602bec33835f0 ] The comment claims that this helper will try not to loose bits, but for 64bit long it looses the high bits before hashing 64bit long into 32bit int. Use the helper hash_long() to do the right thing for 64bit long. For 32bit long, there is no change. All the callers of end_name_hash() either assign the result to qstr->hash, which is u32 or return the result as an int value (e.g. full_name_hash()). Change the helper return type to int to conform to its users. [ It took me a while to apply this, because my initial reaction to it was - incorrectly - that it could make for slower code. After having looked more at it, I take back all my complaints about the patch, Amir was right and I was mis-reading things or just being stupid. I also don't worry too much about the possible performance impact of this on 64-bit, since most architectures that actually care about performance end up not using this very much (the dcache code is the most performance-critical, but the word-at-a-time case uses its own hashing anyway). So this ends up being mostly used for filesystems that do their own degraded hashing (usually because they want a case-insensitive comparison function). A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS, and then this potentially makes things more expensive on 64-bit architectures with slow or lacking multipliers even for the normal case. That said, realistically the only such architecture I can think of is PA-RISC. Nobody really cares about performance on that, it's more of a "look ma, I've got warts^W an odd machine" platform. So the patch is fine, and all my initial worries were just misplaced from not looking at this properly. - Linus ] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21blk-mq: fix sysfs inflight counterOmar Sandoval
[ Upstream commit bf0ddaba65ddbb2715af97041da8e7a45b2d8628 ] When the blk-mq inflight implementation was added, /proc/diskstats was converted to use it, but /sys/block/$dev/inflight was not. Fix it by adding another helper to count in-flight requests by data direction. Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21net: ethtool: Add missing kernel doc for FEC parametersFlorian Fainelli
[ Upstream commit d805c5209350ae725e3a1ee0204ba27d9e75ce3e ] While adding support for ethtool::get_fecparam and set_fecparam, kernel doc for these functions was missed, add those. Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21soc: bcm2835: Make !RASPBERRYPI_FIRMWARE dummies return failureGeert Uytterhoeven
[ Upstream commit 144345a4a8c3b497a3f60d3af9d6071a37660186 ] If CONFIG_RASPBERRYPI_FIRMWARE=n: drivers/gpio/gpio-raspberrypi-exp.c: In function ‘rpi_exp_gpio_get_polarity’: drivers/gpio/gpio-raspberrypi-exp.c:71: warning: ‘get.polarity’ is used uninitialized in this function drivers/gpio/gpio-raspberrypi-exp.c: In function ‘rpi_exp_gpio_get_direction’: drivers/gpio/gpio-raspberrypi-exp.c:150: warning: ‘get.direction’ is used uninitialized in this function The dummy firmware interface functions return 0, which means success, causing subsequent code to make use of the never initialized output parameter. Fix this by making the dummy functions return an error code (-ENOSYS) instead. Note that this assumes the firmware always fills in the requested data in the CONFIG_RASPBERRYPI_FIRMWARE=y case. Fixes: d45f1a563b92dac7 ("staging: vc04_services: fix up rpi firmware functions") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21clk: honor CLK_MUX_ROUND_CLOSEST in generic clk muxJerome Brunet
[ Upstream commit 4ad69b80e886a845f56ce0a3d10211208693d92b ] CLK_MUX_ROUND_CLOSEST is part of the clk_mux documentation but clk_mux directly calls __clk_mux_determine_rate(), which overrides the flag. As result, if clk_mux is instantiated with CLK_MUX_ROUND_CLOSEST, the flag will be ignored and the clock rounded down. To solve this, this patch expose clk_mux_determine_rate_flags() in the clk-provider API and uses it in the determine_rate() callback of clk_mux. Fixes: 15a02c1f6dd7 ("clk: Add __clk_mux_determine_rate_closest") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11ipv6: omit traffic class when calculating flow hashMichal Kubecek
[ Upstream commit fa1be7e01ea863e911349e30456706749518eeab ] Some of the code paths calculating flow hash for IPv6 use flowlabel member of struct flowi6 which, despite its name, encodes both flow label and traffic class. If traffic class changes within a TCP connection (as e.g. ssh does), ECMP route can switch between path. It's also inconsistent with other code paths where ip6_flowlabel() (returning only flow label) is used to feed the key. Use only flow label everywhere, including one place where hash key is set using ip6_flowinfo(). Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Fixes: f70ea018da06 ("net: Add functions to get skb->hash based on flow structures") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11btrfs: define SUPER_FLAG_METADUMP_V2Anand Jain
commit e2731e55884f2138a252b0a3d7b24d57e49c3c59 upstream. btrfs-progs uses super flag bit BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34). So just define that in kernel so that we know its been used. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-05iio:buffer: make length types match kfifo typesMartin Kelly
commit c043ec1ca5baae63726aae32abbe003192bc6eec upstream. Currently, we use int for buffer length and bytes_per_datum. However, kfifo uses unsigned int for length and size_t for element size. We need to make sure these matches or we will have bugs related to overflow (in the range between INT_MAX and UINT_MAX for length, for example). In addition, set_bytes_per_datum uses size_t while bytes_per_datum is an int, which would cause bugs for large values of bytes_per_datum. Change buffer length to use unsigned int and bytes_per_datum to use size_t. Signed-off-by: Martin Kelly <mkelly@xevo.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-05tcp: avoid integer overflows in tcp_rcv_space_adjust()Eric Dumazet
commit 607065bad9931e72207b0cac365d7d4abc06bd99 upstream. When using large tcp_rmem[2] values (I did tests with 500 MB), I noticed overflows while computing rcvwin. Lets fix this before the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [Backport: sysctl_tcp_rmem is not Namespace-ify'd in older kernels] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-05cfg80211: further limit wiphy names to 64 bytesEric Biggers
commit 814596495dd2b9d4aab92d8f89cf19060d25d2ea upstream. wiphy names were recently limited to 128 bytes by commit a7cfebcb7594 ("cfg80211: limit wiphy names to 128 bytes"). As it turns out though, this isn't sufficient because dev_vprintk_emit() needs the syslog header string "SUBSYSTEM=ieee80211\0DEVICE=+ieee80211:$devname" to fit into 128 bytes. This triggered the "device/subsystem name too long" WARN when the device name was >= 90 bytes. As before, this was reproduced by syzbot by sending an HWSIM_CMD_NEW_RADIO command to the MAC80211_HWSIM generic netlink family. Fix it by further limiting wiphy names to 64 bytes. Reported-by: syzbot+e64565577af34b3768dc@syzkaller.appspotmail.com Fixes: a7cfebcb7594 ("cfg80211: limit wiphy names to 128 bytes") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30vlan: Fix vlan insertion for packets without ethernet headerToshiaki Makita
[ Upstream commit c769accdf3d8a103940bea2979b65556718567e9 ] In some situation vlan packets do not have ethernet headers. One example is packets from tun devices. Users can specify vlan protocol in tun_pi field instead of IP protocol. When we have a vlan device with reorder_hdr disabled on top of the tun device, such packets from tun devices are untagged in skb_vlan_untag() and vlan headers will be inserted back in vlan_insert_inner_tag(). vlan_insert_inner_tag() however did not expect packets without ethernet headers, so in such a case size argument for memmove() underflowed. We don't need to copy headers for packets which do not have preceding headers of vlan headers, so skip memmove() in that case. Also don't write vlan protocol in skb->data when it does not have enough room for it. Fixes: cbe7128c4b92 ("vlan: Fix out of order vlan headers with reorder header off") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30llc: properly handle dev_queue_xmit() return valueCong Wang
[ Upstream commit b85ab56c3f81c5a24b5a5213374f549df06430da ] llc_conn_send_pdu() pushes the skb into write queue and calls llc_conn_send_pdus() to flush them out. However, the status of dev_queue_xmit() is not returned to caller, in this case, llc_conn_state_process(). llc_conn_state_process() needs hold the skb no matter success or failure, because it still uses it after that, therefore we should hold skb before dev_queue_xmit() when that skb is the one being processed by llc_conn_state_process(). For other callers, they can just pass NULL and ignore the return value as they are. Reported-by: Noam Rathaus <noamr@beyondsecurity.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30vlan: Fix out of order vlan headers with reorder header offToshiaki Makita
[ Upstream commit cbe7128c4b92e2004984f477fd38dfa81662f02e ] With reorder header off, received packets are untagged in skb_vlan_untag() called from within __netif_receive_skb_core(), and later the tag will be inserted back in vlan_do_receive(). This caused out of order vlan headers when we create a vlan device on top of another vlan device, because vlan_do_receive() inserts a tag as the outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag() and inserted back in vlan_do_receive(), then the inner tag is next removed and inserted back as the outermost tag. This patch fixes the behaviour by inserting the inner tag at the right position. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30net: Fix vlan untag for bridge and vlan_dev with reorder_hdr offToshiaki Makita
[ Upstream commit 4bbb3e0e8239f9079bf1fe20b3c0cb598714ae61 ] When we have a bridge with vlan_filtering on and a vlan device on top of it, packets would be corrupted in skb_vlan_untag() called from br_dev_xmit(). The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(), which makes use of skb->mac_len. In this function mac_len is meant for handling rx path with vlan devices with reorder_header disabled, but in tx path mac_len is typically 0 and cannot be used, which is the problem in this case. The current code even does not properly handle rx path (skb_vlan_untag() called from __netif_receive_skb_core()) with reorder_header off actually. In rx path single tag case, it works as follows: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+------+---- | ETH | VLAN | ETH | | ADDRS | TPID | TCI | TYPE | +-------------------+-------------+------+---- <-------- mac_len ---------> <-------------> to be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <-------- mac_len ---------> This is ok, but in rx double tag case, it corrupts packets: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+-------------+------+---- | ETH | VLAN | VLAN | ETH | | ADDRS | TPID | TCI | TPID | TCI | TYPE | +-------------------+-------------+-------------+------+---- <--------------- mac_len ----------------> <-------------> should be removed <---------------------------> actually will be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <--------------- mac_len ----------------> So, two of vlan tags are both removed while only inner one should be removed and mac_header (and mac_len) is broken. skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2), so use skb->data and skb->mac_header to calculate the right offset. Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com> Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtuSabrina Dubroca
[ Upstream commit d52e5a7e7ca49457dd31fc8b42fb7c0d58a31221 ] Prior to the rework of PMTU information storage in commit 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."), when a PMTU event advertising a PMTU smaller than net.ipv4.route.min_pmtu was received, we would disable setting the DF flag on packets by locking the MTU metric, and set the PMTU to net.ipv4.route.min_pmtu. Since then, we don't disable DF, and set PMTU to net.ipv4.route.min_pmtu, so the intermediate router that has this link with a small MTU will have to drop the packets. This patch reestablishes pre-2.6.39 behavior by splitting rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu. rt_mtu_locked indicates that we shouldn't set the DF bit on that path, and is checked in ip_dont_fragment(). One possible workaround is to set net.ipv4.route.min_pmtu to a value low enough to accommodate the lowest MTU encountered. Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30block: display the correct diskname for bioJiufei Xue
[ Upstream commit 9c0fb1e313aaf4e8edec22433c8b22dd308e466c ] bio_devname use __bdevname to display the device name, and can only show the major and minor of the part0, Fix this by using disk_name to display the correct name. Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30ARC: mcip: update MCIP debug mask when the new cpu came onlineEugeniy Paltsev
[ Upstream commit f3205de98db2fc8083796dd5ad81b191e436fab8 ] As of today we use hardcoded MCIP debug mask, so if we launch kernel via debugger and kick fever cores than HW has all cpus hang at the momemt of setup MCIP debug mask. So update MCIP debug mask when the new cpu came online, instead of use hardcoded MCIP debug mask. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30ARC: mcip: halt GFRC counter when ARC cores haltEugeniy Paltsev
[ Upstream commit 07423d00a2b2a71a97e4287d9262cb83c4c4c89f ] In SMP systems, GFRC is used for clocksource. However by default the counter keeps running even when core is halted (say when debugging via a JTAG debugger). This confuses Linux timekeeping and triggers flase RCU stall splat such as below: | [ARCLinux]# while true; do ./shm_open_23-1.run-test ; done | Running with 1000 processes for 1000 objects | hrtimer: interrupt took 485060 ns | | create_cnt: 1000 | Running with 1000 processes for 1000 objects | [ARCLinux]# INFO: rcu_preempt self-detected stall on CPU | 2-...: (1 GPs behind) idle=a01/1/0 softirq=135770/135773 fqs=0 | INFO: rcu_preempt detected stalls on CPUs/tasks: | 0-...: (1 GPs behind) idle=71e/0/0 softirq=135264/135264 fqs=0 | 2-...: (1 GPs behind) idle=a01/1/0 softirq=135770/135773 fqs=0 | 3-...: (1 GPs behind) idle=4e0/0/0 softirq=134304/134304 fqs=0 | (detected by 1, t=13648 jiffies, g=31493, c=31492, q=1) Starting from ARC HS v3.0 it's possible to tie GFRC to state of up-to 4 ARC cores with help of GFRC's CORE register where we set a mask for cores which state we need to rely on. We update cpu mask every time new cpu came online instead of using hardcoded one or using mask generated from "possible_cpus" as we want it set correctly even if we run kernel on HW which has fewer cores than expected (or we launch kernel via debugger and kick fever cores than HW has) Note that GFRC halts when all cores have halted and thus relies on programming of Inter-Core-dEbug register to halt all cores when one halts. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: rewrote changelog] Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30virtio-gpu: fix ioctl and expose the fixed status to userspace.Dave Airlie
[ Upstream commit 9a191b114906457c4b2494c474f58ae4142d4e67 ] This exposes to mesa that it can use the fixed ioctl for querying later cap sets, cap set 1 is forever frozen in time. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: http://patchwork.freedesktop.org/patch/msgid/20180221015003.22884-1-airlied@gmail.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD buildsSebastian Ott
[ Upstream commit 076467490b8176eb96eddc548a14d4135c7b5852 ] Move the kvm_arch_irq_routing_update() prototype outside of ifdef CONFIG_HAVE_KVM_EVENTFD guards to fix the following sparse warning: arch/s390/kvm/../../../virt/kvm/irqchip.c:171:28: warning: symbol 'kvm_arch_irq_routing_update' was not declared. Should it be static? Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30regulatory: add NUL to request alpha2Johannes Berg
[ Upstream commit 657308f73e674e86b60509a430a46e569bf02846 ] Similar to the ancient commit a5fe8e7695dc ("regulatory: add NUL to alpha2"), add another byte to alpha2 in the request struct so that when we use nla_put_string(), we don't overrun anything. Fixes: 73d54c9e74c4 ("cfg80211: add regulatory netlink multicast group") Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30bug.h: work around GCC PR82365 in BUG()Arnd Bergmann
[ Upstream commit 173a3efd3edb2ef6ef07471397c5f542a360e9c1 ] Looking at functions with large stack frames across all architectures led me discovering that BUG() suffers from the same problem as fortify_panic(), which I've added a workaround for already. In short, variables that go out of scope by calling a noreturn function or __builtin_unreachable() keep using stack space in functions afterwards. A workaround that was identified is to insert an empty assembler statement just before calling the function that doesn't return. I'm adding a macro "barrier_before_unreachable()" to document this, and insert calls to that in all instances of BUG() that currently suffer from this problem. The files that saw the largest change from this had these frame sizes before, and much less with my patch: fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=] drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=] In case of ARC and CRIS, it turns out that the BUG() implementation actually does return (or at least the compiler thinks it does), resulting in lots of warnings about uninitialized variable use and leaving noreturn functions, such as: block/cfq-iosched.c: In function 'cfq_async_queue_prio': block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type] include/linux/dmaengine.h: In function 'dma_maxpq': include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type] This makes them call __builtin_trap() instead, which should normally dump the stack and kill the current process, like some of the other architectures already do. I tried adding barrier_before_unreachable() to panic() and fortify_panic() as well, but that had very little effect, so I'm not submitting that patch. Vineet said: : For ARC, it is double win. : : 1. Fixes 3 -Wreturn-type warnings : : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of : non-void function [-Wreturn-type] : : 2. bloat-o-meter reports code size improvements as gcc elides the : generated code for stack return. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christopher Li <sparse@chrisli.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4Felix Fietkau
[ Upstream commit 651b9920d7a694ffb1f885aef2bbb068a25d9d66 ] This ensures that mac80211 allocated management frames are properly aligned, which makes copying them more efficient. For instance, mt76 uses iowrite32_copy to copy beacon frames to beacon template memory on the chip. Misaligned 32-bit accesses cause CPU exceptions on MIPS and should be avoided. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30ptr_ring: prevent integer overflow when calculating sizeJason Wang
[ Upstream commit 54e02162d4454a99227f520948bf4494c3d972d0 ] Switch to use dividing to prevent integer overflow when size is too big to calculate allocation size properly. Reported-by: Eric Biggers <ebiggers3@gmail.com> Fixes: 6e6e41c31122 ("ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30IB/umem: Use the correct mm during ib_umem_releaseLidong Chen
commit 8e907ed4882714fd13cfe670681fc6cb5284c780 upstream. User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads. If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has exited, get_pid_task will return NULL and ib_umem_release will not decrease mm->pinned_vm. Instead of using threads to locate the mm, use the overall tgid from the ib_ucontext struct instead. This matches the behavior of ODP and disassociate in handling the mm of the process that called ibv_reg_mr. Cc: <stable@vger.kernel.org> Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get") Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30do d_instantiate/unlock_new_inode combinations safelyAl Viro
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream. For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25scsi: core: Make SCSI Status CONDITION MET equivalent to GOODDouglas Gilbert
[ Upstream commit 1875ede02ed5e176a18dccbca84abc28d5b3e141 ] The SCSI PRE-FETCH (10 or 16) command is present both on hard disks and some SSDs. It is useful when the address of the next block(s) to be read is known but it is not following the LBA of the current READ (so read-ahead won't help). It returns two "good" SCSI Status values. If the requested blocks have fitted (or will most likely fit (when the IMMED bit is set)) into the disk's cache, it returns CONDITION MET. If it didn't (or will not) fit then it returns GOOD status. The goal of this patch is to stop the SCSI subsystem treating the CONDITION MET SCSI status as an error. The current state makes the PRE-FETCH command effectively unusable via pass-throughs. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25usb: gadget: composite: fix incorrect handling of OS desc requestsChris Dickens
[ Upstream commit 5d6ae4f0da8a64a185074dabb1b2f8c148efa741 ] When handling an OS descriptor request, one of the first operations is to zero out the request buffer using the wLength from the setup packet. There is no bounds checking, so a wLength > 4096 would clobber memory adjacent to the request buffer. Fix this by taking the min of wLength and the request buffer length prior to the memset. While at it, define the buffer length in a header file so that magic numbers don't appear throughout the code. When returning data to the host, the data length should be the min of the wLength and the valid data we have to return. Currently we are returning wLength, thus requests for a wLength greater than the amount of data in the OS descriptor buffer would return invalid (albeit zero'd) data following the valid descriptor data. Fix this by counting the number of bytes when constructing the data and using this when determining the length of the request. Signed-off-by: Chris Dickens <christopher.a.dickens@gmail.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25net: usbnet: fix potential deadlock on 32bit hostsEric Dumazet
[ Upstream commit 2695578b896aea472b2c0dcbe9d92daa71738484 ] Marek reported a LOCKDEP issue occurring on 32bit host, that we tracked down to the fact that usbnet could either run from soft or hard irqs. This patch adds u64_stats_update_begin_irqsave() and u64_stats_update_end_irqrestore() helpers to solve this case. [ 17.768040] ================================ [ 17.772239] WARNING: inconsistent lock state [ 17.776511] 4.16.0-rc3-next-20180227-00007-g876c53a7493c #453 Not tainted [ 17.783329] -------------------------------- [ 17.787580] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ 17.793607] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 17.798751] (&syncp->seq#5){?.-.}, at: [<9b22e5f0>] asix_rx_fixup_internal+0x188/0x288 [ 17.806790] {IN-HARDIRQ-W} state was registered at: [ 17.811677] tx_complete+0x100/0x208 [ 17.815319] __usb_hcd_giveback_urb+0x60/0xf0 [ 17.819770] xhci_giveback_urb_in_irq+0xa8/0x240 [ 17.824469] xhci_td_cleanup+0xf4/0x16c [ 17.828367] xhci_irq+0xe74/0x2240 [ 17.831827] usb_hcd_irq+0x24/0x38 [ 17.835343] __handle_irq_event_percpu+0x98/0x510 [ 17.840111] handle_irq_event_percpu+0x1c/0x58 [ 17.844623] handle_irq_event+0x38/0x5c [ 17.848519] handle_fasteoi_irq+0xa4/0x138 [ 17.852681] generic_handle_irq+0x18/0x28 [ 17.856760] __handle_domain_irq+0x6c/0xe4 [ 17.860941] gic_handle_irq+0x54/0xa0 [ 17.864666] __irq_svc+0x70/0xb0 [ 17.867964] arch_cpu_idle+0x20/0x3c [ 17.871578] arch_cpu_idle+0x20/0x3c [ 17.875190] do_idle+0x144/0x218 [ 17.878468] cpu_startup_entry+0x18/0x1c [ 17.882454] start_kernel+0x394/0x400 [ 17.886177] irq event stamp: 161912 [ 17.889616] hardirqs last enabled at (161912): [<7bedfacf>] __netdev_alloc_skb+0xcc/0x140 [ 17.897893] hardirqs last disabled at (161911): [<d58261d0>] __netdev_alloc_skb+0x94/0x140 [ 17.904903] exynos5-hsi2c 12ca0000.i2c: tx timeout [ 17.906116] softirqs last enabled at (161904): [<387102ff>] irq_enter+0x78/0x80 [ 17.906123] softirqs last disabled at (161905): [<cf4c628e>] irq_exit+0x134/0x158 [ 17.925722]. [ 17.925722] other info that might help us debug this: [ 17.933435] Possible unsafe locking scenario: [ 17.933435]. [ 17.940331] CPU0 [ 17.942488] ---- [ 17.944894] lock(&syncp->seq#5); [ 17.948274] <Interrupt> [ 17.950847] lock(&syncp->seq#5); [ 17.954386]. [ 17.954386] *** DEADLOCK *** [ 17.954386]. [ 17.962422] no locks held by swapper/0/0. Fixes: c8b5d129ee29 ("net: usbnet: support 64bit stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25cfg80211: limit wiphy names to 128 bytesJohannes Berg
commit a7cfebcb7594a24609268f91299ab85ba064bf82 upstream. There's currently no limit on wiphy names, other than netlink message size and memory limitations, but that causes issues when, for example, the wiphy name is used in a uevent, e.g. in rfkill where we use the same name for the rfkill instance, and then the buffer there is "only" 2k for the environment variables. This was reported by syzkaller, which used a 4k name. Limit the name to something reasonable, I randomly picked 128. Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-22seccomp: Move speculation migitation control to arch codeThomas Gleixner
commit 8bf37d8c067bb7eb8e7c381bdadf9bd89182b6bc upstream The migitation control is simpler to implement in architecture code as it avoids the extra function call to check the mode. Aside of that having an explicit seccomp enabled mode in the architecture mitigations would require even more workarounds. Move it into architecture code and provide a weak function in the seccomp code. Remove the 'which' argument as this allows the architecture to decide which mitigations are relevant for seccomp. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-22seccomp: Add filter flag to opt-out of SSB mitigationKees Cook
commit 00a02d0c502a06d15e07b857f8ff921e3e402675 upstream If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>