summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2016-08-16Linux 4.7.1v4.7.1Greg Kroah-Hartman
2016-08-16ext4: fix reference counting bug on block allocation errorVegard Nossum
commit 554a5ccc4e4a20c5f3ec859de0842db4b4b9c77e upstream. If we hit this error when mounted with errors=continue or errors=remount-ro: EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to continue. However, ext4_mb_release_context() is the wrong thing to call here since we are still actually using the allocation context. Instead, just error out. We could retry the allocation, but there is a possibility of getting stuck in an infinite loop instead, so this seems safer. [ Fixed up so we don't return EAGAIN to userspace. --tytso ] Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16ext4: short-cut orphan cleanup on errorVegard Nossum
commit c65d5c6c81a1f27dec5f627f67840726fcd146de upstream. If we encounter a filesystem error during orphan cleanup, we should stop. Otherwise, we may end up in an infinite loop where the same inode is processed again and again. EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters Aborting journal on device loop0-8. EXT4-fs (loop0): Remounting filesystem read-only EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed! [...] EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed! [...] EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed! [...] See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16ext4: validate s_reserved_gdt_blocks on mountTheodore Ts'o
commit 5b9554dc5bf008ae7f68a52e3d7e76c0920938a2 upstream. If s_reserved_gdt_blocks is extremely large, it's possible for ext4_init_block_bitmap(), which is called when ext4 sets up an uninitialized block bitmap, to corrupt random kernel memory. Add the same checks which e2fsck has --- it must never be larger than blocksize / sizeof(__u32) --- and then add a backup check in ext4_init_block_bitmap() in case the superblock gets modified after the file system is mounted. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16ext4: don't call ext4_should_journal_data() on the journal inodeVegard Nossum
commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream. If ext4_fill_super() fails early, it's possible for ext4_evict_inode() to call ext4_should_journal_data() before superblock options and flags are fully set up. In that case, the iput() on the journal inode can end up causing a BUG(). Work around this problem by reordering the tests so we only call ext4_should_journal_data() after we know it's not the journal inode. Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data") Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16ext4: fix deadlock during page writebackJan Kara
commit 646caa9c8e196880b41cd3e3d33a2ebc752bdb85 upstream. Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a deadlock in ext4_writepages() which was previously much harder to hit. After this commit xfstest generic/130 reproduces the deadlock on small filesystems. The problem happens when ext4_do_update_inode() sets LARGE_FILE feature and marks current inode handle as synchronous. That subsequently results in ext4_journal_stop() called from ext4_writepages() to block waiting for transaction commit while still holding page locks, reference to io_end, and some prepared bio in mpd structure each of which can possibly block transaction commit from completing and thus results in deadlock. Fix the problem by releasing page locks, io_end reference, and submitting prepared bio before calling ext4_journal_stop(). [ Changed to defer the call to ext4_journal_stop() only if the handle is synchronous. --tytso ] Reported-and-tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16ext4: check for extents that wrap aroundVegard Nossum
commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 upstream. An extent with lblock = 4294967295 and len = 1 will pass the ext4_valid_extent() test: ext4_lblk_t last = lblock + len - 1; if (len == 0 || lblock > last) return 0; since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end(). We can simplify it by removing the - 1 altogether and changing the test to use lblock + len <= lblock, since now if len = 0, then lblock + 0 == lblock and it fails, and if len > 0 then lblock + len > lblock in order to pass (i.e. it doesn't overflow). Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()") Fixes: 2f974865f ("ext4: check for zero length extent explicitly") Cc: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16serial: mvebu-uart: free the IRQ in ->shutdown()Thomas Petazzoni
commit c2c1659b4f8f9e19fe82a4fd06cca4b3d59090ce upstream. As suggested by the serial port infrastructure documentation, the IRQ is requested in ->startup(). However, it is never freed in the ->shutdown() hook. With simple systems that open the serial port once for all and always have at least one process that keep the serial port opened, there was no problem. But with a more complicated system (*cough* systemd *cough*), the serial port is opened/closed many times, which at some point no processes having the serial port open at all. Due to this ->startup() gets called again, tries to request_irq() again, which fails. Fixes: 30530791a7a0 ("serial: mvebu-uart: initial support for Armada-3700 serial port") Cc: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16crypto: scatterwalk - Fix test in scatterwalk_doneHerbert Xu
commit 5f070e81bee35f1b7bd1477bb223a873ff657803 upstream. When there is more data to be processed, the current test in scatterwalk_done may prevent us from calling pagedone even when we should. In particular, if we're on an SG entry spanning multiple pages where the last page is not a full page, we will incorrectly skip calling pagedone on the second last page. This patch fixes this by adding a separate test for whether we've reached the end of a page. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16crypto: gcm - Filter out async ghash if necessaryHerbert Xu
commit b30bdfa86431afbafe15284a3ad5ac19b49b88e3 upstream. As it is if you ask for a sync gcm you may actually end up with an async one because it does not filter out async implementations of ghash. This patch fixes this by adding the necessary filter when looking for ghash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16Revert "cpufreq: pcc-cpufreq: update default value of ↵Andreas Herrmann
cpuinfo_transition_latency" commit da7d3abe1c9e5ebac2cf86f97e9e89888a5e2094 upstream. This reverts commit 790d849bf811a8ab5d4cd2cce0f6fda92f6aebf2. Using a v4.7-rc7 kernel on a HP ProLiant triggered following messages pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz cpufreq: ondemand governor failed, too long transition latency of HW, fallback to performance governor The last line was shown for each CPU in the system. Testing v4.5 (where commit 790d849b was integrated) triggered similar messages. Same behaviour on a 2nd HP Proliant system. So commit 790d849bf (cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency) causes the system to use performance governor which, I guess, was not the intention of the patch. Enabling debug output in pcc-cpufreq provides following verbose output: pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz pcc_get_offset: for CPU 0: pcc_cpu_data input_offset: 0x44, pcc_cpu_data output_offset: 0x48 init: policy->max is 2800000, policy->min is 1200000 get: get_freq for CPU 0 get: SUCCESS: (virtual) output_offset for cpu 0 is 0xffffc9000d7c0048, contains a value of: 0xff06. Speed is: 168000 MHz cpufreq: ondemand governor failed, too long transition latency of HW, fallback to performance governor target: CPU 0 should go to target freq: 2800000 (virtual) input_offset is 0xffffc9000d7c0044 target: was SUCCESSFUL for cpu 0 I am asking to revert 790d849bf to re-enable usage of ondemand governor with pcc-cpufreq. Fixes: 790d849bf (cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency) Signed-off-by: Andreas Herrmann <aherrmann@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16fs/dcache.c: avoid soft-lockup in dput()Wei Fang
commit 47be61845c775643f1aa4d2a54343549f943c94c upstream. We triggered soft-lockup under stress test which open/access/write/close one file concurrently on more than five different CPUs: WARN: soft lockup - CPU#0 stuck for 11s! [who:30631] ... [<ffffffc0003986f8>] dput+0x100/0x298 [<ffffffc00038c2dc>] terminate_walk+0x4c/0x60 [<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8 [<ffffffc00038f780>] filename_lookup+0x38/0xf0 [<ffffffc000391180>] user_path_at_empty+0x78/0xd0 [<ffffffc0003911f4>] user_path_at+0x1c/0x28 [<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230 ->d_lock trylock may failed many times because of concurrently operations, and dput() may execute a long time. Fix this by replacing cpu_relax() with cond_resched(). dput() used to be sleepable, so make it sleepable again should be safe. Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"Michal Hocko
commit 4e390b2b2f34b8daaabf2df1df0cf8f798b87ddb upstream. This reverts commit f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"). There has been a report about OOM killer invoked when swapping out to a dm-crypt device. The primary reason seems to be that the swapout out IO managed to completely deplete memory reserves. Ondrej was able to bisect and explained the issue by pointing to f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"). The reason is that the swapout path is not throttled properly because the md-raid layer needs to allocate from the generic_make_request path which means it allocates from the PF_MEMALLOC context. dm layer uses mempool_alloc in order to guarantee a forward progress which used to inhibit access to memory reserves when using page allocator. This has changed by f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements") which has dropped the __GFP_NOMEMALLOC protection when the memory pool is depleted. If we are running out of memory and the only way forward to free memory is to perform swapout we just keep consuming memory reserves rather than throttling the mempool allocations and allowing the pending IO to complete up to a moment when the memory is depleted completely and there is no way forward but invoking the OOM killer. This is less than optimal. The original intention of f9054c70d28b was to help with the OOM situations where the oom victim depends on mempool allocation to make a forward progress. David has mentioned the following backtrace: schedule schedule_timeout io_schedule_timeout mempool_alloc __split_and_process_bio dm_request generic_make_request submit_bio mpage_readpages ext4_readpages __do_page_cache_readahead ra_submit filemap_fault handle_mm_fault __do_page_fault do_page_fault page_fault We do not know more about why the mempool is depleted without being replenished in time, though. In any case the dm layer shouldn't depend on any allocations outside of the dedicated pools so a forward progress should be guaranteed. If this is not the case then the dm should be fixed rather than papering over the problem and postponing it to later by accessing more memory reserves. mempools are a mechanism to maintain dedicated memory reserves to guaratee forward progress. Allowing them an unbounded access to the page allocator memory reserves is going against the whole purpose of this mechanism. Bisected by Ondrej Kozina. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20160721145309.GR26379@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: NeilBrown <neilb@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16fuse: fix wrong assignment of ->flags in fuse_send_init()Wei Fang
commit 9446385f05c9af25fed53dbed3cc75763730be52 upstream. FUSE_HAS_IOCTL_DIR should be assigned to ->flags, it may be a typo. Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 69fe05c90ed5 ("fuse: add missing INIT flags") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16fuse: fuse_flush must check mapping->flags for errorsMaxim Patlasov
commit 9ebce595f63a407c5cec98f98f9da8459b73740a upstream. fuse_flush() calls write_inode_now() that triggers writeback, but actual writeback will happen later, on fuse_sync_writes(). If an error happens, fuse_writepage_end() will set error bit in mapping->flags. So, we have to check mapping->flags after fuse_sync_writes(). Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16fuse: fsync() did not return IO errorsAlexey Kuznetsov
commit ac7f052b9e1534c8248f814b6f0068ad8d4a06d2 upstream. Due to implementation of fuse writeback filemap_write_and_wait_range() does not catch errors. We have to do this directly after fuse_sync_writes() Signed-off-by: Alexey Kuznetsov <kuznet@virtuozzo.com> Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16x86/power/64: Fix hibernation return address corruptionJosh Poimboeuf
commit 4ce827b4cc58bec7952591b96cce2b28553e4d5b upstream. In kernel bug 150021, a kernel panic was reported when restoring a hibernate image. Only a picture of the oops was reported, so I can't paste the whole thing here. But here are the most interesting parts: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle kernel paging request at ffff8804615cfd78 ... RIP: ffff8804615cfd78 RSP: ffff8804615f0000 RBP: ffff8804615cfdc0 ... Call Trace: do_signal+0x23 exit_to_usermode_loop+0x64 ... The RIP is on the same page as RBP, so it apparently started executing on the stack. The bug was bisected to commit ef0f3ed5a4ac (x86/asm/power: Create stack frames in hibernate_asm_64.S), which in retrospect seems quite dangerous, since that code saves and restores the stack pointer from a global variable ('saved_context'). There are a lot of moving parts in the hibernate save and restore paths, so I don't know exactly what caused the panic. Presumably, a FRAME_END was executed without the corresponding FRAME_BEGIN, or vice versa. That would corrupt the return address on the stack and would be consistent with the details of the above panic. [ rjw: One major problem is that by the time the FRAME_BEGIN in restore_registers() is executed, the stack pointer value may not be valid any more. Namely, the stack area pointed to by it previously may have been overwritten by some image memory contents and that page frame may now be used for whatever different purpose it had been allocated for before hibernation. In that case, the FRAME_BEGIN will corrupt that memory. ] Instead of doing the frame pointer save/restore around the bounds of the affected functions, just do it around the call to swsusp_save(). That has the same effect of ensuring that if swsusp_save() sleeps, the frame pointers will be correct. It's also a much more obviously safe way to do it than the original patch. And objtool still doesn't report any warnings. Fixes: ef0f3ed5a4ac (x86/asm/power: Create stack frames in hibernate_asm_64.S) Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021 Reported-by: Andre Reinke <andre.reinke@mailbox.org> Tested-by: Andre Reinke <andre.reinke@mailbox.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16x86/microcode: Fix suspend to RAM with builtin microcodeBorislav Petkov
commit 4b703305d98bf7350d4b2953ee39a3aa2eeb1778 upstream. Usually, after we have found the proper microcode blob for the current machine, we stash it away for later use with save_microcode_in_initrd(). However, with builtin microcode which doesn't come from the initrd, we don't call that function because CONFIG_BLK_DEV_INITRD=n and even if set, we don't have a valid initrd. In order to fix this, let's make save_microcode_in_initrd() an fs_initcall which runs before rootfs_initcall() as this was the time it was called previously through: rootfs_initcall(populate_rootfs) |-> free_initrd() |-> free_initrd_mem() |-> save_microcode_in_initrd() Also, we make it run independently from initrd functionality being present or not. And since it is called in the microcode loader only now, we can also make it static. Reported-and-tested-by: Jim Bos <jim876@xs4all.nl> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1465225850-7352-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16radix-tree: account nodes to memcg only if explicitly requestedVladimir Davydov
commit 05eb6e7263185a6bb0de9501ccf2addc52429414 upstream. Radix trees may be used not only for storing page cache pages, so unconditionally accounting radix tree nodes to the current memory cgroup is bad: if a radix tree node is used for storing data shared among different cgroups we risk pinning dead memory cgroups forever. So let's only account radix tree nodes if it was explicitly requested by passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to account page cache entries, so mark mapping->page_tree so. Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup") Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16sysv, ipc: fix security-layer leakingFabian Frederick
commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 upstream. Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref() to receive rcu freeing function but used generic ipc_rcu_free() instead of msg_rcu_free() which does security cleaning. Running LTP msgsnd06 with kmemleak gives the following: cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88003c0a11f8 (size 8): comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s) hex dump (first 8 bytes): 1b 00 00 00 01 00 00 00 ........ backtrace: kmemleak_alloc+0x23/0x40 kmem_cache_alloc_trace+0xe1/0x180 selinux_msg_queue_alloc_security+0x3f/0xd0 security_msg_queue_alloc+0x2e/0x40 newque+0x4e/0x150 ipcget+0x159/0x1b0 SyS_msgget+0x39/0x40 entry_SYSCALL_64_fastpath+0x13/0x8f Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to only use ipc_rcu_free in case of security allocation failure in newary() Fixes: 53dad6d3a8e ("ipc: fix race with LSMs") Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16block: fix use-after-free in seq fileVegard Nossum
commit 77da160530dd1dc94f6ae15a981f24e5f0021e84 upstream. I got a KASAN report of use-after-free: ================================================================== BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508 Read of size 8 by task trinity-c1/315 ============================================================================= BUG kmalloc-32 (Not tainted): kasan: bad access detected ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315 ___slab_alloc+0x4f1/0x520 __slab_alloc.isra.58+0x56/0x80 kmem_cache_alloc_trace+0x260/0x2a0 disk_seqf_start+0x66/0x110 traverse+0x176/0x860 seq_read+0x7e3/0x11a0 proc_reg_read+0xbc/0x180 do_loop_readv_writev+0x134/0x210 do_readv_writev+0x565/0x660 vfs_readv+0x67/0xa0 do_preadv+0x126/0x170 SyS_preadv+0xc/0x10 do_syscall_64+0x1a1/0x460 return_from_SYSCALL_64+0x0/0x6a INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315 __slab_free+0x17a/0x2c0 kfree+0x20a/0x220 disk_seqf_stop+0x42/0x50 traverse+0x3b5/0x860 seq_read+0x7e3/0x11a0 proc_reg_read+0xbc/0x180 do_loop_readv_writev+0x134/0x210 do_readv_writev+0x565/0x660 vfs_readv+0x67/0xa0 do_preadv+0x126/0x170 SyS_preadv+0xc/0x10 do_syscall_64+0x1a1/0x460 return_from_SYSCALL_64+0x0/0x6a CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480 ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480 ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970 Call Trace: [<ffffffff81d6ce81>] dump_stack+0x65/0x84 [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0 [<ffffffff814704ff>] object_err+0x2f/0x40 [<ffffffff814754d1>] kasan_report_error+0x221/0x520 [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40 [<ffffffff83888161>] klist_iter_exit+0x61/0x70 [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10 [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50 [<ffffffff8151f812>] seq_read+0x4b2/0x11a0 [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180 [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210 [<ffffffff814b4c45>] do_readv_writev+0x565/0x660 [<ffffffff814b8a17>] vfs_readv+0x67/0xa0 [<ffffffff814b8de6>] do_preadv+0x126/0x170 [<ffffffff814b92ec>] SyS_preadv+0xc/0x10 This problem can occur in the following situation: open() - pread() - .seq_start() - iter = kmalloc() // succeeds - seqf->private = iter - .seq_stop() - kfree(seqf->private) - pread() - .seq_start() - iter = kmalloc() // fails - .seq_stop() - class_dev_iter_exit(seqf->private) // boom! old pointer As the comment in disk_seqf_stop() says, stop is called even if start failed, so we need to reinitialise the private pointer to NULL when seq iteration stops. An alternative would be to set the private pointer to NULL when the kmalloc() in disk_seqf_start() fails. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspaceDavid Howells
commit f7d665627e103e82d34306c7d3f6f46f387c0d8b upstream. x86_64 needs to use compat_sys_keyctl for 32-bit userspace rather than calling sys_keyctl(). The latter will work in a lot of cases, thereby hiding the issue. Reported-by: Stephan Mueller <smueller@chronox.de> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Link: http://lkml.kernel.org/r/146961615805.14395.5581949237156769439.stgit@warthog.procyon.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16mm: memcontrol: fix memcg id ref counter on swap charge moveVladimir Davydov
commit 615d66c37c755c49ce022c9e5ac0875d27d2603d upstream. Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") swap entries do not pin memcg->css.refcnt directly. Instead, they pin memcg->id.ref. So we should adjust the reference counters accordingly when moving swap charges between cgroups. Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16mm: memcontrol: fix swap counter leak on swapout from offline cgroupVladimir Davydov
commit 1f47b61fb4077936465dcde872a4e5cc4fe708da upstream. An offline memory cgroup might have anonymous memory or shmem left charged to it and no swap. Since only swap entries pin the id of an offline cgroup, such a cgroup will have no id and so an attempt to swapout its anon/shmem will not store memory cgroup info in the swap cgroup map. As a result, memcg->swap or memcg->memsw will never get uncharged from it and any of its ascendants. Fix this by always charging swapout to the first ancestor cgroup that hasn't released its id yet. [hannes@cmpxchg.org: add comment to mem_cgroup_swapout] [vdavydov@virtuozzo.com: use WARN_ON_ONCE() in mem_cgroup_id_get_online()] Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Link: http://lkml.kernel.org/r/5336daa5c9a32e776067773d9da655d2dc126491.1470219853.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16random: strengthen input validation for RNDADDTOENTCNTTheodore Ts'o
commit 86a574de4590ffe6fd3f3ca34cdcf655a78e36ec upstream. Don't allow RNDADDTOENTCNT or RNDADDENTROPY to accept a negative entropy value. It doesn't make any sense to subtract from the entropy counter, and it can trigger a warning: random: negative entropy/overflow: pool input count -40000 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 6828 at drivers/char/random.c:670[< none >] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670 Modules linked in: CPU: 3 PID: 6828 Comm: a.out Not tainted 4.7.0-rc4+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffffffff880b58e0 ffff88005dd9fcb0 ffffffff82cc838f ffffffff87158b40 fffffbfff1016b1c 0000000000000000 0000000000000000 ffffffff87158b40 ffffffff83283dae 0000000000000009 ffff88005dd9fcf8 ffffffff8136d27f Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff82cc838f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51 [<ffffffff8136d27f>] __warn+0x19f/0x1e0 kernel/panic.c:516 [<ffffffff8136d48c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:551 [<ffffffff83283dae>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670 [< inline >] credit_entropy_bits_safe drivers/char/random.c:734 [<ffffffff8328785d>] random_ioctl+0x21d/0x250 drivers/char/random.c:1546 [< inline >] vfs_ioctl fs/ioctl.c:43 [<ffffffff8185316c>] do_vfs_ioctl+0x18c/0xff0 fs/ioctl.c:674 [< inline >] SYSC_ioctl fs/ioctl.c:689 [<ffffffff8185405f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:680 [<ffffffff86a995c0>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 ---[ end trace 5d4902b2ba842f1f ]--- This was triggered using the test program: // autogenerated by syzkaller (http://github.com/google/syzkaller) int main() { int fd = open("/dev/random", O_RDWR); int val = -5000; ioctl(fd, RNDADDTOENTCNT, &val); return 0; } It's harmless in that (a) only root can trigger it, and (b) after complaining the code never does let the entropy count go negative, but it's better to simply not allow this userspace from passing in a negative entropy value altogether. Google-Bug-Id: #29575089 Reported-By: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16apparmor: fix ref count leak when profile sha1 hash is readJohn Johansen
commit 0b938a2e2cf0b0a2c8bac9769111545aff0fee97 upstream. Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Seth Arnold <seth.arnold@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16IB/hfi1: Disable by defaultBart Van Assche
commit a154a8cd080b437969ef194dee365bbb60a3b38a upstream. There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig. Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jubin John <jubin.john@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspaceDavid Howells
commit 20f06ed9f61a185c6dabd662c310bed6189470df upstream. MIPS64 needs to use compat_sys_keyctl for 32-bit userspace rather than calling sys_keyctl. The latter will work in a lot of cases, thereby hiding the issue. Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: David Howells <dhowells@redhat.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: keyrings@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13832/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16arm: oabi compat: add missing access checksDave Weinstein
commit 7de249964f5578e67b99699c5f0b405738d820a2 upstream. Add access checks to sys_oabi_epoll_wait() and sys_oabi_semtimedop(). This fixes CVE-2016-3857, a local privilege escalation under CONFIG_OABI_COMPAT. Reported-by: Chiachih Wu <wuchiachih@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Dave Weinstein <olorin@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16tcp: consider recv buf for the initial window scaleSoheil Hassas Yeganeh
[ Upstream commit f626300a3e776ccc9671b0dd94698fb3aa315966 ] tcp_select_initial_window() intends to advertise a window scaling for the maximum possible window size. To do so, it considers the maximum of net.ipv4.tcp_rmem[2] and net.core.rmem_max as the only possible upper-bounds. However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE to set the socket's receive buffer size to values larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max. Thus, SO_RCVBUFFORCE is effectively ignored by tcp_select_initial_window(). To fix this, consider the maximum of net.ipv4.tcp_rmem[2], net.core.rmem_max and socket's initial buffer space. Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16macsec: ensure rx_sa is set when validation is disabledBeniamino Galvani
[ Upstream commit e3a3b626010a14fe067f163c2c43409d5afcd2a9 ] macsec_decrypt() is not called when validation is disabled and so macsec_skb_cb(skb)->rx_sa is not set; but it is used later in macsec_post_decrypt(), ensure that it's always initialized. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Beniamino Galvani <bgalvani@redhat.com> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16qed: Fix setting/clearing bit in completion bitmapManish Chopra
[ Upstream commit 59d3f1ceb69b54569685d0c34dff16a1e0816b19 ] Slowpath completion handling is incorrectly changing SPQ_RING_SIZE bits instead of a single one. Fixes: 76a9a3642a0b ("qed: fix handling of concurrent ramrods") Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16net/sctp: terminate rhashtable walk correctlyVegard Nossum
[ Upstream commit 5fc382d87517707ad77ea4c9c12e2a3fde2c838a ] I was seeing a lot of these: BUG: sleeping function called from invalid context at mm/slab.h:388 in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2 Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150 [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280 [<ffffffff83295722>] _raw_spin_lock+0x12/0x40 [<ffffffff811aac87>] console_unlock+0x2f7/0x930 [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520 [<ffffffff811aba6a>] vprintk_default+0x1a/0x20 [<ffffffff812c171a>] printk+0x94/0xb0 [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170 [<ffffffff8115835e>] ___might_sleep+0x3be/0x460 [<ffffffff81158490>] __might_sleep+0x90/0x1a0 [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0 [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0 [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60 [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150 [<ffffffff8143a82b>] seq_read+0x27b/0x1180 [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180 [<ffffffff813d471b>] __vfs_read+0xdb/0x610 [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0 [<ffffffff813d615b>] SyS_pread64+0x11b/0x150 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Apparently we always need to call rhashtable_walk_stop(), even when rhashtable_walk_start() fails: * rhashtable_walk_start - Start a hash table walk * @iter: Hash table iterator * * Start a hash table walk. Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. otherwise we never call rcu_read_unlock() and we get the splat above. Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*") Cc: Xin Long <lucien.xin@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16net/irda: fix NULL pointer dereference on memory allocation failureVegard Nossum
[ Upstream commit d3e6952cfb7ba5f4bfa29d4803ba91f96ce1204d ] I ran into this: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000 RIP: 0010:[<ffffffff82bbf066>] [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710 RSP: 0018:ffff880111747bb8 EFLAGS: 00010286 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358 RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048 RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000 R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000 FS: 00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0 Stack: 0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220 ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232 ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000 Call Trace: [<ffffffff82bca542>] irda_connect+0x562/0x1190 [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0 [<ffffffff825b4489>] SyS_connect+0x9/0x10 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74 RIP [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710 RSP <ffff880111747bb8> ---[ end trace 4cda2588bc055b30 ]--- The problem is that irda_open_tsap() can fail and leave self->tsap = NULL, and then irttp_connect_request() almost immediately dereferences it. Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16sctp: fix BH handling on socket backlogMarcelo Ricardo Leitner
[ Upstream commit eefc1b1d105ee4d2ce907833ce675f1e9599b5e3 ] Now that the backlog processing is called with BH enabled, we have to disable BH before taking the socket lock via bh_lock_sock() otherwise it may dead lock: sctp_backlog_rcv() bh_lock_sock(sk); if (sock_owned_by_user(sk)) { if (sk_add_backlog(sk, skb, sk->sk_rcvbuf)) sctp_chunk_free(chunk); else backloged = 1; } else sctp_inq_push(inqueue, chunk); bh_unlock_sock(sk); while sctp_inq_push() was disabling/enabling BH, but enabling BH triggers pending softirq, which then may try to re-lock the socket in sctp_rcv(). [ 219.187215] <IRQ> [ 219.187217] [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30 [ 219.187223] [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp] [ 219.187225] [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80 [ 219.187226] [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0 [ 219.187228] [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0 [ 219.187229] [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0 [ 219.187230] [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0 [ 219.187232] [<ffffffff816f2122>] ip_rcv+0x282/0x3a0 [ 219.187233] [<ffffffff810d8bb6>] ? update_curr+0x66/0x180 [ 219.187235] [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90 [ 219.187236] [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0 [ 219.187237] [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70 [ 219.187239] [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0 [ 219.187240] [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60 [ 219.187242] [<ffffffff816ad1ce>] process_backlog+0x9e/0x140 [ 219.187243] [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370 [ 219.187245] [<ffffffff817cd352>] __do_softirq+0x112/0x2e7 [ 219.187247] [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30 [ 219.187247] <EOI> [ 219.187248] [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40 [ 219.187249] [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80 [ 219.187254] [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp] [ 219.187258] [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp] [ 219.187260] [<ffffffff81692b07>] __release_sock+0x87/0xf0 [ 219.187261] [<ffffffff81692ba0>] release_sock+0x30/0xa0 [ 219.187265] [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp] [ 219.187266] [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0 [ 219.187268] [<ffffffff8172d52c>] inet_accept+0x3c/0x130 [ 219.187269] [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210 [ 219.187271] [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20 [ 219.187272] [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0 [ 219.187276] [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp] [ 219.187277] [<ffffffff8168f2d0>] SyS_accept+0x10/0x20 Fixes: 860fbbc343bf ("sctp: prepare for socket backlog behavior change") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16net: ipv6: Always leave anycast and multicast groups on link downMike Manning
[ Upstream commit ea06f7176413e2538d13bb85b65387d0917943d9 ] Default kernel behavior is to delete IPv6 addresses on link down, which entails deletion of the multicast and the subnet-router anycast addresses. These deletions do not happen with sysctl setting to keep global IPv6 addresses on link down, so every link down/up causes an increment of the anycast and multicast refcounts. These bogus refcounts may stop these addrs from being removed on subsequent calls to delete them. The solution is to leave the groups for the multicast and subnet anycast on link down for the callflow when global IPv6 addresses are kept. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Mike Manning <mmanning@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16bridge: Fix incorrect re-injection of LLDP packetsIdo Schimmel
[ Upstream commit baedbe55884c003819f5c8c063ec3d2569414296 ] Commit 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a bridge port to be re-injected to the Rx path with skb->dev set to the bridge device, but this breaks the lldpad daemon. The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP for any valid device on the system, which doesn't not include soft devices such as bridge and VLAN. Since packet sockets (ptype_base) are processed in the Rx path after the Rx handler, LLDP packets with skb->dev set to the bridge device never reach the lldpad daemon. Fix this by making the bridge's Rx handler re-inject LLDP packets with RX_HANDLER_PASS, which effectively restores the behaviour prior to the mentioned commit. This means netfilter will never receive LLDP packets coming through a bridge port, as I don't see a way in which we can have okfn() consume the packet without breaking existing behaviour. I've already carried out a similar fix for STP packets in commit 56fae404fb2c ("bridge: Fix incorrect re-injection of STP packets"). Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Cc: Florian Westphal <fw@strlen.de> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16net/bonding: Enforce active-backup policy for IPoIB bondsMark Bloch
[ Upstream commit 1533e77315220dc1d5ec3bd6d9fe32e2aa0a74c0 ] When using an IPoIB bond currently only active-backup mode is a valid use case and this commit strengthens it. Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()") was introduced till 4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the fail over mac policy always applied to IPoIB bonds. With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting the device address"), that doesn't hold and practically IPoIB bonds are broken as of that. To fix it, lets go to fail over mac if the device doesn't support the ndo OR this is IPoIB device. As a by-product, this commit also prevents a stack corruption which occurred when trying to copy 20 bytes (IPoIB) device address to a sockaddr struct that has only 16 bytes of storage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16udp: use sk_filter_trim_cap for udp{,6}_queue_rcv_skbDaniel Borkmann
[ Upstream commit ba66bbe5480a012108958a71cff88b23dce84956 ] After a612769774a3 ("udp: prevent bugcheck if filter truncates packet too much"), there followed various other fixes for similar cases such as f4979fcea7fd ("rose: limit sk_filter trim to payload"). Latter introduced a new helper sk_filter_trim_cap(), where we can pass the trim limit directly to the socket filter handling. Make use of it here as well with sizeof(struct udphdr) as lower cap limit and drop the extra skb->len test in UDP's input path. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16vfs: fix deadlock in file_remove_privs() on overlayfsMiklos Szeredi
commit c1892c37769cf89c7e7ba57528ae2ccb5d153c9b upstream. file_remove_privs() is called with inode lock on file_inode(), which proceeds to calling notify_change() on file->f_path.dentry. Which triggers the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later when ovl_setattr tries to lock the underlying inode again. Fix this mess by not mixing the layers, but doing everything on underlying dentry/inode. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16vfs: ioctl: prevent double-fetch in dedupe ioctlScott Bauer
commit 10eec60ce79187686e052092e5383c99b4420a20 upstream. This prevents a double-fetch from user space that can lead to to an undersized allocation and heap overflow. Fixes: 54dbc1517237 ("vfs: hoist the btrfs deduplication ioctl to the vfs") Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16ext4: verify extent header depthVegard Nossum
commit 7bc9491645118c9461bd21099c31755ff6783593 upstream. Although the extent tree depth of 5 should enough be for the worst case of 2*32 extents of length 1, the extent tree code does not currently to merge nodes which are less than half-full with a sibling node, or to shrink the tree depth if possible. So it's possible, at least in theory, for the tree depth to be greater than 5. However, even in the worst case, a tree depth of 32 is highly unlikely, and if the file system is maliciously corrupted, an insanely large eh_depth can cause memory allocation failures that will trigger kernel warnings (here, eh_depth = 65280): JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580 CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508 Stack: 604a8947 625badd8 0002fd09 00000000 60078643 00000000 62623910 601bf9bc 62623970 6002fc84 626239b0 900000125 Call Trace: [<6001c2dc>] show_stack+0xdc/0x1a0 [<601bf9bc>] dump_stack+0x2a/0x2e [<6002fc84>] __warn+0x114/0x140 [<6002fdff>] warn_slowpath_null+0x1f/0x30 [<60165829>] start_this_handle+0x569/0x580 [<60165d4e>] jbd2__journal_start+0x11e/0x220 [<60146690>] __ext4_journal_start_sb+0x60/0xa0 [<60120a81>] ext4_truncate+0x131/0x3a0 [<60123677>] ext4_setattr+0x757/0x840 [<600d5d0f>] notify_change+0x16f/0x2a0 [<600b2b16>] do_truncate+0x76/0xc0 [<600c3e56>] path_openat+0x806/0x1300 [<600c55c9>] do_filp_open+0x89/0xf0 [<600b4074>] do_sys_open+0x134/0x1e0 [<600b4140>] SyS_open+0x20/0x30 [<6001ea68>] handle_syscall+0x88/0x90 [<600295fd>] userspace+0x3fd/0x500 [<6001ac55>] fork_handler+0x85/0x90 ---[ end trace 08b0b88b6387a244 ]--- [ Commit message modified and the extent tree depath check changed from 5 to 32 -- tytso ] Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-24Linux 4.7v4.7Linus Torvalds
2016-07-24Merge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A fix for a long-standing bug in the incremental osdmap handling code that caused misdirected requests, tagged for stable" The tag is signed with a brand new key - Sage is on vacation and I didn't anticipate this" * tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client: libceph: apply new_state before new_up_client on incrementals
2016-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in nftables, from Liping Zhang. 2) Need to check result of vlan_insert_tag() in batman-adv otherwise we risk NULL skb derefs, from Sven Eckelmann. 3) Check for dev_alloc_skb() failures in cfg80211, from Gregory Greenman. 4) Handle properly when we have ppp_unregister_channel() happening in parallel with ppp_connect_channel(), from WANG Cong. 5) Fix DCCP deadlock, from Eric Dumazet. 6) Bail out properly in UDP if sk_filter() truncates the packet to be smaller than even the space that the protocol headers need. From Michal Kubecek. 7) Similarly for rose, dccp, and sctp, from Willem de Bruijn. 8) Make TCP challenge ACKs less predictable, from Eric Dumazet. 9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) packet: propagate sock_cmsg_send() error net/mlx5e: Fix del vxlan port command buffer memset packet: fix second argument of sock_tx_timestamp() net: switchdev: change ageing_time type to clock_t Update maintainer for EHEA driver. net/mlx4_en: Add resilience in low memory systems net/mlx4_en: Move filters cleanup to a proper location sctp: load transport header after sk_filter net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata net: nb8800: Fix SKB leak in nb8800_receive() et131x: Fix logical vs bitwise check in et131x_tx_timeout() vlan: use a valid default mtu value for vlan over macsec net: bgmac: Fix infinite loop in bgmac_dma_tx_add() mlxsw: spectrum: Prevent invalid ingress buffer mapping mlxsw: spectrum: Prevent overwrite of DCB capability fields mlxsw: spectrum: Don't emit errors when PFC is disabled mlxsw: spectrum: Indicate support for autonegotiation mlxsw: spectrum: Force link training according to admin state r8152: add MODULE_VERSION ...
2016-07-23Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains a fix for a potential crash/corruption issue and another where the suid/sgid bits weren't cleared on write" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: verify upper dentry in ovl_remove_and_whiteout() ovl: Copy up underlying inode's ->i_mode to overlay inode ovl: handle ATTR_KILL*
2016-07-23Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Five fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: pps: do not crash when failed to register tools/vm/slabinfo: fix an unintentional printf testing/radix-tree: fix a macro expansion bug radix-tree: fix radix_tree_iter_retry() for tagged iterators. mm: memcontrol: fix cgroup creation failure after many small jobs
2016-07-23Merge tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull intel kabylake drm fixes from Dave Airlie: "As mentioned Intel has gathered all the Kabylake fixes from -next, which we've enabled in 4.7 for the first time, these are pretty much limited in scope to only affects kabylake, which is hw that isn't shipping yet. So I'm mostly okay with it going in now. If we don't land this, it might be a good idea to disable kabylake support in 4.7 before we ship" * tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux: (28 commits) drm/i915/kbl: Introduce the first official DMC for Kabylake. drm/i915: Introduce Kabypoint PCH for Kabylake H/DT. drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate drm/i915/gen9: Add WaFbcHighMemBwCorruptionAvoidance drm/i195/fbc: Add WaFbcNukeOnHostModify drm/i915/gen9: Add WaFbcWakeMemOn drm/i915/gen9: Add WaFbcTurnOffFbcWatermark drm/i915/kbl: Add WaClearSlmSpaceAtContextSwitch drm/i915/gen9: Add WaEnableChickenDCPR drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing drm/i915/kbl: Add WaDisableGafsUnitClkGating drm/i915/kbl: Add WaForGAMHang drm/i915: Add WaInsertDummyPushConstP for bxt and kbl drm/i915/kbl: Add WaDisableDynamicCreditSharing drm/i915/kbl: Add WaDisableGamClockGating drm/i915/gen9: Enable must set chicken bits in config0 reg drm/i915/kbl: Add WaDisableLSQCROPERFforOCL drm/i915/kbl: Add WaDisableSDEUnitClockGating drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0 drm/i915/kbl: Add WaEnableGapsTsvCreditFix ...
2016-07-23Merge tag 'drm-fixes-for-v4.7-rc8-intel' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Two i915 regression fixes. Intel have submitted some Kabylake fixes I'll send separately, since this is the first kernel with kabylake support and they don't go much outside that area I think they should be fine" * tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux: drm/i915: add missing condition for committing planes on crtc drm/i915: Treat eDP as always connected, again
2016-07-23Merge tag 'm68k-for-v4.8-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k upddates from Geert Uytterhoeven: - assorted spelling fixes - defconfig updates * tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k/defconfig: Update defconfigs for v4.7-rc2 m68k: Assorted spelling fixes