summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2020-08-13Linux 5.2.53v5.2.53Paul Gortmaker
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13blktrace: ensure our debugfs dir existsLuis Chamberlain
commit b431ef837e3374da0db8ff6683170359aaa0859c upstream. We make an assumption that a debugfs directory exists, but since this can fail ensure it exists before allowing blktrace setup to complete. Otherwise we end up stuffing blktrace files on the debugfs root directory. In the worst case scenario this *in theory* can create an eventual panic *iff* in the future a similarly named file is created prior on the debugfs root directory. This theoretical crash can happen due to a recursive removal followed by a specific dentry removal. This doesn't fix any known crash, however I have seen the files go into the main debugfs root directory in cases where the debugfs directory was not created due to other internal bugs with blktrace now fixed. blktrace is also completely useless without this directory, so this ensures to userspace we only setup blktrace if the kernel can stuff files where they are supposed to go into. debugfs directory creations typically aren't checked for, and we have maintainers doing sweep removals of these checks, but since we need this check to ensure proper userspace blktrace functionality we make sure to annotate the justification for the check. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13blktrace: fix debugfs use after freeLuis Chamberlain
commit bad8e64fb19d3a0de5e564d9a7271c31bd684369 upstream. On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for request-based drivers (multiqueue). This however left in place a possible crash, if you happen to abuse blktrace while racing to remove / add a device. We used to use asynchronous removal of the request_queue, and with that the issue was easier to reproduce. Now that we have reverted to synchronous removal of the request_queue, the issue is still possible to reproduce, its however just a bit more difficult. We essentially run two instances of break-blktrace which add/remove a loop device, and setup a blktrace and just never tear the blktrace down. We do this twice in parallel. This is easily reproduced with the script run_0004.sh from break-blktrace [0]. We can end up with two types of panics each reflecting where we race, one a failed blktrace setup: [ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present! [ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 252.436592] #PF: supervisor write access in kernel mode [ 252.439822] #PF: error_code(0x0002) - not-present page [ 252.442967] PGD 0 P4D 0 [ 252.444656] Oops: 0002 [#1] SMP NOPTI [ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 252.456343] RIP: 0010:down_write+0x15/0x40 [ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246 [ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000 [ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001 [ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000 [ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0 [ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000 [ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0 [ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 252.479866] Call Trace: [ 252.480322] simple_recursive_removal+0x4e/0x2e0 [ 252.481078] ? debugfs_remove+0x60/0x60 [ 252.481725] ? relay_destroy_buf+0x77/0xb0 [ 252.482662] debugfs_remove+0x40/0x60 [ 252.483518] blk_remove_buf_file_callback+0x5/0x10 [ 252.484328] relay_close_buf+0x2e/0x60 [ 252.484930] relay_open+0x1ce/0x2c0 [ 252.485520] do_blk_trace_setup+0x14f/0x2b0 [ 252.486187] __blk_trace_setup+0x54/0xb0 [ 252.486803] blk_trace_ioctl+0x90/0x140 [ 252.487423] ? do_sys_openat2+0x1ab/0x2d0 [ 252.488053] blkdev_ioctl+0x4d/0x260 [ 252.488636] block_ioctl+0x39/0x40 [ 252.489139] ksys_ioctl+0x87/0xc0 [ 252.489675] __x64_sys_ioctl+0x16/0x20 [ 252.490380] do_syscall_64+0x52/0x180 [ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9 And the other on the device removal: [ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present! [ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 128.619537] #PF: supervisor write access in kernel mode [ 128.622700] #PF: error_code(0x0002) - not-present page [ 128.625842] PGD 0 P4D 0 [ 128.627585] Oops: 0002 [#1] SMP NOPTI [ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 128.640471] RIP: 0010:down_write+0x15/0x40 [ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246 [ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000 [ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0 [ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000 [ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0 [ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000 [ 128.662204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0 [ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.667282] Call Trace: [ 128.667801] simple_recursive_removal+0x4e/0x2e0 [ 128.668663] ? debugfs_remove+0x60/0x60 [ 128.669368] debugfs_remove+0x40/0x60 [ 128.669985] blk_trace_free+0xd/0x50 [ 128.670593] __blk_trace_remove+0x27/0x40 [ 128.671274] blk_trace_shutdown+0x30/0x40 [ 128.671935] blk_release_queue+0x95/0xf0 [ 128.672589] kobject_put+0xa5/0x1b0 [ 128.673188] disk_release+0xa2/0xc0 [ 128.673786] device_release+0x28/0x80 [ 128.674376] kobject_put+0xa5/0x1b0 [ 128.674915] loop_remove+0x39/0x50 [loop] [ 128.675511] loop_control_ioctl+0x113/0x130 [loop] [ 128.676199] ksys_ioctl+0x87/0xc0 [ 128.676708] __x64_sys_ioctl+0x16/0x20 [ 128.677274] do_syscall_64+0x52/0x180 [ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The common theme here is: debugfs: Directory 'loop0' with parent 'block' already present This crash happens because of how blktrace uses the debugfs directory where it places its files. Upon init we always create the same directory which would be needed by blktrace but we only do this for make_request drivers (multiqueue) block drivers. When you race a removal of these devices with a blktrace setup you end up in a situation where the make_request recursive debugfs removal will sweep away the blktrace files and then later blktrace will also try to remove individual dentries which are already NULL. The inverse is also possible and hence the two types of use after frees. We don't create the block debugfs directory on init for these types of block devices: * request-based block driver block devices * every possible partition * scsi-generic And so, this race should in theory only be possible with make_request drivers. We can fix the UAF by simply re-using the debugfs directory for make_request drivers (multiqueue) and only creating the ephemeral directory for the other type of block devices. The new clarifications on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace *prior* to processing a blktrace ensures the debugfs directories are only created if no possible directory name clashes are possible. This goes tested with: o nvme partitions o ISCSI with tgt, and blktracing against scsi-generic with: o block o tape o cdrom o media changer o blktests This patch is part of the work which disputes the severity of CVE-2019-19770 which shows this issue is not a core debugfs issue, but a misuse of debugfs within blktace. Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: yu kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13blktrace: annotate required lock on do_blk_trace_setup()Luis Chamberlain
commit a67549c8e568627290234e9fbe833cb9dfd36b55 upstream. Ensure it is clear which lock is required on do_blk_trace_setup(). Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13blktrace: Avoid sparse warnings when assigning q->blk_traceJan Kara
commit c3dbe541ef77754729de5e82be2d6e5d267c6c8c upstream. Mostly for historical reasons, q->blk_trace is assigned through xchg() and cmpxchg() atomic operations. Although this is correct, sparse complains about this because it violates rcu annotations since commit c780e86dd48e ("blktrace: Protect q->blk_trace with RCU") which started to use rcu for accessing q->blk_trace. Furthermore there's no real need for atomic operations anymore since all changes to q->blk_trace happen under q->blk_trace_mutex and since it also makes more sense to check if q->blk_trace is set with the mutex held earlier. So let's just replace xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and rcu_assign_pointer(). This makes the code more efficient and sparse happy. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13blktrace: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
commit 3e6f176f304ee8effec698ebaff6b7baecb5e1e6 upstream. When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()Paul E. McKenney
commit a63fc6b75cca984c71f095282e0227a390ba88f3 upstream. Although the rcu_swap_protected() macro follows the example of swap(), the interactions with RCU make its update of its argument somewhat counter-intuitive. This commit therefore introduces an rcu_replace_pointer() that returns the old value of the RCU pointer instead of doing the argument update. Once all the uses of rcu_swap_protected() are updated to instead use rcu_replace_pointer(), rcu_swap_protected() will be removed. Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane M Seymour <shane.seymour@hpe.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13soc: qcom: rpmh: Dirt can only make you dirtier, not cleanerDouglas Anderson
commit 35bb4b22f606c0cc8eedf567313adc18161b1af4 upstream. Adding an item into the cache should never be able to make the cache cleaner. Use "|=" rather than "=" to update the dirty flag. Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Maulik Shah <mkshah@codeaurora.org> Thanks, Maulik Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Fixes: bb7000677a1b ("soc: qcom: rpmh: Update dirty flag only when data changes") Reported-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200417141531.1.Ia4b74158497213eabad7c3d474c50bfccb3f342e@changeid Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13random32: move the pseudo-random 32-bit definitions to prandom.hLinus Torvalds
commit c0842fbc1b18c7a044e6ff3e8fa78bfa822c7d1a upstream. The addition of percpu.h to the list of includes in random.h revealed some circular dependencies on arm64 and possibly other platforms. This include was added solely for the pseudo-random definitions, which have nothing to do with the rest of the definitions in this file but are still there for legacy reasons. This patch moves the pseudo-random parts to linux/prandom.h and the percpu.h include with it, which is now guarded by _LINUX_PRANDOM_H and protected against recursive inclusion. A further cleanup step would be to remove this from <linux/random.h> entirely, and make people who use the prandom infrastructure include just the new header file. That's a bit of a churn patch, but grepping for "prandom_" and "next_pseudo_random32" "struct rnd_state" should catch most users. But it turns out that that nice cleanup step is fairly painful, because a _lot_ of code currently seems to depend on the implicit include of <linux/random.h>, which can currently come in a lot of ways, including such fairly core headfers as <linux/net.h>. So the "nice cleanup" part may or may never happen. Fixes: 1c9df907da83 ("random: fix circular include dependency on arm64 after addition of percpu.h") Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13powerpc: Fix circular dependency between percpu.h and mmu.hMichael Ellerman
commit 0c83b277ada72b585e6a3e52b067669df15bcedb upstream. Recently random.h started including percpu.h (see commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity")), which broke corenet64_smp_defconfig: In file included from /linux/arch/powerpc/include/asm/paca.h:18, from /linux/arch/powerpc/include/asm/percpu.h:13, from /linux/include/linux/random.h:14, from /linux/lib/uuid.c:14: /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx' 139 | DECLARE_PER_CPU(int, next_tlbcam_idx); This is due to a circular header dependency: asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which includes asm/mmu.h Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it. We can fix it by moving the include of paca.h below the include of asm-generic/percpu.h. This moves the include of paca.h out of the #ifdef __powerpc64__, but that is OK because paca.h is almost entirely inside #ifdef CONFIG_PPC64 anyway. It also moves the include of paca.h out of the #ifdef CONFIG_SMP, which could possibly break something, but seems to have no ill effects. Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Cc: stable@vger.kernel.org # v5.8 Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200804130558.292328-1-mpe@ellerman.id.au Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13ocfs2: fix unbalanced lockingPavel Machek
commit 57c720d4144a9c2b88105c3e8f7b0e97e4b5cc93 upstream. Based on what fails, function can return with nfs_sync_rwlock either locked or unlocked. That can not be right. Always return with lock unlocked on error. Fixes: 4cd9973f9ff6 ("ocfs2: avoid inode removal while nfsd is accessing it") Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200724124443.GA28164@duo.ucw.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13ocfs2: change slot number type s16 to u16Junxiao Bi
commit 38d51b2dd171ad973afc1f5faab825ed05a2d5e9 upstream. Dan Carpenter reported the following static checker warning. fs/ocfs2/super.c:1269 ocfs2_parse_options() warn: '(-1)' 65535 can't fit into 32767 'mopt->slot' fs/ocfs2/suballoc.c:859 ocfs2_init_inode_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_inode_steal_slot' fs/ocfs2/suballoc.c:867 ocfs2_init_meta_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_meta_steal_slot' That's because OCFS2_INVALID_SLOT is (u16)-1. Slot number in ocfs2 can be never negative, so change s16 to u16. Fixes: 9277f8334ffc ("ocfs2: fix value of OCFS2_INVALID_SLOT") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200627001259.19757-1-junxiao.bi@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13module: Correctly truncate sysfs sections outputKees Cook
commit 11990a5bd7e558e9203c1070fc52fb6f0488e75b upstream. The only-root-readable /sys/module/$module/sections/$section files did not truncate their output to the available buffer size. While most paths into the kernfs read handlers end up using PAGE_SIZE buffers, it's possible to get there through other paths (e.g. splice, sendfile). Actually limit the output to the "count" passed into the read function, and report it back correctly. *sigh* Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/20200805002015.GE23458@shao2-debian Fixes: ed66f991bb19 ("module: Refactor section attr into bin attribute") Cc: stable@vger.kernel.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13mac80211: fix misplaced while instead of ifJohannes Berg
commit 5981fe5b0529ba25d95f37d7faa434183ad618c5 upstream. This never was intended to be a 'while' loop, it should've just been an 'if' instead of 'while'. Fix this. I noticed this while applying another patch from Ben that intended to fix a busy loop at this spot. Cc: stable@vger.kernel.org Fixes: b16798f5b907 ("mac80211: mark station unauthorized before key removal") Reported-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20200803110209.253009ae41ff.I3522aad099392b31d5cf2dcca34cbac7e5832dde@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream. John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13x86/apic/vector: Force interupt handler invocation to irq contextThomas Gleixner
commit 008f1d60fe25810d4554916744b0975d76601b64 upstream. Sathyanarayanan reported that the PCI-E AER error injection mechanism can result in a NULL pointer dereference in apic_ack_edge(): BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 RIP: 0010:apic_ack_edge+0x1e/0x40 Call Trace: handle_edge_irq+0x7d/0x1e0 generic_handle_irq+0x27/0x30 aer_inject_write+0x53a/0x720 It crashes in irq_complete_move() which dereferences get_irq_regs() which is obviously NULL when this is called from non interrupt context. Of course the pointer could be checked, but that just papers over the real issue. Invoking the low level interrupt handling mechanism from random code can wreckage the fragile interrupt affinity mechanism of x86 as interrupts can only be moved in interrupt context or with special care when a CPU goes offline and the move has to be enforced. In the best case this triggers the warning in the MSI affinity setter, but if the call happens on the correct CPU it just corrupts state and might prevent further interrupt delivery for the affected device. Mark the APIC interrupts as unsuitable for being invoked in random contexts. This prevents the AER injection from proliferating the wreckage, but that's less broken than the current state of affairs and more correct than just papering over the problem by sprinkling random checks all over the place and silently corrupting state. Reported-by: sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200306130623.684591280@linutronix.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13genirq: Add protection against unsafe usage of generic_handle_irq()Thomas Gleixner
commit c16816acd08697b02a53f56f8936497a9f6f6e7a upstream. In general calling generic_handle_irq() with interrupts disabled from non interrupt context is harmless. For some interrupt controllers like the x86 trainwrecks this is outright dangerous as it might corrupt state if an interrupt affinity change is pending. Add infrastructure which allows to mark interrupts as unsafe and catch such usage in generic_handle_irq(). Reported-by: sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lkml.kernel.org/r/20200306130623.590923677@linutronix.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13arm64/alternatives: move length validation inside the subsectionSami Tolvanen
commit 966a0acce2fca776391823381dba95c40e03c339 upstream. Commit f7b93d42945c ("arm64/alternatives: use subsections for replacement sequences") breaks LLVM's integrated assembler, because due to its one-pass design, it cannot compute instruction sequence lengths before the layout for the subsection has been finalized. This change fixes the build by moving the .org directives inside the subsection, so they are processed after the subsection layout is known. Fixes: f7b93d42945c ("arm64/alternatives: use subsections for replacement sequences") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1078 Link: https://lore.kernel.org/r/20200730153701.3892953-1-samitolvanen@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")Chuck Lever
commit 986a4b63d3bc5f2c0eb4083b05aff2bf883b7b2f upstream. Braino when converting "buf->len -=" to "buf->len = len -". The result is under-estimation of the ralign and rslack values. On krb5p mounts, this has caused READDIR to fail with EIO, and KASAN splats when decoding READLINK replies. As a result of fixing this oversight, the gss_unwrap method now returns a buf->len that can be shorter than priv_len for small RPC messages. The additional adjustment done in unwrap_priv_data() can underflow buf->len. This causes the nfsd_request_too_large check to fail during some NFSv3 operations. Reported-by: Marian Rainer-Harbach Reported-by: Pierre Sauter <pierre.sauter@stwm.de> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1886277 Fixes: 31c9590ae468 ("SUNRPC: Add "@len" parameter to gss_unwrap()") Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13ALSA: hda/realtek: typo_fix: enable headset mic of ASUS ROG Zephyrus ↵Armas Spann
G14(GA401) series with ALC289 commit 293a92c1d9913248b9987b68f3a5d6d2f0aae62b upstream. This patch fixes a small typo I accidently submitted with the initial patch. The board should be named GA401 not G401. Fixes: ff53664daff2 ("ALSA: hda/realtek: enable headset mic of ASUS ROG Zephyrus G14(G401) series with ALC289") Signed-off-by: Armas Spann <zappel@retarded.farm> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200724140837.302763-1-zappel@retarded.farm Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13ALSA: hda/realtek: enable headset mic of ASUS ROG Zephyrus G15(GA502) series ↵Armas Spann
with ALC289 commit 4b43d05a1978a93a19374c6e6b817c9c1ff4ba4b upstream. This patch adds support for headset mic to the ASUS ROG Zephyrus G15(GA502) notebook series by adding the corresponding vendor/pci_device id, as well as adding a new fixup for the used realtek ALC289. The fixup stets the correct pin to get the headset mic correctly recognized on audio-jack. Signed-off-by: Armas Spann <zappel@retarded.farm> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200724140616.298892-1-zappel@retarded.farm Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13usb: dwc2: Fix error path in gadget registrationMarek Szyprowski
commit 33a06f1300a79cfd461cea0268f05e969d4f34ec upstream. When gadget registration fails, one should not call usb_del_gadget_udc(). Ensure this by setting gadget->udc to NULL. Also in case of a failure there is no need to disable low-level hardware, so return immiedetly instead of jumping to error_init label. This fixes the following kernel NULL ptr dereference on gadget failure (can be easily triggered with g_mass_storage without any module parameters): dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter besl=1 dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter g_np_tx_fifo_size=1024 dwc2 12480000.hsotg: EPs: 16, dedicated fifos, 7808 entries in SPRAM Mass Storage Function, version: 2009/09/11 LUN: removable file: (no medium) no file given for LUN0 g_mass_storage 12480000.hsotg: failed to start g_mass_storage: -22 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000104 pgd = (ptrval) [00000104] *pgd=00000000 Internal error: Oops: 805 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.8.0-rc5 #3133 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events deferred_probe_work_func PC is at usb_del_gadget_udc+0x38/0xc4 LR is at __mutex_lock+0x31c/0xb18 ... Process kworker/0:1 (pid: 12, stack limit = 0x(ptrval)) Stack: (0xef121db0 to 0xef122000) ... [<c076bf3c>] (usb_del_gadget_udc) from [<c0726bec>] (dwc2_hsotg_remove+0x10/0x20) [<c0726bec>] (dwc2_hsotg_remove) from [<c0711208>] (dwc2_driver_probe+0x57c/0x69c) [<c0711208>] (dwc2_driver_probe) from [<c06247c0>] (platform_drv_probe+0x6c/0xa4) [<c06247c0>] (platform_drv_probe) from [<c0621df4>] (really_probe+0x200/0x48c) [<c0621df4>] (really_probe) from [<c06221e8>] (driver_probe_device+0x78/0x1fc) [<c06221e8>] (driver_probe_device) from [<c061fcd4>] (bus_for_each_drv+0x74/0xb8) [<c061fcd4>] (bus_for_each_drv) from [<c0621b54>] (__device_attach+0xd4/0x16c) [<c0621b54>] (__device_attach) from [<c0620c98>] (bus_probe_device+0x88/0x90) [<c0620c98>] (bus_probe_device) from [<c06211b0>] (deferred_probe_work_func+0x3c/0xd0) [<c06211b0>] (deferred_probe_work_func) from [<c0149280>] (process_one_work+0x234/0x7dc) [<c0149280>] (process_one_work) from [<c014986c>] (worker_thread+0x44/0x51c) [<c014986c>] (worker_thread) from [<c0150b1c>] (kthread+0x158/0x1a0) [<c0150b1c>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20) Exception stack(0xef121fb0 to 0xef121ff8) ... ---[ end trace 9724c2fc7cc9c982 ]--- While fixing this also fix the double call to dwc2_lowlevel_hw_disable() if dr_mode is set to USB_DR_MODE_PERIPHERAL. In such case low-level hardware is already disabled before calling usb_add_gadget_udc(). That function correctly preserves low-level hardware state, there is no need for the second unconditional dwc2_lowlevel_hw_disable() call. Fixes: 207324a321a8 ("usb: dwc2: Postponed gadget registration to the udc class driver") Acked-by: Minas Harutyunyan <hminas@synopsys.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor()Dan Carpenter
commit 0f348db01fdf128813fdd659fcc339038fb421a4 upstream. This condition is reversed and will cause breakage. Fixes: 7440f518dad9 ("thermal/drivers/ti-soc-thermal: Avoid dereferencing ERR_PTR") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200616091949.GA11940@mwanda Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13nvme-tcp: fix controller reset hang during trafficSagi Grimberg
commit 2875b0aecabe2f081a8432e2bc85b85df0529490 upstream. commit fe35ec58f0d3 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple queue maps (which we have now for default/read/poll) is attempting to freeze the queue. However we never started queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue that we will not complete once the queue is quiesced. So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen). This follows to how the pci driver handles resets. Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13nvme-rdma: fix controller reset hang during trafficSagi Grimberg
commit 9f98772ba307dd89a3d17dc2589f213d3972fc64 upstream. commit fe35ec58f0d3 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple queue maps (which we have now for default/read/poll) is attempting to freeze the queue. However we never started queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue that we will not complete once the queue is quiesced. So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen). This follows to how the pci driver handles resets. Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13regulator: fix memory leak on error path of regulator_register()Vladimir Zapolskiy
commit 9177514ce34902b3adb2abd490b6ad05d1cfcb43 upstream. The change corrects registration and deregistration on error path of a regulator, the problem was manifested by a reported memory leak on deferred probe: as3722-regulator as3722-regulator: regulator 13 register failed -517 # cat /sys/kernel/debug/kmemleak unreferenced object 0xecc43740 (size 64): comm "swapper/0", pid 1, jiffies 4294937640 (age 712.880s) hex dump (first 32 bytes): 72 65 67 75 6c 61 74 6f 72 2e 32 34 00 5a 5a 5a regulator.24.ZZZ 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ backtrace: [<0c4c3d1c>] __kmalloc_track_caller+0x15c/0x2c0 [<40c0ad48>] kvasprintf+0x64/0xd4 [<109abd29>] kvasprintf_const+0x70/0x84 [<c4215946>] kobject_set_name_vargs+0x34/0xa8 [<62282ea2>] dev_set_name+0x40/0x64 [<a39b6757>] regulator_register+0x3a4/0x1344 [<16a9543f>] devm_regulator_register+0x4c/0x84 [<51a4c6a1>] as3722_regulator_probe+0x294/0x754 ... The memory leak problem was introduced as a side ef another fix in regulator_register() error path, I believe that the proper fix is to decouple device_register() function into its two compounds and initialize a struct device before assigning any values to its fields and then using it before actual registration of a device happens. This lets to call put_device() safely after initialization, and, since now a release callback is called, kfree(rdev->constraints) shall be removed to exclude a double free condition. Fixes: a3cde9534ebd ("regulator: core: fix regulator_register() error paths to properly release rdev") Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Cc: Wen Yang <wenyang@linux.alibaba.com> Link: https://lore.kernel.org/r/20200724005013.23278-1-vz@mleia.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13drm/i915/gvt: Fix two CFL MMIO handling caused by regression.Colin Xu
commit fccd0f7cf4d532674d727c7f204f038456675dee upstream. D_CFL was incorrectly removed for: GAMT_CHKN_BIT_REG GEN9_CTX_PREEMPT_REG V2: Update commit message. V3: Rebase and split Fixes and mis-handled MMIO. Fixes: 43226e6fe798 (drm/i915/gvt: replaced register address with name) Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Colin Xu <colin.xu@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200601030638.16002-1-colin.xu@intel.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13iommu/vt-d: Make Intel SVM code 64-bit onlyLu Baolu
commit 9486727f5981a5ec5c0b699fb1777451bd6786e4 upstream. Current Intel SVM is designed by setting the pgd_t of the processor page table to FLPTR field of the PASID entry. The first level translation only supports 4 and 5 level paging structures, hence it's infeasible for the IOMMU to share a processor's page table when it's running in 32-bit mode. Let's disable 32bit support for now and claim support only when all the missing pieces are ready in the future. Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode") Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13spi: sprd: switch the sequence of setting WDG_LOAD_LOW and _HIGHLingling Xu
commit 8bdd79dae1ff5397351b95e249abcae126572617 upstream. The watchdog counter consists of WDG_LOAD_LOW and WDG_LOAD_HIGH, which would be loaded to watchdog counter once writing WDG_LOAD_LOW. Fixes: ac1775012058 ("spi: sprd: Add the support of restarting the system") Signed-off-by: Lingling Xu <ling_ling.xu@unisoc.com> Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Link: https://lore.kernel.org/r/20200602082415.5848-1-zhang.lyra@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13rxrpc: Fix trace stringDavid Howells
commit aadf9dcef9d4cd68c73a4ab934f93319c4becc47 upstream. The trace symbol printer (__print_symbolic()) ignores symbols that map to an empty string and prints the hex value instead. Fix the symbol for rxrpc_cong_no_change to " -" instead of "" to avoid this. Fixes: b54a134a7de4 ("rxrpc: Fix handling of enums-to-string translation in tracing") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13libceph: don't omit recovery_deletes in target_copy()Ilya Dryomov
commit 2f3fead62144002557f322c2a7c15e1255df0653 upstream. Currently target_copy() is used only for sending linger pings, so this doesn't come up, but generally omitting recovery_deletes can result in unneeded resends (force_resend in calc_target()). Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13block: fix get_max_segment_size() overflow on 32bit archMing Lei
commit 4a2f704eb2d831a2d73d7f4cdd54f45c49c3c353 upstream. Commit 429120f3df2d starts to take account of segment's start dma address when computing max segment size, and data type of 'unsigned long' is used to do that. However, the segment mask may be 0xffffffff, so the figured out segment size may be overflowed in case of zero physical address on 32bit arch. Fix the issue by returning queue_max_segment_size() directly when that happens. Fixes: 429120f3df2d ("block: fix splitting segments on boundary masks") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Christoph Hellwig <hch@lst.de> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13block: fix splitting segments on boundary masksMing Lei
commit 429120f3df2dba2bf3a4a19f4212a53ecefc7102 upstream. We ran into a problem with a mpt3sas based controller, where we would see random (and hard to reproduce) file corruption). The issue seemed specific to this controller, but wasn't specific to the file system. After a lot of debugging, we find out that it's caused by segments spanning a 4G memory boundary. This shouldn't happen, as the default setting for segment boundary masks is 4G. Turns out there are two issues in get_max_segment_size(): 1) The default segment boundary mask is bypassed 2) The segment start address isn't taken into account when checking segment boundary limit Fix these two issues by removing the bypass of the segment boundary check even if the mask is set to the default value, and taking into account the actual start address of the request when checking if a segment needs splitting. Cc: stable@vger.kernel.org # v5.1+ Reviewed-by: Chris Mason <clm@fb.com> Tested-by: Chris Mason <clm@fb.com> Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: Ming Lei <ming.lei@redhat.com> Dropped const on the page pointer, ppc page_to_phys() doesn't mark the page as const... Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13genirq/affinity: Handle affinity setting on inactive interrupts correctlyThomas Gleixner
commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream. Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests. X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS. Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then: - Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on. Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations. For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design. Remove the X86 workaround as it is not longer required. Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13sched/fair: handle case of task_h_load() returning 0Vincent Guittot
commit 01cfcde9c26d8555f0e6e9aea9d6049f87683998 upstream. task_h_load() can return 0 in some situations like running stress-ng mmapfork, which forks thousands of threads, in a sched group on a 224 cores system. The load balance doesn't handle this correctly because env->imbalance never decreases and it will stop pulling tasks only after reaching loop_max, which can be equal to the number of running tasks of the cfs. Make sure that imbalance will be decreased by at least 1. misfit task is the other feature that doesn't handle correctly such situation although it's probably more difficult to face the problem because of the smaller number of CPUs and running tasks on heterogenous system. We can't simply ensure that task_h_load() returns at least one because it would imply to handle underflow in other places. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: <stable@vger.kernel.org> # v4.4+ Link: https://lkml.kernel.org/r/20200710152426.16981-1-vincent.guittot@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: use v5.4.x-stable version instead of mainline.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13sched: Fix unreliable rseq cpu_id for new tasksMathieu Desnoyers
commit ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db upstream. While integrating rseq into glibc and replacing glibc's sched_getcpu implementation with rseq, glibc's tests discovered an issue with incorrect __rseq_abi.cpu_id field value right after the first time a newly created process issues sched_setaffinity. For the records, it triggers after building glibc and running tests, and then issuing: for x in {1..2000} ; do posix/tst-affinity-static & done and shows up as: error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 This is caused by the scheduler invoking __set_task_cpu() directly from sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which is done by set_task_cpu(). Add the missing rseq_migrate() to both functions. The only other direct use of __set_task_cpu() is done by init_idle(), which does not involve a user-space task. Based on my testing with the glibc test-case, just adding rseq_migrate() to wake_up_new_task() is sufficient to fix the observed issue. Also add it to sched_fork() to keep things consistent. The reason why this never triggered so far with the rseq/basic_test selftest is unclear. The current use of sched_getcpu(3) does not typically require it to be always accurate. However, use of the __rseq_abi.cpu_id field within rseq critical sections requires it to be accurate. If it is not accurate, it can cause corruption in the per-cpu data targeted by rseq critical sections in user-space. Reported-By: Florian Weimer <fweimer@redhat.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-By: Florian Weimer <fweimer@redhat.com> Cc: stable@vger.kernel.org # v4.18+ Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13arm64: compat: Ensure upper 32 bits of x0 are zero on syscall returnWill Deacon
commit 15956689a0e60aa0c795174f3c310b60d8794235 upstream. Although we zero the upper bits of x0 on entry to the kernel from an AArch32 task, we do not clear them on the exception return path and can therefore expose 64-bit sign extended syscall return values to userspace via interfaces such as the 'perf_regs' ABI, which deal exclusively with 64-bit registers. Explicitly clear the upper 32 bits of x0 on return from a compat system call. Cc: <stable@vger.kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Keno Fischer <keno@juliacomputing.com> Cc: Luis Machado <luis.machado@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13arm64: ptrace: Consistently use pseudo-singlestep exceptionsWill Deacon
commit ac2081cdc4d99c57f219c1a6171526e0fa0a6fff upstream. Although the arm64 single-step state machine can be fast-forwarded in cases where we wish to generate a SIGTRAP without actually executing an instruction, this has two major limitations outside of simply skipping an instruction due to emulation. 1. Stepping out of a ptrace signal stop into a signal handler where SIGTRAP is blocked. Fast-forwarding the stepping state machine in this case will result in a forced SIGTRAP, with the handler reset to SIG_DFL. 2. The hardware implicitly fast-forwards the state machine when executing an SVC instruction for issuing a system call. This can interact badly with subsequent ptrace stops signalled during the execution of the system call (e.g. SYSCALL_EXIT or seccomp traps), as they may corrupt the stepping state by updating the PSTATE for the tracee. Resolve both of these issues by injecting a pseudo-singlestep exception on entry to a signal handler and also on return to userspace following a system call. Cc: <stable@vger.kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Tested-by: Luis Machado <luis.machado@linaro.org> Reported-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13arm64: ptrace: Override SPSR.SS when single-stepping is enabledWill Deacon
commit 3a5a4366cecc25daa300b9a9174f7fdd352b9068 upstream. Luis reports that, when reverse debugging with GDB, single-step does not function as expected on arm64: | I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP | request by GDB won't execute the underlying instruction. As a consequence, | the PC doesn't move, but we return a SIGTRAP just like we would for a | regular successful PTRACE_SINGLESTEP request. The underlying problem is that when the CPU register state is restored as part of a reverse step, the SPSR.SS bit is cleared and so the hardware single-step state can transition to the "active-pending" state, causing an unexpected step exception to be taken immediately if a step operation is attempted. In hindsight, we probably shouldn't have exposed SPSR.SS in the pstate accessible by the GPR regset, but it's a bit late for that now. Instead, simply prevent userspace from configuring the bit to a value which is inconsistent with the TIF_SINGLESTEP state for the task being traced. Cc: <stable@vger.kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Keno Fischer <keno@juliacomputing.com> Link: https://lore.kernel.org/r/1eed6d69-d53d-9657-1fc9-c089be07f98c@linaro.org Reported-by: Luis Machado <luis.machado@linaro.org> Tested-by: Luis Machado <luis.machado@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from powerFinley Xiao
commit 371a3bc79c11b707d7a1b7a2c938dc3cc042fffb upstream. The function cpu_power_to_freq is used to find a frequency and set the cooling device to consume at most the power to be converted. For example, if the power to be converted is 80mW, and the em table is as follow. struct em_cap_state table[] = { /* KHz mW */ { 1008000, 36, 0 }, { 1200000, 49, 0 }, { 1296000, 59, 0 }, { 1416000, 72, 0 }, { 1512000, 86, 0 }, }; The target frequency should be 1416000KHz, not 1512000KHz. Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables") Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200619090825.32747-1-finley.xiao@rock-chips.com Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: use v5.4.x-stable version instead of mainline.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13thermal: int3403_thermal: Downgrade error messageAlex Hung
commit f3d7fb38976b1b0a8462ba1c7cbd404ddfaad086 upstream. Downgrade "Unsupported event" message from dev_err to dev_dbg to avoid flooding with this message on some platforms. Cc: stable@vger.kernel.org # v5.4+ Suggested-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Alex Hung <alex.hung@canonical.com> [ rzhang: fix typo in changelog ] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20200615223957.183153-1-alex.hung@canonical.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13misc: atmel-ssc: lock with mutex instead of spinlockMichał Mirosław
commit b037d60a3b1d1227609fd858fa34321f41829911 upstream. Uninterruptible context is not needed in the driver and causes lockdep warning because of mutex taken in of_alias_get_id(). Convert the lock to mutex to avoid the issue. Cc: stable@vger.kernel.org Fixes: 099343c64e16 ("ARM: at91: atmel-ssc: add device tree support") Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Link: https://lore.kernel.org/r/50f0d7fa107f318296afb49477c3571e4d6978c5.1592998403.git.mirq-linux@rere.qmqm.pl Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13dmaengine: fsl-edma-common: correct DSIZE_32BYTERobin Gong
commit e142087b15960a4e1e5932942e5abae1f49d2318 upstream. Correct EDMA_TCD_ATTR_DSIZE_32BYTE define since it's broken by the below: '0x0005 --> BIT(3) | BIT(0))' Fixes: 4d6d3a90e4ac ("dmaengine: fsl-edma: fix macros") Signed-off-by: Robin Gong <yibin.gong@nxp.com> Tested-by: Angelo Dureghello <angelo@sysam.it> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1593449998-32091-1-git-send-email-yibin.gong@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13dmaengine: mcf-edma: Fix NULL pointer exception in mcf_edma_tx_handlerKrzysztof Kozlowski
commit 8995aa3d164ddd9200e6abcf25c449cf5298c858 upstream. On Toradex Colibri VF50 (Vybrid VF5xx) with fsl-edma driver NULL pointer exception happens occasionally on serial output initiated by login timeout. This was reproduced only if kernel was built with significant debugging options and EDMA driver is used with serial console. Issue looks like a race condition between interrupt handler fsl_edma_tx_handler() (called as a result of fsl_edma_xfer_desc()) and terminating the transfer with fsl_edma_terminate_all(). The fsl_edma_tx_handler() handles interrupt for a transfer with already freed edesc and idle==true. The mcf-edma driver shares design and lot of code with fsl-edma. It looks like being affected by same problem. Fix this pattern the same way as fix for fsl-edma driver. Fixes: e7a3ff92eaf1 ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Robin Gong <yibin.gong@nxp.com> Link: https://lore.kernel.org/r/1591881665-25592-1-git-send-email-krzk@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13dmaengine: fsl-edma: Fix NULL pointer exception in fsl_edma_tx_handlerKrzysztof Kozlowski
commit f5e5677c420346b4e9788051c2e4d750996c428c upstream. NULL pointer exception happens occasionally on serial output initiated by login timeout. This was reproduced only if kernel was built with significant debugging options and EDMA driver is used with serial console. col-vf50 login: root Password: Login timed out after 60 seconds. Unable to handle kernel NULL pointer dereference at virtual address 00000044 Internal error: Oops: 5 [#1] ARM CPU: 0 PID: 157 Comm: login Not tainted 5.7.0-next-20200610-dirty #4 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) (fsl_edma_tx_handler) from [<8016eb10>] (__handle_irq_event_percpu+0x64/0x304) (__handle_irq_event_percpu) from [<8016eddc>] (handle_irq_event_percpu+0x2c/0x7c) (handle_irq_event_percpu) from [<8016ee64>] (handle_irq_event+0x38/0x5c) (handle_irq_event) from [<801729e4>] (handle_fasteoi_irq+0xa4/0x160) (handle_fasteoi_irq) from [<8016ddcc>] (generic_handle_irq+0x34/0x44) (generic_handle_irq) from [<8016e40c>] (__handle_domain_irq+0x54/0xa8) (__handle_domain_irq) from [<80508bc8>] (gic_handle_irq+0x4c/0x80) (gic_handle_irq) from [<80100af0>] (__irq_svc+0x70/0x98) Exception stack(0x8459fe80 to 0x8459fec8) fe80: 72286b00 e3359f64 00000001 0000412d a0070013 85c98840 85c98840 a0070013 fea0: 8054e0d4 00000000 00000002 00000000 00000002 8459fed0 8081fbe8 8081fbec fec0: 60070013 ffffffff (__irq_svc) from [<8081fbec>] (_raw_spin_unlock_irqrestore+0x30/0x58) (_raw_spin_unlock_irqrestore) from [<8056cb48>] (uart_flush_buffer+0x88/0xf8) (uart_flush_buffer) from [<80554e60>] (tty_ldisc_hangup+0x38/0x1ac) (tty_ldisc_hangup) from [<8054c7f4>] (__tty_hangup+0x158/0x2bc) (__tty_hangup) from [<80557b90>] (disassociate_ctty.part.1+0x30/0x23c) (disassociate_ctty.part.1) from [<8011fc18>] (do_exit+0x580/0xba0) (do_exit) from [<801214f8>] (do_group_exit+0x3c/0xb4) (do_group_exit) from [<80121580>] (__wake_up_parent+0x0/0x14) Issue looks like race condition between interrupt handler fsl_edma_tx_handler() (called as result of fsl_edma_xfer_desc()) and terminating the transfer with fsl_edma_terminate_all(). The fsl_edma_tx_handler() handles interrupt for a transfer with already freed edesc and idle==true. Fixes: d6be34fbd39b ("dma: Add Freescale eDMA engine driver support") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Robin Gong <yibin.gong@nxp.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1591877861-28156-2-git-send-email-krzk@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13intel_th: Fix a NULL dereference when hub driver is not loadedAlexander Shishkin
commit e78e1fdb282726beaf88aa75943682217e6ded0e upstream. Connecting master to an output port when GTH driver module is not loaded triggers a NULL dereference: > RIP: 0010:intel_th_set_output+0x35/0x70 [intel_th] > Call Trace: > ? sth_stm_link+0x12/0x20 [intel_th_sth] > stm_source_link_store+0x164/0x270 [stm_core] > dev_attr_store+0x17/0x30 > sysfs_kf_write+0x3e/0x50 > kernfs_fop_write+0xda/0x1b0 > __vfs_write+0x1b/0x40 > vfs_write+0xb9/0x1a0 > ksys_write+0x67/0xe0 > __x64_sys_write+0x1a/0x20 > do_syscall_64+0x57/0x1d0 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 Make sure the module in question is loaded and return an error if not. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Fixes: 39f4034693b7c ("intel_th: Add driver infrastructure for Intel(R) Trace Hub devices") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Ammy Yi <ammy.yi@intel.com> Tested-by: Ammy Yi <ammy.yi@intel.com> Cc: stable@vger.kernel.org # v4.4 Link: https://lore.kernel.org/r/20200706161339.55468-5-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13intel_th: pci: Add Emmitsburg PCH supportAlexander Shishkin
commit fd73d74a32bfaaf259441322cc5a1c83caaa94f2 upstream. This adds support for the Trace Hub in Emmitsburg PCH. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org # v4.14+ Link: https://lore.kernel.org/r/20200706161339.55468-4-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13intel_th: pci: Add Tiger Lake PCH-H supportAlexander Shishkin
commit 6227585dc7b6a5405fc08dc322f98cb95e2f0eb4 upstream. This adds support for the Trace Hub in Tiger Lake PCH-H. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org # v4.14+ Link: https://lore.kernel.org/r/20200706161339.55468-3-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13intel_th: pci: Add Jasper Lake CPU supportAlexander Shishkin
commit 203c1f615052921901b7a8fbe2005d8ea6add076 upstream. This adds support for the Trace Hub in Jasper Lake CPU. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org # v4.14+ Link: https://lore.kernel.org/r/20200706161339.55468-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-13powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkeyAneesh Kumar K.V
commit 192b6a780598976feb7321ff007754f8511a4129 upstream. Even if the IAMR value denies execute access, the current code returns true from pkey_access_permitted() for an execute permission check, if the AMR read pkey bit is cleared. This results in repeated page fault loop with a test like below: #define _GNU_SOURCE #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <signal.h> #include <inttypes.h> #include <assert.h> #include <malloc.h> #include <unistd.h> #include <pthread.h> #include <sys/mman.h> #ifdef SYS_pkey_mprotect #undef SYS_pkey_mprotect #endif #ifdef SYS_pkey_alloc #undef SYS_pkey_alloc #endif #ifdef SYS_pkey_free #undef SYS_pkey_free #endif #undef PKEY_DISABLE_EXECUTE #define PKEY_DISABLE_EXECUTE 0x4 #define SYS_pkey_mprotect 386 #define SYS_pkey_alloc 384 #define SYS_pkey_free 385 #define PPC_INST_NOP 0x60000000 #define PPC_INST_BLR 0x4e800020 #define PROT_RWX (PROT_READ | PROT_WRITE | PROT_EXEC) static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey) { return syscall(SYS_pkey_mprotect, addr, len, prot, pkey); } static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights) { return syscall(SYS_pkey_alloc, flags, access_rights); } static int sys_pkey_free(int pkey) { return syscall(SYS_pkey_free, pkey); } static void do_execute(void *region) { /* jump to region */ asm volatile( "mtctr %0;" "bctrl" : : "r"(region) : "ctr", "lr"); } static void do_protect(void *region) { size_t pgsize; int i, pkey; pgsize = getpagesize(); pkey = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE); assert (pkey > 0); /* perform mprotect */ assert(!sys_pkey_mprotect(region, pgsize, PROT_RWX, pkey)); do_execute(region); /* free pkey */ assert(!sys_pkey_free(pkey)); } int main(int argc, char **argv) { size_t pgsize, numinsns; unsigned int *region; int i; /* allocate memory region to protect */ pgsize = getpagesize(); region = memalign(pgsize, pgsize); assert(region != NULL); assert(!mprotect(region, pgsize, PROT_RWX)); /* fill page with NOPs with a BLR at the end */ numinsns = pgsize / sizeof(region[0]); for (i = 0; i < numinsns - 1; i++) region[i] = PPC_INST_NOP; region[i] = PPC_INST_BLR; do_protect(region); return EXIT_SUCCESS; } The fix is to only check the IAMR for an execute check, the AMR value is not relevant. Fixes: f2407ef3ba22 ("powerpc: helper to validate key-access permissions of a pte") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Add detail to change log, tweak wording & formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200712132047.1038594-1-aneesh.kumar@linux.ibm.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>