summaryrefslogtreecommitdiffstats
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2021-01-06powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()Qinglang Miao
[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ] I noticed that iounmap() of msgr_block_addr before return from mpic_msgr_probe() in the error handling case is missing. So use devm_ioremap() instead of just ioremap() when remapping the message register block, so the mapping will be automatically released on probe failure. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()Christophe Leroy
[ Upstream commit 1891ef21d92c4801ea082ee8ed478e304ddc6749 ] fls() and fls64() are using __builtin_ctz() and _builtin_ctzll(). On powerpc, those builtins trivially use ctlzw and ctlzd power instructions. Allthough those instructions provide the expected result with input argument 0, __builtin_ctz() and __builtin_ctzll() are documented as undefined for value 0. The easiest fix would be to use fls() and fls64() functions defined in include/asm-generic/bitops/builtin-fls.h and include/asm-generic/bitops/fls64.h, but GCC output is not optimal: 00000388 <testfls>: 388: 2c 03 00 00 cmpwi r3,0 38c: 41 82 00 10 beq 39c <testfls+0x14> 390: 7c 63 00 34 cntlzw r3,r3 394: 20 63 00 20 subfic r3,r3,32 398: 4e 80 00 20 blr 39c: 38 60 00 00 li r3,0 3a0: 4e 80 00 20 blr 000003b0 <testfls64>: 3b0: 2c 03 00 00 cmpwi r3,0 3b4: 40 82 00 1c bne 3d0 <testfls64+0x20> 3b8: 2f 84 00 00 cmpwi cr7,r4,0 3bc: 38 60 00 00 li r3,0 3c0: 4d 9e 00 20 beqlr cr7 3c4: 7c 83 00 34 cntlzw r3,r4 3c8: 20 63 00 20 subfic r3,r3,32 3cc: 4e 80 00 20 blr 3d0: 7c 63 00 34 cntlzw r3,r3 3d4: 20 63 00 40 subfic r3,r3,64 3d8: 4e 80 00 20 blr When the input of fls(x) is a constant, just check x for nullity and return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction directly. For fls64() on PPC64, do the same but with __builtin_clzll() and cntlzd instruction. On PPC32, lets take the generic fls64() which will use our fls(). The result is as expected: 00000388 <testfls>: 388: 7c 63 00 34 cntlzw r3,r3 38c: 20 63 00 20 subfic r3,r3,32 390: 4e 80 00 20 blr 000003a0 <testfls64>: 3a0: 2c 03 00 00 cmpwi r3,0 3a4: 40 82 00 10 bne 3b4 <testfls64+0x14> 3a8: 7c 83 00 34 cntlzw r3,r4 3ac: 20 63 00 20 subfic r3,r3,32 3b0: 4e 80 00 20 blr 3b4: 7c 63 00 34 cntlzw r3,r3 3b8: 20 63 00 40 subfic r3,r3,64 3bc: 4e 80 00 20 blr Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrentlyDavid Hildenbrand
commit d6718941a2767fb383e105d257d2105fe4f15f0e upstream. It's very easy to crash the kernel right now by simply trying to enable memtrace concurrently, hammering on the "enable" interface loop.sh: #!/bin/bash dmesg --console-off while true; do echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable done [root@localhost ~]# loop.sh & [root@localhost ~]# loop.sh & Resulting quickly in a kernel crash. Let's properly protect using a mutex. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org# v4.14+ Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc/powernv/memtrace: Don't leak kernel memory to user spaceDavid Hildenbrand
commit c74cf7a3d59a21b290fe0468f5b470d0b8ee37df upstream. We currently leak kernel memory to user space, because memory offlining doesn't do any implicit clearing of memory and we are missing explicit clearing of memory. Let's keep it simple and clear pages before removing the linear mapping. Reproduced in QEMU/TCG with 10 GiB of main memory: [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null [... wait until "free -m" used counter no longer changes and cancel] 19665802+0 records in 1+0 records out 9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes 40000000 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900 [ 402.980063][ T1086] flags: 0x7ffff000001000(reserved) [ 402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000 [ 402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 402.980845][ T1086] page dumped because: unmovable page [ 402.989608][ T1086] Offlined Pages 16384 [ 403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000 Before this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 c8 25 72 51 4d 26 36 c5 5c c2 56 15 d5 1a cd 10 |.%rQM&6.\.V.....| 00000010 19 b9 50 b2 cb e3 60 b8 ec 0a f3 ec 4b 3c 39 f0 |..P...`.....K<9.|$ 00000020 4e 5a 4c cf bd 26 19 ff 37 79 13 67 24 b7 b8 57 |NZL..&..7y.g$..W|$ 00000030 98 3e f5 be 6f 14 6a bd a4 52 bc 6e e9 e0 c1 5d |.>..o.j..R.n...]|$ 00000040 76 b3 ae b5 88 d7 da e3 64 23 85 2c 10 88 07 b6 |v.......d#.,....|$ 00000050 9a d8 91 de f7 50 27 69 2e 64 9c 6f d3 19 45 79 |.....P'i.d.o..Ey|$ 00000060 6a 6f 8a 61 71 19 1f c7 f1 df 28 26 ca 0f 84 55 |jo.aq.....(&...U|$ 00000070 01 3f be e4 e2 e1 da ff 7b 8c 8e 32 37 b4 24 53 |.?......{..27.$S|$ 00000080 1b 70 30 45 56 e6 8c c4 0e b5 4c fb 9f dd 88 06 |.p0EV.....L.....|$ 00000090 ef c4 18 79 f1 60 b1 5c 79 59 4d f4 36 d7 4a 5c |...y.`.\yYM.6.J\|$ After this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 40000000 Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc/xmon: Change printk() to pr_cont()Christophe Leroy
commit 7c6c86b36a36dd4a13d30bba07718e767aa2e7a1 upstream. Since some time now, printk() adds carriage return, leading to unusable xmon output if there is no udbg backend available: [ 54.288722] sysrq: Entering xmon [ 54.292209] Vector: 0 at [cace3d2c] [ 54.292274] pc: [ 54.292331] c0023650 [ 54.292468] : xmon+0x28/0x58 [ 54.292519] [ 54.292574] lr: [ 54.292630] c0023724 [ 54.292749] : sysrq_handle_xmon+0xa4/0xfc [ 54.292801] [ 54.292867] sp: cace3de8 [ 54.292931] msr: 9032 [ 54.292999] current = 0xc28d0000 [ 54.293072] pid = 377, comm = sh [ 54.293157] Linux version 5.10.0-rc6-s3k-dev-01364-gedf13f0ccd76-dirty (root@po17688vm.idsi0.si.c-s.fr) (powerpc64-linux-gcc (GCC) 10.1.0, GNU ld (GNU Binutils) 2.34) #4211 PREEMPT Fri Dec 4 09:32:11 UTC 2020 [ 54.293287] enter ? for help [ 54.293470] [cace3de8] [ 54.293532] c0023724 [ 54.293654] sysrq_handle_xmon+0xa4/0xfc [ 54.293711] (unreliable) ... [ 54.296002] [ 54.296159] --- Exception: c01 (System Call) at [ 54.296217] 0fd4e784 [ 54.296303] [ 54.296375] SP (7fca6ff0) is in userspace [ 54.296431] mon> [ 54.296484] <no input ...> Use pr_cont() instead. Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Mention that it only happens when udbg is not available] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c8a6ec704416ecd5ff2bd26213c9bc026bdd19de.1607077340.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filterTyrel Datwyler
commit f10881a46f8914428110d110140a455c66bdf27b upstream. Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace") introduced the following error when invoking the errinjct userspace tool: [root@ltcalpine2-lp5 librtas]# errinjct open [327884.071171] sys_rtas: RTAS call blocked - exploit attempt? [327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct) errinjct: Could not open RTAS error injection facility errinjct: librtas: open: Unexpected I/O error The entry for ibm,open-errinjct in rtas_filter array has a typo where the "j" is omitted in the rtas call name. After fixing this typo the errinjct tool functions again as expected. [root@ltcalpine2-lp5 linux]# errinjct open RTAS error injection facility open, token = 1 Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace") Cc: stable@vger.kernel.org Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_atMathieu Desnoyers
commit d85be8a49e733dcd23674aa6202870d54bf5600d upstream. The placeholder for instruction selection should use the second argument's operand, which is %1, not %0. This could generate incorrect assembly code if the memory addressing of operand %0 is a different form from that of operand %1. Also remove the %Un placeholder because having %Un placeholders for two operands which are based on the same local var (ptep) doesn't make much sense. By the way, it doesn't change the current behaviour because "<>" constraint is missing for the associated "=m". [chleroy: revised commit log iaw segher's comments and removed %U0] Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support") Cc: <stable@vger.kernel.org> # v2.6.28+ Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/96354bd77977a6a933fe9020da57629007fdb920.1603358942.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc/perf: Exclude kernel samples while counting events in user space.Athira Rajeev
commit aa8e21c053d72b6639ea5a7f1d3a1d0209534c94 upstream. Perf event attritube supports exclude_kernel flag to avoid sampling/profiling in supervisor state (kernel). Based on this event attr flag, Monitor Mode Control Register bit is set to freeze on supervisor state. But sometimes (due to hardware limitation), Sampled Instruction Address Register (SIAR) locks on to kernel address even when freeze on supervisor is set. Patch here adds a check to drop those samples. Cc: stable@vger.kernel.org Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc/pseries/hibernation: remove redundant cacheinfo updateNathan Lynch
[ Upstream commit b866459489fe8ef0e92cde3cbd6bbb1af6c4e99b ] Partitions with cache nodes in the device tree can encounter the following warning on resume: CPU 0 already accounted in PowerPC,POWER9@0(Data) WARNING: CPU: 0 PID: 3177 at arch/powerpc/kernel/cacheinfo.c:197 cacheinfo_cpu_online+0x640/0x820 These calls to cacheinfo_cpu_offline/online have been redundant since commit e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration"). Fixes: e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-25-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend opsNathan Lynch
[ Upstream commit 52719fce3f4c7a8ac9eaa191e8d75a697f9fbcbc ] There are three ways pseries_suspend_begin() can be reached: 1. When "mem" is written to /sys/power/state: kobj_attr_store() -> state_store() -> pm_suspend() -> suspend_devices_and_enter() -> pseries_suspend_begin() This never works because there is no way to supply a valid stream id using this interface, and H_VASI_STATE is called with a stream id of zero. So this call path is useless at best. 2. When a stream id is written to /sys/devices/system/power/hibernate. pseries_suspend_begin() is polled directly from store_hibernate() until the stream is in the "Suspending" state (i.e. the platform is ready for the OS to suspend execution): dev_attr_store() -> store_hibernate() -> pseries_suspend_begin() 3. When a stream id is written to /sys/devices/system/power/hibernate (continued). After #2, pseries_suspend_begin() is called once again from the pm core: dev_attr_store() -> store_hibernate() -> pm_suspend() -> suspend_devices_and_enter() -> pseries_suspend_begin() This is redundant because the VASI suspend state is already known to be Suspending. The begin() callback of platform_suspend_ops is optional, so we can simply remove that assignment with no loss of function. Fixes: 32d8ad4e621d ("powerpc/pseries: Partition hibernation support") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-18-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32Christophe Leroy
[ Upstream commit 78665179e569c7e1fe102fb6c21d0f5b6951f084 ] On 8xx, we get the following features: [ 0.000000] cpu_features = 0x0000000000000100 [ 0.000000] possible = 0x0000000000000120 [ 0.000000] always = 0x0000000000000000 This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all other configurations, the three lines should be equal. The problem is due to CPU_FTRS_GENERIC_32 which is taken when CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is pointless because there is no generic configuration supporting all 32 bits but book3s/32. Remove this pointless generic features definition to unbreak the calculation of 'possible' features and 'always' features. Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30powerpc/64: Set up a kernel stack for secondaries before cpu_restore()Jordan Niethe
[ Upstream commit 3c0b976bf20d236c57adcefa80f86a0a1d737727 ] Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore() is called before a stack has been set up in r1. This was previously fine as the cpu_restore() functions were implemented in assembly and did not use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") used __restore_cpu_cpufeatures() as the cpu_restore() function for a device-tree features based cputable entry. This is a C function and hence uses a stack in r1. generic_secondary_smp_init() is entered on the secondary cpus via the primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware thread has its own stack. The OPAL call is ran in the primary's hardware thread. During the call, a job is scheduled on a secondary cpu that will start executing at the address of generic_secondary_smp_init(). Hence the value that will be left in r1 when the secondary cpu enters the kernel is part of that secondary cpu's individual OPAL stack. This means that __restore_cpu_cpufeatures() will write to that OPAL stack. This is not horribly bad as each hardware thread has its own stack and the call that enters the kernel from OPAL never returns, but it is still wrong and should be corrected. Create the temp kernel stack before calling cpu_restore(). As noted by mpe, for a kexec boot, the secondary CPUs are released from the spin loop at address 0x60 by smp_release_cpus() and then jump to generic_secondary_smp_init(). The call to smp_release_cpus() is in setup_arch(), and it comes before the call to emergency_stack_init(). emergency_stack_init() allocates an emergency stack in the PACA for each CPU. This address in the PACA is what is used to set up the temp kernel stack in generic_secondary_smp_init(). Move releasing the secondary CPUs to after the PACAs have been allocated an emergency stack, otherwise the PACA stack pointer will contain garbage and hence the temp kernel stack created from it will be broken. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30powerpc: Drop -me200 addition to build flagsMichael Ellerman
[ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ] Currently a build with CONFIG_E200=y will fail with: Error: invalid switch -me200 Error: unrecognized option -me200 Upstream binutils has never supported an -me200 option. Presumably it was supported at some point by either a fork or Freescale internal binutils. We can't support code that we can't even build test, so drop the addition of -me200 to the build flags, so we can at least build with CONFIG_E200=y. Reported-by: Németh Márton <nm127@freemail.hu> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Scott Wood <oss@buserror.net> Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-22powerpc/8xx: Always fault when _PAGE_ACCESSED is not setChristophe Leroy
commit 29daf869cbab69088fe1755d9dd224e99ba78b56 upstream. The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers fast path. Following patch will reduce this overhead. Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time. Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.") Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU") Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits") Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC") Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc/64s: flush L1D after user accessesNicholas Piggin
commit 9a32a7e78bd0cd9a9b6332cbdc345ee5ffd0c5de upstream. IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc/uaccess: Evaluate macro arguments once, before user access is allowedNicholas Piggin
commit d02f6b7dab8228487268298ea1f21081c0b4b3eb upstream. get/put_user() can be called with nontrivial arguments. fs/proc/page.c has a good example: if (put_user(stable_page_flags(ppage), out)) { stable_page_flags() is quite a lot of code, including spin locks in the page allocator. Ensure these arguments are evaluated before user access is allowed. This improves security by reducing code with access to userspace, but it also fixes a PREEMPT bug with KUAP on powerpc/64s: stable_page_flags() is currently called with AMR set to allow writes, it ends up calling spin_unlock(), which can call preempt_schedule. But the task switch code can not be called with AMR set (it relies on interrupts saving the register), so this blows up. It's fine if the code inside allow_user_access() is preemptible, because a timer or IPI will save the AMR, but it's not okay to explicitly cause a reschedule. Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200407041245.600651-1-npiggin@gmail.com Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc: Fix __clear_user() with KUAP enabledAndrew Donnellan
commit 61e3acd8c693a14fc69b824cb5b08d02cb90a6e7 upstream. The KUAP implementation adds calls in clear_user() to enable and disable access to userspace memory. However, it doesn't add these to __clear_user(), which is used in the ptrace regset code. As there's only one direct user of __clear_user() (the regset code), and the time taken to set the AMR for KUAP purposes is going to dominate the cost of a quick access_ok(), there's not much point having a separate path. Rename __clear_user() to __arch_clear_user(), and make __clear_user() just call clear_user(). Reported-by: syzbot+f25ecf4b2982d8c7a640@syzkaller-ppc64.appspotmail.com Reported-by: Daniel Axtens <dja@axtens.net> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> [mpe: Use __arch_clear_user() for the asm version like arm64 & nds32] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191209132221.15328-1-ajd@linux.ibm.com Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc: Implement user_access_begin and friendsChristophe Leroy
commit 5cd623333e7cf4e3a334c70529268b65f2a6c2c7 upstream. Today, when a function like strncpy_from_user() is called, the userspace access protection is de-activated and re-activated for every word read. By implementing user_access_begin and friends, the protection is de-activated at the beginning of the copy and re-activated at the end. Implement user_access_begin(), user_access_end() and unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user() For the time being, we keep user_access_save() and user_access_restore() as nops. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/36d4fbf9e56a75994aca4ee2214c77b26a5a8d35.1579866752.git.christophe.leroy@c-s.fr Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc: Add a framework for user access trackingChristophe Leroy
Backported from commit de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection"). Here we don't try to add the KUAP framework, we just want the helper functions because we want to put uaccess flush helpers in them. In terms of fixes, we don't need commit 1d8f739b07bd ("powerpc/kuap: Fix set direction in allow/prevent_user_access()") as we don't have real KUAP. Likewise as all our allows are noops and all our prevents are just flushes, we don't need commit 9dc086f1e9ef ("powerpc/futex: Fix incorrect user access blocking") The other 2 fixes we do need. The original description is: This patch implements a framework for Kernel Userspace Access Protection. Then subarches will have the possibility to provide their own implementation by providing setup_kuap() and allow/prevent_user_access(). Some platforms will need to know the area accessed and whether it is accessed from read, write or both. Therefore source, destination and size and handed over to the two functions. mpe: Rename to allow/prevent rather than unlock/lock, and add read/write wrappers. Drop the 32-bit code for now until we have an implementation for it. Add kuap to pt_regs for 64-bit as well as 32-bit. Don't split strings, use pr_crit_ratelimited(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc/64s: flush L1D on kernel entryNicholas Piggin
commit f79643787e0a0762d2409b7b8334e83f22d85695 upstream. IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22powerpc/64s: move some exception handlers out of lineDaniel Axtens
(backport only) We're about to grow the exception handlers, which will make a bunch of them no longer fit within the space available. We move them out of line. This is a fiddly and error-prone business, so in the interests of reviewability I haven't merged this in with the addition of the entry flush. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulationMichael Neuling
commit 1da4a0272c5469169f78cd76cf175ff984f52f06 upstream. __get_user_atomic_128_aligned() stores to kaddr using stvx which is a VMX store instruction, hence kaddr must be 16 byte aligned otherwise the store won't occur as expected. Unfortunately when we call __get_user_atomic_128_aligned() in p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't guaranteed to be 16B aligned. This means that the write to vbuf in __get_user_atomic_128_aligned() has the bottom bits of the address truncated. This results in other local variables being overwritten. Also vbuf will not contain the correct data which results in the userspace emulation being wrong and hence undetected user data corruption. In the past we've been mostly lucky as vbuf has ended up aligned but this is fragile and isn't always true. CONFIG_STACKPROTECTOR in particular can change the stack arrangement enough that our luck runs out. This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal. The fix is to align vbuf to a 16 byte boundary. Fixes: 5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05powerpc/powernv/elog: Fix race while processing OPAL error log event.Mahesh Salgaonkar
commit aea948bb80b478ddc2448f7359d574387521a52d upstream. Every error log reported by OPAL is exported to userspace through a sysfs interface and notified using kobject_uevent(). The userspace daemon (opal_errd) then reads the error log and acknowledges the error log is saved safely to disk. Once acknowledged the kernel removes the respective sysfs file entry causing respective resources to be released including kobject. However it's possible the userspace daemon may already be scanning elog entries when a new sysfs elog entry is created by the kernel. User daemon may read this new entry and ack it even before kernel can notify userspace about it through kobject_uevent() call. If that happens then we have a potential race between elog_ack_store->kobject_put() and kobject_uevent which can lead to use-after-free of a kernfs object resulting in a kernel crash. eg: BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6bfb Faulting instruction address: 0xc0000000008ff2a0 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV CPU: 27 PID: 805 Comm: irq/29-opal-elo Not tainted 5.9.0-rc2-gcc-8.2.0-00214-g6f56a67bcbb5-dirty #363 ... NIP kobject_uevent_env+0xa0/0x910 LR elog_event+0x1f4/0x2d0 Call Trace: 0x5deadbeef0000122 (unreliable) elog_event+0x1f4/0x2d0 irq_thread_fn+0x4c/0xc0 irq_thread+0x1c0/0x2b0 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x6c This patch fixes this race by protecting the sysfs file creation/notification by holding a reference count on kobject until we safely send kobject_uevent(). The function create_elog_obj() returns the elog object which if used by caller function will end up in use-after-free problem again. However, the return value of create_elog_obj() function isn't being used today and there is no need as well. Hence change it to return void to make this fix complete. Fixes: 774fea1a38c6 ("powerpc/powernv: Read OPAL error log and export it through sysfs") Cc: stable@vger.kernel.org # v3.15+ Reported-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Rework the logic to use a single return, reword comments, add oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201006122051.190176-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05powerpc: Warn about use of smt_snooze_delayJoel Stanley
commit a02f6d42357acf6e5de6ffc728e6e77faf3ad217 upstream. It's not done anything for a long time. Save the percpu variable, and emit a warning to remind users to not expect it to do anything. This uses pr_warn_once instead of pr_warn_ratelimit as testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system showed the warning to be noisy, as the online/offline loop is slow. Fixes: 3fa8cad82b94 ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.") Cc: stable@vger.kernel.org # v3.14 Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902000012.3440389-1-joel@jms.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05powerpc/rtas: Restrict RTAS requests from userspaceAndrew Donnellan
commit bd59380c5ba4147dcbaad3e582b55ccfd120b764 upstream. A number of userspace utilities depend on making calls to RTAS to retrieve information and update various things. The existing API through which we expose RTAS to userspace exposes more RTAS functionality than we actually need, through the sys_rtas syscall, which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they want with arbitrary arguments. Many RTAS calls take the address of a buffer as an argument, and it's up to the caller to specify the physical address of the buffer as an argument. We allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can access, and then expose the physical address and size of this buffer in /proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address, poke at the buffer using /dev/mem, and pass an address in the RMO buffer to the RTAS call. However, there's nothing stopping the caller from specifying whatever address they want in the RTAS call, and it's easy to construct a series of RTAS calls that can overwrite arbitrary bytes (even without /dev/mem access). Additionally, there are some RTAS calls that do potentially dangerous things and for which there are no legitimate userspace use cases. In the past, this would not have been a particularly big deal as it was assumed that root could modify all system state freely, but with Secure Boot and lockdown we need to care about this. We can't fundamentally change the ABI at this point, however we can address this by implementing a filter that checks RTAS calls against a list of permitted calls and forces the caller to use addresses within the RMO buffer. The list is based off the list of calls that are used by the librtas userspace library, and has been tested with a number of existing userspace RTAS utilities. For compatibility with any applications we are not aware of that require other calls, the filter can be turned off at build time. Cc: stable@vger.kernel.org Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200820044512.7543-1-ajd@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05powerpc/drmem: Make lmb_size 64 bitAneesh Kumar K.V
commit ec72024e35dddb88a81e40071c87ceb18b5ee835 upstream. Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic") make sure different variables tracking lmb_size are updated to be 64 bit. This was found by code audit. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007114836.282468-2-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05powerpc: select ARCH_WANT_IRQS_OFF_ACTIVATE_MMNicholas Piggin
[ Upstream commit 66acd46080bd9e5ad2be4b0eb1d498d5145d058e ] powerpc uses IPIs in some situations to switch a kernel thread away from a lazy tlb mm, which is subject to the TLB flushing race described in the changelog introducing ARCH_WANT_IRQS_OFF_ACTIVATE_MM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-3-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05powerpc/powernv/smp: Fix spurious DBG() warningOliver O'Halloran
[ Upstream commit f6bac19cf65c5be21d14a0c9684c8f560f2096dd ] When building with W=1 we get the following warning: arch/powerpc/platforms/powernv/smp.c: In function ‘pnv_smp_cpu_kill_self’: arch/powerpc/platforms/powernv/smp.c:276:16: error: suggest braces around empty body in an ‘if’ statement [-Werror=empty-body] 276 | cpu, srr1); | ^ cc1: all warnings being treated as errors The full context is this block: if (srr1 && !generic_check_cpu_restart(cpu)) DBG("CPU%d Unexpected exit while offline srr1=%lx!\n", cpu, srr1); When building with DEBUG undefined DBG() expands to nothing and GCC emits the warning due to the lack of braces around an empty statement. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200804005410.146094-2-oohall@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30powerpc/powernv/dump: Fix race while processing OPAL dumpVasant Hegde
[ Upstream commit 0a43ae3e2beb77e3481d812834d33abe270768ab ] Every dump reported by OPAL is exported to userspace through a sysfs interface and notified using kobject_uevent(). The userspace daemon (opal_errd) then reads the dump and acknowledges that the dump is saved safely to disk. Once acknowledged the kernel removes the respective sysfs file entry causing respective resources to be released including kobject. However it's possible the userspace daemon may already be scanning dump entries when a new sysfs dump entry is created by the kernel. User daemon may read this new entry and ack it even before kernel can notify userspace about it through kobject_uevent() call. If that happens then we have a potential race between dump_ack_store->kobject_put() and kobject_uevent which can lead to use-after-free of a kernfs object resulting in a kernel crash. This patch fixes this race by protecting the sysfs file creation/notification by holding a reference count on kobject until we safely send kobject_uevent(). The function create_dump_obj() returns the dump object which if used by caller function will end up in use-after-free problem again. However, the return value of create_dump_obj() function isn't being used today and there is no need as well. Hence change it to return void to make this fix complete. Fixes: c7e64b9ce04a ("powerpc/powernv Platform dump interface") Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201017164210.264619-1-hegdevasant@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30powerpc/perf/hv-gpci: Fix starting index valueKajol Jain
[ Upstream commit 0f9866f7e85765bbda86666df56c92f377c3bc10 ] Commit 9e9f60108423f ("powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated") adds a framework for defining gpci counters. In this patch, they adds starting_index value as '0xffffffffffffffff'. which is wrong as starting_index is of size 32 bits. Because of this, incase we try to run hv-gpci event we get error. In power9 machine: command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000 event syntax error: '..bie_count_and_time_tlbie_instructions_issued/' \___ value too big for format, maximum is 4294967295 This patch fix this issue and changes starting_index value to '0xffffffff' After this patch: command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000 1.000085786 1,024 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 2.000287818 1,024 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 2.439113909 17,408 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ Fixes: 9e9f60108423 ("powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201003074943.338618-1-kjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraintsAthira Rajeev
[ Upstream commit 3b6c3adbb2fa42749c3d38cfc4d4d0b7e096bb7b ] PMU counter support functions enforces event constraints for group of events to check if all events in a group can be monitored. Incase of event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ), not all constraints are applicable, say the threshold or sample bits. But current code includes pmc5 and pmc6 in some group constraints (like IC_DC Qualifier bits) which is actually not applicable and hence results in those events not getting counted when scheduled along with group of other events. Patch fixes this by excluding PMC5/6 from constraints which are not relevant for it. Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1600672204-1610-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mmNicholas Piggin
[ Upstream commit a665eec0a22e11cdde708c1c256a465ebe768047 ] Commit 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") added a mechanism to trim the mm_cpumask of a process under certain conditions. One of the assumptions is that mm_users would not be incremented via a reference outside the process context with mmget_not_zero() then go on to kthread_use_mm() via that reference. That invariant was broken by io_uring code (see previous sparc64 fix), but I'll point Fixes: to the original powerpc commit because we are changing that assumption going forward, so this will make backports match up. Fix this by no longer relying on that assumption, but by having each CPU check the mm is not being used, and clearing their own bit from the mask only if it hasn't been switched-to by the time the IPI is processed. This relies on commit 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB invalidate") and ARCH_WANT_IRQS_OFF_ACTIVATE_MM to disable irqs over mm switch sequences. Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Depends-on: 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB invalidate") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-5-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30powerpc/tau: Disable TAU between measurementsFinn Thain
[ Upstream commit e63d6fb5637e92725cf143559672a34b706bca4f ] Enabling CONFIG_TAU_INT causes random crashes: Unrecoverable exception 1700 at c0009414 (msr=1000) Oops: Unrecoverable exception, sig: 6 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac-00043-gd5f545e1a8593 #5 NIP: c0009414 LR: c0009414 CTR: c00116fc REGS: c0799eb8 TRAP: 1700 Not tainted (5.7.0-pmac-00043-gd5f545e1a8593) MSR: 00001000 <ME> CR: 22000228 XER: 00000100 GPR00: 00000000 c0799f70 c076e300 00800000 0291c0ac 00e00000 c076e300 00049032 GPR08: 00000001 c00116fc 00000000 dfbd3200 ffffffff 007f80a8 00000000 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c075ce04 GPR24: c075ce04 dfff8880 c07b0000 c075ce04 00080000 00000001 c079ef98 c079ef5c NIP [c0009414] arch_cpu_idle+0x24/0x6c LR [c0009414] arch_cpu_idle+0x24/0x6c Call Trace: [c0799f70] [00000001] 0x1 (unreliable) [c0799f80] [c0060990] do_idle+0xd8/0x17c [c0799fa0] [c0060ba4] cpu_startup_entry+0x20/0x28 [c0799fb0] [c072d220] start_kernel+0x434/0x44c [c0799ff0] [00003860] 0x3860 Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX 3d20c07b XXXXXXXX XXXXXXXX XXXXXXXX 7c0802a6 XXXXXXXX XXXXXXXX XXXXXXXX 4e800421 XXXXXXXX XXXXXXXX XXXXXXXX 7d2000a6 ---[ end trace 3a0c9b5cb216db6b ]--- Resolve this problem by disabling each THRMn comparator when handling the associated THRMn interrupt and by disabling the TAU entirely when updating THRMn thresholds. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5a0ba3dc5612c7aac596727331284a3676c08472.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30powerpc/tau: Check processor type before enabling TAU interruptFinn Thain
[ Upstream commit 5e3119e15fed5b9a9a7e528665ff098a4a8dbdbc ] According to Freescale's documentation, MPC74XX processors have an erratum that prevents the TAU interrupt from working, so don't try to use it when running on those processors. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c281611544768e758bd58fe812cf702a5bd2d042.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/tau: Remove duplicated set_thresholds() callFinn Thain
[ Upstream commit 420ab2bc7544d978a5d0762ee736412fe9c796ab ] The commentary at the call site seems to disagree with the code. The conditional prevents calling set_thresholds() via the exception handler, which appears to crash. Perhaps that's because it immediately triggers another TAU exception. Anyway, calling set_thresholds() from TAUupdate() is redundant because tau_timeout() does so. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d7c7ee33232cf72a6a6bbb6ef05838b2e2b113c0.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/tau: Convert from timer to workqueueFinn Thain
[ Upstream commit b1c6a0a10bfaf36ec82fde6f621da72407fa60a1 ] Since commit 19dbdcb8039cf ("smp: Warn on function calls from softirq context") the Thermal Assist Unit driver causes a warning like the following when CONFIG_SMP is enabled. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/smp.c:428 smp_call_function_many_cond+0xf4/0x38c Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac #3 NIP: c00b37a8 LR: c00b3abc CTR: c001218c REGS: c0799c60 TRAP: 0700 Not tainted (5.7.0-pmac) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 42000224 XER: 00000000 GPR00: c00b3abc c0799d18 c076e300 c079ef5c c0011fec 00000000 00000000 00000000 GPR08: 00000100 00000100 00008000 ffffffff 42000224 00000000 c079d040 c079d044 GPR16: 00000001 00000000 00000004 c0799da0 c079f054 c07a0000 c07a0000 00000000 GPR24: c0011fec 00000000 c079ef5c c079ef5c 00000000 00000000 00000000 00000000 NIP [c00b37a8] smp_call_function_many_cond+0xf4/0x38c LR [c00b3abc] on_each_cpu+0x38/0x68 Call Trace: [c0799d18] [ffffffff] 0xffffffff (unreliable) [c0799d68] [c00b3abc] on_each_cpu+0x38/0x68 [c0799d88] [c0096704] call_timer_fn.isra.26+0x20/0x7c [c0799d98] [c0096b40] run_timer_softirq+0x1d4/0x3fc [c0799df8] [c05b4368] __do_softirq+0x118/0x240 [c0799e58] [c0039c44] irq_exit+0xc4/0xcc [c0799e68] [c000ade8] timer_interrupt+0x1b0/0x230 [c0799ea8] [c0013520] ret_from_except+0x0/0x14 --- interrupt: 901 at arch_cpu_idle+0x24/0x6c LR = arch_cpu_idle+0x24/0x6c [c0799f70] [00000001] 0x1 (unreliable) [c0799f80] [c0060990] do_idle+0xd8/0x17c [c0799fa0] [c0060ba8] cpu_startup_entry+0x24/0x28 [c0799fb0] [c072d220] start_kernel+0x434/0x44c [c0799ff0] [00003860] 0x3860 Instruction dump: 8129f204 2f890000 40beff98 3d20c07a 8929eec4 2f890000 40beff88 0fe00000 81220000 552805de 550802ef 4182ff84 <0fe00000> 3860ffff 7f65db78 7f44d378 ---[ end trace 34a886e47819c2eb ]--- Don't call on_each_cpu() from a timer callback, call it from a worker thread instead. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bb61650bea4f4c91fb8e24b9a6f130a1438651a7.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/tau: Use appropriate temperature sample intervalFinn Thain
[ Upstream commit 66943005cc41f48e4d05614e8f76c0ca1812f0fd ] According to the MPC750 Users Manual, the SITV value in Thermal Management Register 3 is 13 bits long. The present code calculates the SITV value as 60 * 500 cycles. This would overflow to give 10 us on a 500 MHz CPU rather than the intended 60 us. (But according to the Microprocessor Datasheet, there is also a factor of 266 that has to be applied to this value on certain parts i.e. speed sort above 266 MHz.) Always use the maximum cycle count, as recommended by the Datasheet. Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/896f542e5f0f1d6cf8218524c2b67d79f3d69b3c.1599260540.git.fthain@telegraphics.com.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/pseries: explicitly reschedule during drmem_lmb list traversalNathan Lynch
[ Upstream commit 9d6792ffe140240ae54c881cc4183f9acc24b4df ] The drmem lmb list can have hundreds of thousands of entries, and unfortunately lookups take the form of linear searches. As long as this is the case, traversals have the potential to monopolize the CPU and provoke lockup reports, workqueue stalls, and the like unless they explicitly yield. Rather than placing cond_resched() calls within various for_each_drmem_lmb() loop blocks in the code, put it in the iteration expression of the loop macro itself so users can't omit it. Introduce a drmem_lmb_next() iteration helper function which calls cond_resched() at a regular interval during array traversal. Each iteration of the loop in DLPAR code paths can involve around ten RTAS calls which can each take up to 250us, so this ensures the check is performed at worst every few milliseconds. Fixes: 6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200813151131.2070161-1-nathanl@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/icp-hv: Fix missing of_node_put() in success pathNicholas Mc Guire
[ Upstream commit d3e669f31ec35856f5e85df9224ede5bdbf1bc7b ] Both of_find_compatible_node() and of_find_node_by_type() will return a refcounted node on success - thus for the success path the node must be explicitly released with a of_node_put(). Fixes: 0b05ac6e2480 ("powerpc/xics: Rewrite XICS driver") Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1530691407-3991-1-git-send-email-hofrat@osadl.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/pseries: Fix missing of_node_put() in rng_init()Nicholas Mc Guire
[ Upstream commit 67c3e59443f5fc77be39e2ce0db75fbfa78c7965 ] The call to of_find_compatible_node() returns a node pointer with refcount incremented thus it must be explicitly decremented here before returning. Fixes: a489043f4626 ("powerpc/pseries: Implement arch_get_random_long() based on H_RANDOM") Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1530522496-14816-1-git-send-email-hofrat@osadl.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01powerpc/traps: Make unrecoverable NMIs die instead of panicNicholas Piggin
[ Upstream commit 265d6e588d87194c2fe2d6c240247f0264e0c19b ] System Reset and Machine Check interrupts that are not recoverable due to being nested or interrupting when RI=0 currently panic. This is not necessary, and can often just kill the current context and recover. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like ↵Gustavo Romero
the valid ones [ Upstream commit 1dff3064c764b5a51c367b949b341d2e38972bec ] On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by KVM. This is handled at first by the hardware raising a softpatch interrupt when certain TM instructions that need KVM assistance are executed in the guest. Althought some TM instructions per Power ISA are invalid forms they can raise a softpatch interrupt too. For instance, 'tresume.' instruction as defined in the ISA must have bit 31 set (1), but an instruction that matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like 0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.' and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and 0x7c0007dc, respectively. Hence, if a code like the following is executed in the guest it will raise a softpatch interrupt just like a 'tresume.' when the TM facility is enabled ('tabort. 0' in the example is used only to enable the TM facility): int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); } Currently in such a case KVM throws a complete trace like: [345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv] [345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3 crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv] [345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G W E 5.5.0+ #1 [345523.706031] NIP: c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850 [345523.706034] REGS: c000000399467680 TRAP: 0700 Tainted: G W E (5.5.0+) [345523.706034] MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 24022428 XER: 00000000 [345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0 GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720 GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000 GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530 GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000 GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001 GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000 GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500 GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720 [345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv] [345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv] [345523.706066] Call Trace: [345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable) [345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980 [345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv] [345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv] [345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm] [345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm] [345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm] [345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170 [345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80 [345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68 [345523.706112] Instruction dump: [345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd [345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0 and then treats the executed instruction as a 'nop'. However the POWER9 User's Manual, in section "4.6.10 Book II Invalid Forms", informs that for TM instructions bit 31 is in fact ignored, thus for the TM-related invalid forms ignoring bit 31 and handling them like the valid forms is an acceptable way to handle them. POWER8 behaves the same way too. This commit changes the handling of the cases here described by treating the TM-related invalid forms that can generate a softpatch interrupt just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by gently reporting any other unrecognized case to the host and treating it as illegal instruction instead of throwing a trace and treating it as a 'nop'. Signed-off-by: Gustavo Romero <gromero@linux.ibm.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01powerpc/eeh: Only dump stack once if an MMIO loop is detectedOliver O'Halloran
[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ] Many drivers don't check for errors when they get a 0xFFs response from an MMIO load. As a result after an EEH event occurs a driver can get stuck in a polling loop unless it some kind of internal timeout logic. Currently EEH tries to detect and report stuck drivers by dumping a stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump would occur every few seconds if the driver was spinning in a loop. This results in a lot of spurious stack traces in the kernel log. Fix this by limiting it to printing one stack trace for each PE freeze. If the driver is truely stuck the kernel's hung task detector is better suited to reporting the probelm anyway. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-23powerpc/dma: Fix dma_map_ops::get_required_maskAlexey Kardashevskiy
commit 437ef802e0adc9f162a95213a3488e8646e5fc03 upstream. There are 2 problems with it: 1. "<" vs expected "<<" 2. the shift number is an IOMMU page number mask, not an address mask as the IOMMU page shift is missing. This did not hit us before f1565c24b596 ("powerpc: use the generic dma_ops_bypass mode") because we had additional code to handle bypass mask so this chunk (almost?) never executed.However there were reports that aacraid does not work with "iommu=nobypass". After f1565c24b596, aacraid (and probably others which call dma_get_required_mask() before setting the mask) was unable to enable 64bit DMA and fall back to using IOMMU which was known not to work, one of the problems is double free of an IOMMU page. This fixes DMA for aacraid, both with and without "iommu=nobypass" in the kernel command line. Verified with "stress-ng -d 4". Fixes: 6a5c7be5e484 ("powerpc: Override dma_get_required_mask by platform hook and ops") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200908015106.79661-1-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-17vgacon: remove software scrollback supportLinus Torvalds
commit 973c096f6a85e5b5f2a295126ba6928d9a6afd45 upstream. Yunhai Zhang recently fixed a VGA software scrollback bug in commit ebfdfeeae8c0 ("vgacon: Fix for missing check in scrollback handling"), but that then made people look more closely at some of this code, and there were more problems on the vgacon side, but also the fbcon software scrollback. We don't really have anybody who maintains this code - probably because nobody actually _uses_ it any more. Sure, people still use both VGA and the framebuffer consoles, but they are no longer the main user interfaces to the kernel, and haven't been for decades, so these kinds of extra features end up bitrotting and not really being used. So rather than try to maintain a likely unused set of code, I'll just aggressively remove it, and see if anybody even notices. Maybe there are people who haven't jumped on the whole GUI badnwagon yet, and think it's just a fad. And maybe those people use the scrollback code. If that turns out to be the case, we can resurrect this again, once we've found the sucker^Wmaintainer for it who actually uses it. Reported-by: NopNop Nop <nopitydays@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: 张云海 <zhangyunhai@nsfocus.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03powerpc/perf: Fix soft lockups due to missed interrupt accountingAthira Rajeev
[ Upstream commit 17899eaf88d689529b866371344c8f269ba79b5f ] Performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s which invokes perf_event_overflow() to record the sample information. Apart from creating sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt(). Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using siar_valid() check) and hence the interrupt check. But it is possible that we do sampling for some events that are not generating valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to soft lockup. Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event. ie if SIAR is invalid, just do interrupt check and don't record the sample information. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03powerpc/spufs: add CONFIG_COREDUMP dependencyArnd Bergmann
[ Upstream commit b648a5132ca3237a0f1ce5d871fff342b0efcf8a ] The kernel test robot pointed out a slightly different error message after recent commit 5456ffdee666 ("powerpc/spufs: simplify spufs core dumping") to spufs for a configuration that never worked: powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_proxydma_info_dump': >> file.c:(.text+0x4c68): undefined reference to `.dump_emit' powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_dma_info_dump': file.c:(.text+0x4d70): undefined reference to `.dump_emit' powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_wbox_info_dump': file.c:(.text+0x4df4): undefined reference to `.dump_emit' Add a Kconfig dependency to prevent this from happening again. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200706132302.3885935-1-arnd@arndb.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03powerpc/xive: Ignore kmemleak false positivesAlexey Kardashevskiy
[ Upstream commit f0993c839e95dd6c7f054a1015e693c87e33e4fb ] xive_native_provision_pages() allocates memory and passes the pointer to OPAL so kmemleak cannot find the pointer usage in the kernel memory and produces a false positive report (below) (even if the kernel did scan OPAL memory, it is unable to deal with __pa() addresses anyway). This silences the warning. unreferenced object 0xc000200350c40000 (size 65536): comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s) hex dump (first 32 bytes): 02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00 ....P........... 01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000081ff046c>] xive_native_alloc_vp_block+0x120/0x250 [<00000000d555d524>] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm] [<00000000d69b9c9f>] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm] [<000000006acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm] [<0000000089c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm] [<00000000902ae91e>] ksys_ioctl+0x184/0x1b0 [<00000000f3e68bd7>] sys_ioctl+0x48/0xb0 [<0000000001b2c127>] system_call_exception+0x124/0x1f0 [<00000000d2b2ee40>] system_call_common+0xe8/0x214 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612043303.84894-1-aik@ozlabs.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()Michael Ellerman
commit 0828137e8f16721842468e33df0460044a0c588b upstream. __init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc: Add support for context switching the TAR register") (Feb 2013), and only set FSCR_TAR. At that point FSCR (Facility Status and Control Register) was not context switched, so the setting was permanent after boot. Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again that was permanent after boot. Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on POWER8") (Aug 2013) added a limited context switch of FSCR, just the FSCR_DSCR bit was context switched based on thread.dscr_inherit. That commit said "This clears the H/FSCR DSCR bit initially", but it didn't, it left the initialisation of FSCR_DSCR in __init_FSCR(). However the initial context switch from init_task to pid 1 would clear FSCR_DSCR because thread.dscr_inherit was 0. That commit also introduced the requirement that FSCR_DSCR be clear for user processes, so that we can take the facility unavailable interrupt in order to manage dscr_inherit. Then in commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to thread_struct. However it still wasn't fully context switched, we just took the existing value and set FSCR_DSCR if the new thread had dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR | FSCR_TAR, but that value was not propagated into the thread_struct, so the initial context switch set FSCR_DSCR back to 0. Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context switching") (Jun 2016) added a full context switch of the FSCR, and added an initialisation of init_task.thread.fscr to FSCR_TAR | FSCR_EBB, but omitted FSCR_DSCR. The end result is that swapper runs with FSCR_DSCR set because of the initialisation in __init_FSCR(), but no other processes do, they use the value from init_task.thread.fscr. Having FSCR_DSCR set for swapper allows it to access SPR 3 from userspace, but swapper never runs userspace, so it has no useful effect. It's also confusing to have the value initialised in two places to two different values. So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the point where there's a single value of FSCR, even if it's still set in two places. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-1-mpe@ellerman.id.au Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()Will Deacon
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream. The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [will: Backport to 4.19; use 'blockable' instead of non-existent range flags] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>