path: root/arch/powerpc
Age | Commit message | Author
2020-09-29 | Merge tag 'v5.7.19' into v5.7/standard/base | Bruce Ashfield

This is the 5.7.19 stable release

# gpg: Signature made Thu 27 Aug 2020 03:31:12 AM EDT
# gpg:                using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
2020-08-27 | powerpc/64s: Don't init FSCR_DSCR in __init_FSCR() | Michael Ellerman

commit 0828137e8f16721842468e33df0460044a0c588b upstream.

__init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc: Add support for context switching the TAR register") (Feb 2013), and only set FSCR_TAR. At that point FSCR (Facility Status and Control Register) was not context switched, so the setting was permanent after boot.

Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again that was permanent after boot.

Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on POWER8") (Aug 2013) added a limited context switch of FSCR, just the FSCR_DSCR bit was context switched based on thread.dscr_inherit. That commit said "This clears the H/FSCR DSCR bit initially", but it didn't, it left the initialisation of FSCR_DSCR in __init_FSCR(). However the initial context switch from init_task to pid 1 would clear FSCR_DSCR because thread.dscr_inherit was 0.

That commit also introduced the requirement that FSCR_DSCR be clear for user processes, so that we can take the facility unavailable interrupt in order to manage dscr_inherit.

Then in commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to thread_struct. However it still wasn't fully context switched, we just took the existing value and set FSCR_DSCR if the new thread had dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR | FSCR_TAR, but that value was not propagated into the thread_struct, so the initial context switch set FSCR_DSCR back to 0.

Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context switching") (Jun 2016) added a full context switch of the FSCR, and added an initialisation of init_task.thread.fscr to FSCR_TAR | FSCR_EBB, but omitted FSCR_DSCR.

The end result is that swapper runs with FSCR_DSCR set because of the initialisation in __init_FSCR(), but no other processes do, they use the value from init_task.thread.fscr.

Having FSCR_DSCR set for swapper allows it to access SPR 3 from userspace, but swapper never runs userspace, so it has no useful effect. It's also confusing to have the value initialised in two places to two different values.

So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the point where there's a single value of FSCR, even if it's still set in two places.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Alistair Popple <alistair@popple.id.au>
Link: https://lore.kernel.org/r/20200527145843.2761782-1-mpe@ellerman.id.au
Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26 | Merge branch 'v5.7/base' into v5.7/standard/base | Bruce Ashfield
2020-08-26 | Merge tag 'v5.7.17' into v5.7/standard/base | Bruce Ashfield

This is the 5.7.17 stable release

# gpg: Signature made Fri 21 Aug 2020 07:07:57 AM EDT
# gpg:                using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
2020-08-26 | KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() | Will Deacon

commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
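The interface change itself is small; a sketch (the exact per-architecture prototypes may differ slightly):

  /* Before: the backend had no way to know if blocking was allowed. */
  int kvm_unmap_hva_range(struct kvm *kvm,
                          unsigned long start, unsigned long end);

  /* After: the mmu_notifier range flags are forwarded through. */
  int kvm_unmap_hva_range(struct kvm *kvm,
                          unsigned long start, unsigned long end,
                          unsigned flags);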
2020-08-26 | powerpc/pseries: Do not initiate shutdown when system is running on UPS | Vasant Hegde

commit 90a9b102eddf6a3f987d15f4454e26a2532c1c98 upstream.

As per PAPR we have to look for both EPOW sensor value and event modifier to identify the type of event and take appropriate action.

In LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes":

  SYSTEM_SHUTDOWN 3

  The system must be shut down. An EPOW-aware OS logs the EPOW error
  log information, then schedules the system to be shut down to begin
  after an OS defined delay interval (default is 10 minutes.)

Then in section 10.3.2.2.8 there is table 146 "Platform Event Log Format, Version 6, EPOW Section", which includes the "EPOW Event Modifier":

  For EPOW sensor value = 3
  0x01 = Normal system shutdown with no additional delay
  0x02 = Loss of utility power, system is running on UPS/Battery
  0x03 = Loss of system critical functions, system should be shutdown
  0x04 = Ambient temperature too high
  All other values = reserved

We have a user space tool (rtas_errd) on LPAR to monitor for EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown after predefined time. It also starts monitoring for any new EPOW events. If it receives "Power restored" event before predefined time it will cancel the shutdown. Otherwise after predefined time it will shutdown the system.

Commit 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of the "on UPS/Battery" case, to immediately shutdown the system. This breaks existing setups that rely on the userspace tool to delay shutdown and let the system run on the UPS.

Fixes: 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Massage change log and add PAPR references]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
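Restoring the old behaviour amounts to treating the "on UPS" modifier as informational; a sketch of the handling (simplified; constant names taken from the pseries RAS code):

  /* Illustrative: only log for EPOW_SHUTDOWN_ON_UPS and let the
   * userspace tool (rtas_errd) decide when to shut down. */
  switch (event_modifier) {
  case EPOW_SHUTDOWN_NORMAL:
          pr_emerg("System shutdown requested\n");
          orderly_poweroff(true);
          break;
  case EPOW_SHUTDOWN_ON_UPS:
          /* No orderly_poweroff() here any more. */
          pr_emerg("Loss of system power detected. System is running on"
                   " UPS/Battery. Check RTAS error log for details\n");
          break;
  /* ... other modifiers ... */
  }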
2020-08-26 | powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death | Michael Roth

[ Upstream commit 801980f6497946048709b9b09771a1729551d705 ]

For a power9 KVM guest with XIVE enabled, running a test loop where we hotplug 384 vcpus and then unplug them, the following traces can be seen (generally within a few loops) either from the unplugged vcpu:

  cpu 65 (hwid 65) Ready to die...
  Querying DEAD? cpu 66 (66) shows 2
  list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:56!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
  CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
  NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
  REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le)
  MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000
  ...
  NIP __list_del_entry_valid+0xac/0x100
  LR __list_del_entry_valid+0xa8/0x100
  Call Trace:
    __list_del_entry_valid+0xa8/0x100 (unreliable)
    free_pcppages_bulk+0x1f8/0x940
    free_unref_page+0xd0/0x100
    xive_spapr_cleanup_queue+0x148/0x1b0
    xive_teardown_cpu+0x1bc/0x240
    pseries_mach_cpu_die+0x78/0x2f0
    cpu_die+0x48/0x70
    arch_cpu_idle_dead+0x20/0x40
    do_idle+0x2f4/0x4c0
    cpu_startup_entry+0x38/0x40
    start_secondary+0x7bc/0x8f0
    start_secondary_prolog+0x10/0x14

or on the worker thread handling the unplug:

  pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
  Querying DEAD? cpu 314 (314) shows 2
  BUG: Bad page state in process kworker/u768:3  pfn:95de1
  cpu 314 (hwid 314) Ready to die...
  page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
  flags: 0x5ffffc00000000()
  raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
  page dumped because: nonzero mapcount
  Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
  CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
  Workqueue: pseries hotplug workque pseries_hp_work_fn
  Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    bad_page+0x12c/0x1b0
    free_pcppages_bulk+0x5bc/0x940
    page_alloc_cpu_dead+0x118/0x120
    cpuhp_invoke_callback.constprop.5+0xb8/0x760
    _cpu_down+0x188/0x340
    cpu_down+0x5c/0xa0
    cpu_subsys_offline+0x24/0x40
    device_offline+0xf0/0x130
    dlpar_offline_cpu+0x1c4/0x2a0
    dlpar_cpu_remove+0xb8/0x190
    dlpar_cpu_remove_by_index+0x12c/0x150
    dlpar_cpu+0x94/0x800
    pseries_hp_work_fn+0x128/0x1e0
    process_one_work+0x304/0x5d0
    worker_thread+0xcc/0x7a0
    kthread+0x1ac/0x1c0
    ret_from_kernel_thread+0x5c/0x80

The latter trace is due to the following sequence:

  page_alloc_cpu_dead
    drain_pages
      drain_pages_zone
        free_pcppages_bulk

where drain_pages() in this case is called under the assumption that the unplugged cpu is no longer executing. To ensure that is the case, an early call is made to __cpu_die()->pseries_cpu_die(), which runs a loop that waits for the cpu to reach a halted state by polling its status via query-cpu-stopped-state RTAS calls. It only polls for 25 iterations before giving up, however, and in the trace above this results in the following being printed only 0.1 seconds after the hotplug worker thread begins processing the unplug request:

  pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
  Querying DEAD? cpu 314 (314) shows 2

At that point the worker thread assumes the unplugged CPU is in some unknown/dead state and proceeds with the cleanup, causing the race with the XIVE cleanup code executed by the unplugged CPU.

Fix this by waiting indefinitely, but also making an effort to avoid spurious lockup messages by allowing for rescheduling after polling the CPU status and printing a warning if we wait for longer than 120s.

Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[mpe: Trim oopses in change log slightly for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200811161544.10513-1-mdroth@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
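A sketch of the reworked polling loop in pseries_cpu_die() (illustrative; helper names from the pseries hotplug code, the exact timeout bookkeeping may differ):

  /* Illustrative: poll forever, but reschedule between polls and warn
   * if the CPU takes suspiciously long to stop. */
  unsigned long timeout = jiffies + msecs_to_jiffies(120000);

  for (;;) {
          cpu_status = smp_query_cpu_stopped(pcpu);
          if (cpu_status == QCSS_STOPPED ||
              cpu_status == QCSS_HARDWARE_ERROR)
                  break;
          cond_resched();         /* avoid spurious lockup reports */
          if (time_after(jiffies, timeout)) {
                  pr_warn("CPU %i didn't die after 120s, still waiting\n", cpu);
                  timeout = jiffies + msecs_to_jiffies(120000);
          }
  }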
2020-08-26 | powerpc/fixmap: Fix the size of the early debug area | Christophe Leroy

[ Upstream commit fdc6edbb31fba76fd25d7bd016b675a92908d81e ]

Commit 03fd42d458fb ("powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k") reworked the setup of the early debug area and mistakenly replaced 128 * 1024 by SZ_128.

Change to SZ_128K to restore the original 128 kbytes size of the area.

Fixes: 03fd42d458fb ("powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/996184974d674ff984643778cf1cdd7fe58cc065.1597644194.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
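Combined with the 256k fix below, the fixmap entry ends up looking roughly like this (an illustrative sketch of the asm/fixmap.h enum entries, not the exact upstream lines):

  /* Illustrative: reserve 128K (not 128 bytes) for the early debug area,
   * rounded up so it is always at least one page. */
  FIX_EARLY_DEBUG_TOP = FIX_KMAP_END,
  FIX_EARLY_DEBUG_BASE = FIX_EARLY_DEBUG_TOP +
          (ALIGN(SZ_128K, PAGE_SIZE) / PAGE_SIZE) - 1,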
2020-08-21 | pseries: Fix 64 bit logical memory block panic | Anton Blanchard

commit 89c140bbaeee7a55ed0360a88f294ead2b95201b upstream.

Booting with a 4GB LMB size causes us to panic:

  qemu-system-ppc64: OS terminated: OS panic: Memory block size not suitable: 0x0

Fix pseries_memory_block_size() to handle 64 bit LMBs.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200715000820.1255764-1-anton@ozlabs.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
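A sketch of the corrected property read (assuming the usual of_* helpers; the essential point is reading "ibm,lmb-size" as a full 64-bit quantity rather than truncating it to 32 bits):

  /* Illustrative: a 4GB LMB size does not fit in 32 bits. */
  const __be32 *prop;
  int len;
  u64 memblock_size = MIN_MEMORY_BLOCK_SIZE;

  prop = of_get_property(np, "ibm,lmb-size", &len);
  if (prop && len >= 2 * sizeof(__be32))
          memblock_size = of_read_number(prop, 2); /* read both cells */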
2020-08-21 | powerpc: Fix circular dependency between percpu.h and mmu.h | Michael Ellerman

commit 0c83b277ada72b585e6a3e52b067669df15bcedb upstream.

Recently random.h started including percpu.h (see commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity")), which broke corenet64_smp_defconfig:

  In file included from /linux/arch/powerpc/include/asm/paca.h:18,
                   from /linux/arch/powerpc/include/asm/percpu.h:13,
                   from /linux/include/linux/random.h:14,
                   from /linux/lib/uuid.c:14:
  /linux/arch/powerpc/include/asm/mmu.h:139:22: error: unknown type name 'next_tlbcam_idx'
    139 | DECLARE_PER_CPU(int, next_tlbcam_idx);

This is due to a circular header dependency:

  asm/mmu.h includes asm/percpu.h, which includes asm/paca.h, which includes asm/mmu.h

Which means DECLARE_PER_CPU() isn't defined when mmu.h needs it.

We can fix it by moving the include of paca.h below the include of asm-generic/percpu.h.

This moves the include of paca.h out of the #ifdef __powerpc64__, but that is OK because paca.h is almost entirely inside #ifdef CONFIG_PPC64 anyway.

It also moves the include of paca.h out of the #ifdef CONFIG_SMP, which could possibly break something, but seems to have no ill effects.

Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Cc: stable@vger.kernel.org # v5.8
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200804130558.292328-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
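The resulting include order in asm/percpu.h is roughly (a sketch of the shape, not the verbatim file):

  /* arch/powerpc/include/asm/percpu.h -- illustrative shape after the fix */
  #ifdef __powerpc64__
  #define __my_cpu_offset local_paca->data_offset
  #endif

  #include <asm-generic/percpu.h>

  /* paca.h is included last, so DECLARE_PER_CPU() is already defined
   * by the time it drags in asm/mmu.h. */
  #include <asm/paca.h>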
2020-08-21 | powerpc: Allow 4224 bytes of stack expansion for the signal frame | Michael Ellerman

commit 63dee5df43a31f3844efabc58972f0a206ca4534 upstream.

We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA.

The code was originally added in 2004 "ported from 2.4". The rough logic is that the stack is allowed to grow to 1MB with no extra checking. Over 1MB the access must be within 2048 bytes of the stack pointer, or be from a user instruction that updates the stack pointer.

The 2048 byte allowance below the stack pointer is there to cover the 288 byte "red zone" as well as the "about 1.5kB" needed by the signal delivery code.

Unfortunately since then the signal frame has expanded, and is now 4224 bytes on 64-bit kernels with transactional memory enabled. This means if a process has consumed more than 1MB of stack, and its stack pointer lies less than 4224 bytes from the next page boundary, signal delivery will fault when trying to expand the stack and the process will see a SEGV.

The total size of the signal frame is the size of struct rt_sigframe (which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on 64-bit).

The 2048 byte allowance was correct until 2008 as the signal frame was:

  struct rt_sigframe {
          struct ucontext    uc;                 /*     0  1440 */
          /* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */
          long unsigned int  _unused[2];         /*  1440    16 */
          unsigned int       tramp[6];           /*  1456    24 */
          struct siginfo *   pinfo;              /*  1480     8 */
          void *             puc;                /*  1488     8 */
          struct siginfo     info;               /*  1496   128 */
          /* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */
          char               abigap[288];        /*  1624   288 */

          /* size: 1920, cachelines: 15, members: 7 */
          /* padding: 8 */
  };

  1920 + 128 = 2048

Then in commit ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support") (Jul 2008) the signal frame expanded to 2304 bytes:

  struct rt_sigframe {
          struct ucontext    uc;                 /*     0  1696 */   <--
          /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
          long unsigned int  _unused[2];         /*  1696    16 */
          unsigned int       tramp[6];           /*  1712    24 */
          struct siginfo *   pinfo;              /*  1736     8 */
          void *             puc;                /*  1744     8 */
          struct siginfo     info;               /*  1752   128 */
          /* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */
          char               abigap[288];        /*  1880   288 */

          /* size: 2176, cachelines: 17, members: 7 */
          /* padding: 8 */
  };

  2176 + 128 = 2304

At this point we should have been exposed to the bug, though as far as I know it was never reported. I no longer have a system old enough to easily test on.

Then in 2010 commit 320b2b8de126 ("mm: keep a guard page below a grow-down stack segment") caused our stack expansion code to never trigger, as there was always a VMA found for a write up to PAGE_SIZE below r1.

That meant the bug was hidden as we continued to expand the signal frame in commit 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") (Feb 2013):

  struct rt_sigframe {
          struct ucontext    uc;                 /*     0  1696 */
          /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
          struct ucontext    uc_transact;        /*  1696  1696 */   <--
          /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
          long unsigned int  _unused[2];         /*  3392    16 */
          unsigned int       tramp[6];           /*  3408    24 */
          struct siginfo *   pinfo;              /*  3432     8 */
          void *             puc;                /*  3440     8 */
          struct siginfo     info;               /*  3448   128 */
          /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
          char               abigap[288];        /*  3576   288 */

          /* size: 3872, cachelines: 31, members: 8 */
          /* padding: 8 */
          /* last cacheline: 32 bytes */
  };

  3872 + 128 = 4000

And commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to 512 bytes") (Feb 2014):

  struct rt_sigframe {
          struct ucontext    uc;                 /*     0  1696 */
          /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
          struct ucontext    uc_transact;        /*  1696  1696 */
          /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
          long unsigned int  _unused[2];         /*  3392    16 */
          unsigned int       tramp[6];           /*  3408    24 */
          struct siginfo *   pinfo;              /*  3432     8 */
          void *             puc;                /*  3440     8 */
          struct siginfo     info;               /*  3448   128 */
          /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
          char               abigap[512];        /*  3576   512 */   <--

          /* size: 4096, cachelines: 32, members: 8 */
          /* padding: 8 */
  };

  4096 + 128 = 4224

Then finally in 2017, commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") exposed us to the existing bug, because it changed the stack VMA to be the correct/real size, meaning our stack expansion code is now triggered.

Fix it by increasing the allowance to 4224 bytes.

Hard-coding 4224 is obviously unsafe against future expansions of the signal frame in the same way as the existing code. We can't easily use sizeof() because the signal frame structure is not in a header. We will either fix that, or rip out all the custom stack expansion checking logic entirely.

Fixes: ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support")
Cc: stable@vger.kernel.org # v2.6.27+
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
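The fix itself is a one-number change in the fault handler's heuristic; roughly (a sketch of the check in arch/powerpc/mm/fault.c, not the verbatim diff):

  /* Sketch: expand the stack for accesses within the allowance below r1. */
  /* Before: 2048 bytes only covered the old red zone + signal frame. */
  if (address + 2048 >= uregs->gpr[1])
          return false;   /* close enough to the stack pointer: expand */

  /* After: cover the worst-case 4224-byte signal frame (TM enabled). */
  if (address + 4224 >= uregs->gpr[1])
          return false;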
2020-08-21 | powerpc/ptdump: Fix build failure in hashpagetable.c | Christophe Leroy

commit 7c466b0807960edc13e4b855be85ea765df9a6cd upstream.

H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined.

!= H_SUCCESS means != 0. Modify the test accordingly.

Fixes: 65e701b2d2a8 ("powerpc/ptdump: drop non vital #ifdefs")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/795158fc1d2b3dff3bf7347881947a887ea9391a.1592227105.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
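The change is mechanical; something like:

  /* Illustrative: H_SUCCESS is pseries-only, but it is defined as 0,
   * so test the return value directly. */
  lpar_rc = plpar_pte_read_4(0, slot, ptes);  /* hypercall read (name illustrative) */
  if (lpar_rc)            /* was: if (lpar_rc != H_SUCCESS) */
          continue;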
2020-08-19 | Merge tag 'v5.7.16' into v5.7/standard/base | Bruce Ashfield

This is the 5.7.16 stable release

# gpg: Signature made Wed 19 Aug 2020 02:25:33 AM EDT
# gpg:                using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
2020-08-19 | Merge tag 'v5.7.15' into v5.7/standard/base | Bruce Ashfield

This is the 5.7.15 stable release

# gpg: Signature made Tue 11 Aug 2020 09:35:51 AM EDT
# gpg:                using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Can't check signature: No public key
2020-08-19 | powerpc/pseries/hotplug-cpu: Remove double free in error path | Nathan Lynch

[ Upstream commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980 ]

In the unlikely event that the device tree lacks a /cpus node, find_dlpar_cpus_to_add() oddly frees the cpu_drcs buffer it has been passed before returning an error. Its only caller also frees the buffer on error.

Remove the less conventional kfree() of a caller-supplied buffer from find_dlpar_cpus_to_add().

Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190919231633.1344-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 | powerpc/boot: Fix CONFIG_PPC_MPC52XX references | Michael Ellerman

[ Upstream commit e5eff89657e72a9050d95fde146b54c7dc165981 ]

Commit 866bfc75f40e ("powerpc: conditionally compile platform-specific serial drivers") made some code depend on CONFIG_PPC_MPC52XX, which doesn't exist. Fix it to use CONFIG_PPC_MPC52xx.

Fixes: 866bfc75f40e ("powerpc: conditionally compile platform-specific serial drivers")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-7-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 | powerpc/32s: Fix CONFIG_BOOK3S_601 uses | Michael Ellerman

[ Upstream commit df4d4ef22446b3a789a4efd74d34f2ec1e24deb2 ]

We have two uses of CONFIG_BOOK3S_601, which doesn't exist. Fix them to use CONFIG_PPC_BOOK3S_601 which is the correct symbol.

Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-5-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 | powerpc/perf: Fix missing is_sier_aviable() during build | Madhavan Srinivasan

[ Upstream commit 3c9450c053f88e525b2db1e6990cdf34d14e7696 ]

Compilation error:

  arch/powerpc/perf/perf_regs.c:80: undefined reference to `.is_sier_available'

Currently is_sier_available() is part of core-book3s.c, which is added to build based on CONFIG_PPC_PERF_CTRS. A config with CONFIG_PERF_EVENTS and without CONFIG_PPC_PERF_CTRS will have a build break because of missing is_sier_available().

In practice it only breaks when CONFIG_FSL_EMB_PERF_EVENT=n because that also guards the usage of is_sier_available(). That only happens with CONFIG_PPC_BOOK3E_64=y and CONFIG_FSL_SOC_BOOKE=n.

Patch adds is_sier_available() in asm/perf_event.h to fix the build break for configs missing CONFIG_PPC_PERF_CTRS.

Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Add detail about CONFIG_FSL_SOC_BOOKE]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200614083604.302611-1-maddy@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
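The header-level fix provides a fallback; roughly (a sketch of asm/perf_event.h):

  /* Illustrative: provide a stub when the core-book3s code
   * (CONFIG_PPC_PERF_CTRS) is not built. */
  #ifdef CONFIG_PPC_PERF_CTRS
  extern bool is_sier_available(void);
  #else
  static inline bool is_sier_available(void) { return false; }
  #endif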
2020-08-19 | powerpc/book3s64/pkeys: Use PVR check instead of cpu feature | Aneesh Kumar K.V

[ Upstream commit d79e7a5f26f1d179cbb915a8bf2469b6d7431c29 ]

We are wrongly using CPU_FTRS_POWER8 to check for P8 support. Instead, we should use PVR value. Now considering we are using CPU_FTRS_POWER8, that implies we returned true for P9 with older firmware. Keep the same behavior by checking for P9 PVR value.

Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
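The check becomes a PVR comparison instead of a CPU-feature-mask test; a sketch (macro names from asm/reg.h, the exact list of PVR values may differ):

  /* Illustrative: key off the PVR, covering the P8 variants and P9. */
  if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER8 ||
      PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER8E ||
      PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER8NVL ||
      PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9)
          pkeys_total = 32;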
2020-08-19 | powerpc/vdso: Fix vdso cpu truncation | Milton Miller

[ Upstream commit a9f675f950a07d5c1dbcbb97aabac56f5ed085e3 ]

The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrectly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number.

Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing.

Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu")
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
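The masking fix is a one-nibble change; a sketch of the SPRG_VDSO setup (illustrative, not the verbatim upstream code):

  /* Illustrative: pack cpu (low 16 bits, was wrongly masked to 12)
   * and node into the SPRG read by the vDSO getcpu(). */
  unsigned long cpu = get_cpu();
  unsigned long node = cpu_to_node(cpu);
  unsigned long val = (cpu & 0xffff) | ((node & 0xffff) << 16);

  mtspr(SPRN_SPRG_VDSO_WRITE, val);
  put_cpu();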
2020-08-19 | powerpc/rtas: don't online CPUs for partition suspend | Nathan Lynch

[ Upstream commit ec2fc2a9e9bbad9023aab65bc472ce7a3ca8608f ]

Partition suspension, used for hibernation and migration, requires that the OS place all but one of the LPAR's processor threads into one of two states prior to calling the ibm,suspend-me RTAS function:

  * the architected offline state (via RTAS stop-self); or
  * the H_JOIN hcall, which does not return until the partition resumes execution

Using H_CEDE as the offline mode, introduced by commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), means that any threads which are offline from Linux's point of view must be moved to one of those two states before a partition suspension can proceed.

This was eventually addressed in commit 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation"), which added code to temporarily bring up any offline processor threads so they can call H_JOIN. Conceptually this is fine, but the implementation has had multiple races with cpu hotplug operations initiated from user space[1][2][3], the error handling is fragile, and it generates user-visible cpu hotplug events which is a lot of noise for a platform feature that's supposed to minimize disruption to workloads.

With commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") reverted, this code becomes unnecessary, so remove it.

Since any offline CPUs now are truly offline from the platform's point of view, it is no longer necessary to bring up CPUs only to have them call H_JOIN and then go offline again upon resuming. Only active threads are required to call H_JOIN; stopped threads can be left alone.

[1] commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and serialization during LPM")
[2] commit 9fb603050ffd ("powerpc/rtas: retry when cpu offline races with suspend/migration")
[3] commit dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 | powerpc/pseries: remove cede offline state for CPUs | Nathan Lynch

[ Upstream commit 48f6e7f6d948b56489da027bc3284c709b939d28 ]

This effectively reverts commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), which added an offline mode for CPUs which uses the H_CEDE hcall instead of the architected stop-self RTAS function in order to facilitate "folding" of dedicated mode processors on PowerVM platforms to achieve energy savings. This has been the default offline mode since its introduction.

There's nothing about stop-self that would prevent the hypervisor from achieving the energy savings available via H_CEDE, so the original premise of this change appears to be flawed.

I also have encountered the claim that the transition to and from ceded state is much faster than stop-self/start-cpu. Certainly we would not want to use stop-self as an *idle* mode. That is what H_CEDE is for. However, this difference is insignificant in the context of Linux CPU hotplug, where the latency of an offline or online operation on current systems is on the order of 100ms, mainly attributable to all the various subsystems' cpuhp callbacks.

The cede offline mode also prevents accurate accounting, as discussed before:

  https://lore.kernel.org/linuxppc-dev/1571740391-3251-1-git-send-email-ego@linux.vnet.ibm.com/

Unconditionally use stop-self to offline processor threads. This is the architected method for offlining CPUs on PAPR systems.

The "cede_offline" boot parameter is rendered obsolete.

Removing this code enables the removal of the partition suspend code which temporarily onlines all present CPUs.

Fixes: 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-2-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 | powerpc/fixmap: Fix FIX_EARLY_DEBUG_BASE when page size is 256k | Christophe Leroy

[ Upstream commit 03fd42d458fb9cb69e712600bd69ff77ff3a45a8 ]

FIX_EARLY_DEBUG_BASE reserves a 128k area for debugging. When page size is 256k, the calculation results in a 0 number of pages, leading to the following failure:

  CC      arch/powerpc/kernel/asm-offsets.s
  In file included from ./arch/powerpc/include/asm/nohash/32/pgtable.h:77:0,
                   from ./arch/powerpc/include/asm/nohash/pgtable.h:8,
                   from ./arch/powerpc/include/asm/pgtable.h:20,
                   from ./include/linux/pgtable.h:6,
                   from ./arch/powerpc/include/asm/kup.h:42,
                   from ./arch/powerpc/include/asm/uaccess.h:9,
                   from ./include/linux/uaccess.h:11,
                   from ./include/linux/crypto.h:21,
                   from ./include/crypto/hash.h:11,
                   from ./include/linux/uio.h:10,
                   from ./include/linux/socket.h:8,
                   from ./include/linux/compat.h:15,
                   from arch/powerpc/kernel/asm-offsets.c:14:
  ./arch/powerpc/include/asm/fixmap.h:75:2: error: overflow in enumeration values
    __end_of_permanent_fixed_addresses,
    ^
  make[2]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1

Ensure the debug area is at least one page.

Fixes: b8e8efaa8639 ("powerpc: reserve fixmap entries for early debug")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ca8c9f8249f523b1fab873e67b81b11989d46553.1592207216.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 | powerpc/mm: Fix typo in IS_ENABLED() | Joe Perches

[ Upstream commit 55bd9ac468397c4f12a33b7ec714b5d0362c3aa2 ]

IS_ENABLED() matches names exactly, so the missing "CONFIG_" prefix means this code would never be built.

Also fixes a missing newline in pr_warn().

Fixes: 970d54f99cea ("powerpc/book3s64/hash: Disable 16M linear mapping size if not aligned")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/202006050717.A2F9809E@keescook
Signed-off-by: Sasha Levin <sashal@kernel.org>
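The bug class, sketched (the callee name here is hypothetical; the real guarded code handles the 16M mapping-size fallback):

  /* Illustrative: IS_ENABLED() matches the full CONFIG_ name exactly,
   * so without the CONFIG_ prefix it always evaluates to 0 and the
   * guarded code is silently compiled out. */
  if (IS_ENABLED(STRICT_KERNEL_RWX))          /* wrong: always false */
          disable_16m_mapping();              /* (hypothetical callee) */

  if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))   /* correct */
          disable_16m_mapping();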
2020-08-11 | Revert "powerpc/kasan: Fix shadow pages allocation failure" | Christophe Leroy

commit b506923ee44ae87fc9f4de16b53feb313623e146 upstream.

This reverts commit d2a91cef9bbdeb87b7449fdab1a6be6000930210.

This commit moved too much work in kasan_init(). The allocation of shadow pages has to be moved for the reason explained in that patch, but the allocation of page tables still need to be done before switching to the final hash table.

First revert the incorrect commit, following patch redoes it properly.

Fixes: d2a91cef9bbd ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable@vger.kernel.org
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Link: https://lore.kernel.org/r/3667deb0911affbf999b99f87c31c77d5e870cd2.1593690707.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-06 | Merge branch 'v5.7/base' into v5.7/standard/base | Bruce Ashfield
2020-07-22 | powerpc/pseries/svm: Fix incorrect check for shared_lppaca_size | Satheesh Rajendran

commit b710d27bf72068b15b2f0305d825988183e2ff28 upstream.

Early secure guest boot hits the below crash while booting with vcpus numbers aligned with page boundary for PAGE size of 64k and LPPACA size of 1k, i.e. 64, 128 etc.

  Partition configured for 64 cpus.
  CPU maps initialized for 1 thread per core
  ------------[ cut here ]------------
  kernel BUG at arch/powerpc/kernel/paca.c:89!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

This is due to the BUG_ON() for shared_lppaca_total_size equal to shared_lppaca_size. Instead the code should only BUG_ON() if we have exceeded the total_size, which indicates we've overflowed the array.

Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA structures")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
[mpe: Reword change log to clarify we're fixing not removing the check]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200619070113.16696-1-sathnaga@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
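The corrected check, sketched (illustrative; the surrounding allocator bookkeeping is elided):

  /* Illustrative: filling the shared LPPACA area exactly to its end is
   * fine; only going past the end is an overflow. */
  shared_lppaca_size += size;
  BUG_ON(shared_lppaca_size > shared_lppaca_total_size); /* was: >= */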
2020-07-22 | powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkey | Aneesh Kumar K.V

commit 192b6a780598976feb7321ff007754f8511a4129 upstream.

Even if the IAMR value denies execute access, the current code returns true from pkey_access_permitted() for an execute permission check, if the AMR read pkey bit is cleared.

This results in repeated page fault loop with a test like below:

  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <signal.h>
  #include <inttypes.h>
  #include <assert.h>
  #include <malloc.h>
  #include <unistd.h>
  #include <pthread.h>
  #include <sys/mman.h>

  #ifdef SYS_pkey_mprotect
  #undef SYS_pkey_mprotect
  #endif
  #ifdef SYS_pkey_alloc
  #undef SYS_pkey_alloc
  #endif
  #ifdef SYS_pkey_free
  #undef SYS_pkey_free
  #endif

  #undef PKEY_DISABLE_EXECUTE
  #define PKEY_DISABLE_EXECUTE 0x4

  #define SYS_pkey_mprotect 386
  #define SYS_pkey_alloc 384
  #define SYS_pkey_free 385

  #define PPC_INST_NOP 0x60000000
  #define PPC_INST_BLR 0x4e800020
  #define PROT_RWX (PROT_READ | PROT_WRITE | PROT_EXEC)

  static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
  {
          return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
  }

  static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights)
  {
          return syscall(SYS_pkey_alloc, flags, access_rights);
  }

  static int sys_pkey_free(int pkey)
  {
          return syscall(SYS_pkey_free, pkey);
  }

  static void do_execute(void *region)
  {
          /* jump to region */
          asm volatile(
                  "mtctr  %0;"
                  "bctrl"
                  : : "r"(region) : "ctr", "lr");
  }

  static void do_protect(void *region)
  {
          size_t pgsize;
          int i, pkey;

          pgsize = getpagesize();
          pkey = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
          assert(pkey > 0);

          /* perform mprotect */
          assert(!sys_pkey_mprotect(region, pgsize, PROT_RWX, pkey));
          do_execute(region);

          /* free pkey */
          assert(!sys_pkey_free(pkey));
  }

  int main(int argc, char **argv)
  {
          size_t pgsize, numinsns;
          unsigned int *region;
          int i;

          /* allocate memory region to protect */
          pgsize = getpagesize();
          region = memalign(pgsize, pgsize);
          assert(region != NULL);
          assert(!mprotect(region, pgsize, PROT_RWX));

          /* fill page with NOPs with a BLR at the end */
          numinsns = pgsize / sizeof(region[0]);
          for (i = 0; i < numinsns - 1; i++)
                  region[i] = PPC_INST_NOP;
          region[i] = PPC_INST_BLR;

          do_protect(region);
          return EXIT_SUCCESS;
  }

The fix is to only check the IAMR for an execute check, the AMR value is not relevant.

Fixes: f2407ef3ba22 ("powerpc: helper to validate key-access permissions of a pte")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Add detail to change log, tweak wording & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200712132047.1038594-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
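With the fix, the execute path consults only the IAMR; a sketch of pkey_access_permitted() (illustrative shape, not the verbatim diff):

  /* Illustrative: execute permission is governed solely by the IAMR;
   * a clear AMR read bit must not grant execute. */
  bool pkey_access_permitted(int pkey, bool write, bool execute)
  {
          int pkey_shift = pkeyshift(pkey);

          if (execute)
                  return !(read_iamr() & (IAMR_EX_BIT << pkey_shift));

          if (write)
                  return !(read_amr() & (AMR_WR_BIT << pkey_shift));

          return !(read_amr() & (AMR_RD_BIT << pkey_shift));
  }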
2020-07-16 | powerpc/64s/exception: Fix 0x1500 interrupt handler crash | Nicholas Piggin

[ Upstream commit 4557ac6b344b8cdf948ff8b007e8e1de34832f2e ]

A typo caused the interrupt handler to branch immediately to the common "unknown interrupt" handler and skip the special case test for denormal cause.

This does not affect KVM softpatch handling (e.g., for POWER9 TM assist) because the KVM test was moved to common code by commit 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") just before this bug was introduced.

Fixes: 3f7fbd97d07d ("powerpc/64s/exception: Clean up SRR specifiers")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
[mpe: Split selftest into a separate patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200708074942.1713396-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-16 | powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUAL | Aneesh Kumar K.V

[ Upstream commit c1ed1754f271f6b7acb1bfdc8cfb62220fbed423 ]

With CONFIG_DEBUG_VIRTUAL=y, __pa() checks for addr value and if it's less than PAGE_OFFSET it leads to a BUG().

  #define __pa(x) ({ VIRTUAL_BUG_ON((unsigned long)(x) < PAGE_OFFSET); (unsigned long)(x) & 0x0fffffffffffffffUL; })

  kernel BUG at arch/powerpc/kvm/book3s_64_mmu_radix.c:43!
  cpu 0x70: Vector: 700 (Program Check) at [c0000018a2187360]
      pc: c000000000161b30: __kvmhv_copy_tofrom_guest_radix+0x130/0x1f0
      lr: c000000000161d5c: kvmhv_copy_from_guest_radix+0x3c/0x80
  ...
  kvmhv_copy_from_guest_radix+0x3c/0x80
  kvmhv_load_from_eaddr+0x48/0xc0
  kvmppc_ld+0x98/0x1e0
  kvmppc_load_last_inst+0x50/0x90
  kvmppc_hv_emulate_mmio+0x288/0x2b0
  kvmppc_book3s_radix_page_fault+0xd8/0x2b0
  kvmppc_book3s_hv_page_fault+0x37c/0x1050
  kvmppc_vcpu_run_hv+0xbb8/0x1080
  kvmppc_vcpu_run+0x34/0x50
  kvm_arch_vcpu_ioctl_run+0x2fc/0x410
  kvm_vcpu_ioctl+0x2b4/0x8f0
  ksys_ioctl+0xf4/0x150
  sys_ioctl+0x28/0x80
  system_call_exception+0x104/0x1d0
  system_call_common+0xe8/0x214

kvmhv_copy_tofrom_guest_radix() uses a NULL value for to/from to indicate direction of copy. Avoid calling __pa() if the value is NULL to avoid the BUG().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Massage change log a bit to mention CONFIG_DEBUG_VIRTUAL]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611120159.680284-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
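A sketch of the guard (illustrative only; the real function derives more state from to/from than shown here):

  /* Illustrative: NULL encodes the copy direction, so only translate
   * non-NULL pointers with __pa(). */
  unsigned long pa_to   = to   ? __pa(to)   : 0;
  unsigned long pa_from = from ? __pa(from) : 0;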
2020-07-09 | powerpc/book3s64/kvm: Fix secondary page table walk warning during migration | Aneesh Kumar K.V

[ Upstream commit bf8036a4098d1548cdccf9ed5c523ef4e83e3c68 ]

This patch fixes the below warning reported during migration:

  find_kvm_secondary_pte called with kvm mmu_lock not held
  CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: G W 5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432
  NIP: c008000000fe848c LR: c008000000fe8488 CTR: 0000000000000000
  REGS: c000001e19f077e0 TRAP: 0700 Tainted: G W (5.7.0-rc5-kvm-00211-g9ccf10d6d088)
  MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42222422 XER: 20040000
  CFAR: c00000000012f5ac IRQMASK: 0
  GPR00: c008000000fe8488 c000001e19f07a70 c008000000ffe200 0000000000000039
  GPR04: 0000000000000001 c000001ffc8b4900 0000000000018840 0000000000000007
  GPR08: 0000000000000003 0000000000000001 0000000000000007 0000000000000001
  GPR12: 0000000000002000 c000001fff6d9400 000000011f884678 00007fff70b70000
  GPR16: 00007fff7137cb90 00007fff7dcb4410 0000000000000001 0000000000000000
  GPR20: 000000000ffe0000 0000000000000000 0000000000000001 0000000000000000
  GPR24: 8000000000000000 0000000000000001 c000001e1f67e600 c000001e1fd82410
  GPR28: 0000000000001000 c000001e2e410000 0000000000000fff 0000000000000ffe
  NIP [c008000000fe848c] kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
  LR [c008000000fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv]
  Call Trace:
  [c000001e19f07a70] [c008000000fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable)
  [c000001e19f07b50] [c008000000fd42e4] kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv]
  [c000001e19f07be0] [c008000000eea878] kvm_vm_ioctl_get_dirty_log+0x30/0x50 [kvm]
  [c000001e19f07c00] [c008000000edc818] kvm_vm_ioctl+0x2b0/0xc00 [kvm]
  [c000001e19f07d50] [c00000000046e148] ksys_ioctl+0xf8/0x150
  [c000001e19f07da0] [c00000000046e1c8] sys_ioctl+0x28/0x80
  [c000001e19f07dc0] [c00000000003652c] system_call_exception+0x16c/0x240
  [c000001e19f07e20] [c00000000000d070] system_call_common+0xf0/0x278
  Instruction dump:
  7d3a512a 4200ffd0 7ffefb78 4bfffdc4 60000000 3c820000 e8848468 3c620000
  e86384a8 38840010 4800673d e8410018 <0fe00000> 4bfffdd4 60000000 60000000

Reported-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200528080456.87797-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09 | powerpc/kvm/book3s: Add helper to walk partition scoped linux page table. | Aneesh Kumar K.V

[ Upstream commit 4b99412ed6972cc77c1f16009e1d00323fcef9ab ]

The locking rules for walking a partition scoped table are different from those for a process scoped table. Hence add a helper for the secondary linux page table walk, and also add a check that we are holding the right locks.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-10-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
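The helper looks roughly like this (a sketch; the lock-assertion string is the one visible in the warning fixed by the previous entry):

  /* Illustrative shape of the partition-scoped walk helper. */
  static inline pte_t *find_kvm_secondary_pte(struct kvm *kvm, unsigned long ea,
                                              unsigned *hshift)
  {
          /* The partition-scoped table is protected by the kvm mmu_lock,
           * not by the process mm locks. */
          VM_WARN(!spin_is_locked(&kvm->mmu_lock),
                  "%s called with kvm mmu_lock not held", __func__);
          return __find_linux_pte(kvm->arch.pgtable, ea, NULL, hshift);
  }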
2020-06-30 | powerpc/fsl_booke/32: Fix build with CONFIG_RANDOMIZE_BASE | Arseny Solokha

commit 7e4773f73dcfb92e7e33532162f722ec291e75a4 upstream.

Building the current 5.8 kernel for an e500 machine with CONFIG_RANDOMIZE_BASE=y and CONFIG_BLOCK=n yields the following failure:

  arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
  arch/powerpc/mm/nohash/kaslr_booke.c:387:2: error: implicit declaration of function 'flush_icache_range'; did you mean 'flush_tlb_range'?

Indeed, including asm/cacheflush.h into kaslr_booke.c fixes the build.

Fixes: 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR infrastructure")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Scott Wood <oss@buserror.net>
[mpe: Tweak change log to mention CONFIG_BLOCK=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200613162801.1946619-1-asolokha@kb.kras.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | powerpc/64s: Fix KVM interrupt using wrong save area | Nicholas Piggin

commit 0bdcfa182506526fbe4e088ff9ca86a31b81828d upstream.

The CTR register reload in the KVM interrupt path used the wrong save area for SLB (and NMI) interrupts.

Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200615061247.1310763-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24 | powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL | Aneesh Kumar K.V

[ Upstream commit a6e2c226c3d51fd93636320e47cabc8a8f0824c5 ]

With CONFIG_DEBUG_VIRTUAL=y, we can hit a BUG() if we take a hard lockup watchdog interrupt when in OPAL mode. This happens in show_instructions() if the kernel takes the watchdog NMI IPI, or any other interrupt, with MSR_IR == 0. show_instructions() updates the variable pc in the loop and the second iteration will result in BUG().

We hit the BUG_ON due to the below check in __va():

  #define __va(x) ({ VIRTUAL_BUG_ON((unsigned long)(x) >= PAGE_OFFSET); (void *)(unsigned long)((phys_addr_t)(x) | PAGE_OFFSET); })

Fix it by moving the check out of the loop. Also update nip so that the nip == pc check still matches.

Fixes: 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Use IS_ENABLED(), massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200524093822.423487-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 | powerpc/64s/kuap: Add missing isync to KUAP restore paths | Nicholas Piggin

[ Upstream commit cb2b53cbffe3c388cd676b63f34e54ceb2643ae2 ]

Writing the AMR register is documented to require context synchronizing operations before and after, for it to take effect as expected.

The KUAP restore at interrupt exit time deliberately avoids the isync after the AMR update because it only needs to take effect after the context synchronizing RFID that soon follows. Add a comment for this.

The missing isync before the update doesn't have an obvious justification, and seems it could theoretically allow a rogue user access to leak past the AMR update. Add isyncs for these.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200429065654.1677541-3-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
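The general set_kuap() helper ends up bracketed like this (a sketch; the interrupt-exit restore path still omits the trailing isync since the RFID that follows is itself context synchronizing):

  /* Illustrative: AMR updates need context-synchronising instructions
   * around them to take effect when expected. */
  static inline void set_kuap(unsigned long value)
  {
          if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
                  return;

          /* ISA 3.0B requires a CSI before and after the mtspr to AMR. */
          isync();
          mtspr(SPRN_AMR, value);
          isync();
  }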
2020-06-24 | powerpc/4xx: Don't unmap NULL mbase | huhai

[ Upstream commit bcec081ecc940fc38730b29c743bbee661164161 ]

Signed-off-by: huhai <huhai@tj.kylinos.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200521072648.1254699-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 | KVM: PPC: Book3S HV: Relax check on H_SVM_INIT_ABORT | Laurent Dufour

[ Upstream commit e3326ae3d59e443a379367c6936941d6ab55d316 ]

The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor.

However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case.

In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case.

Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 | KVM: PPC: Book3S: Fix some RCU-list locks | Qian Cai

[ Upstream commit ab8b65be183180c3eef405d449163964ecc4b571 ]

It is unsafe to traverse kvm->arch.spapr_tce_tables and stt->iommu_tables without the RCU read lock held. Also, add cond_resched_rcu() in places with the RCU read lock held that could take a while to finish.

  arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  no locks held by qemu-kvm/4265.

  stack backtrace:
  CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
  Call Trace:
  [c000201a8690f720] [c000000000715948] dump_stack+0xfc/0x174 (unreliable)
  [c000201a8690f770] [c0000000001d9470] lockdep_rcu_suspicious+0x140/0x164
  [c000201a8690f7f0] [c008000010b9fb48] kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
  [c000201a8690f870] [c008000010b8462c] kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
  [c000201a8690f8a0] [c008000010b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
  [c000201a8690f8f0] [c008000010b7d488] kvm_put_kvm+0x370/0x600 [kvm]
  [c000201a8690f990] [c008000010b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
  [c000201a8690f9c0] [c0000000005223f4] __fput+0x124/0x330
  [c000201a8690fa20] [c000000000151cd8] task_work_run+0xb8/0x130
  [c000201a8690fa70] [c0000000001197e8] do_exit+0x4e8/0xfa0
  [c000201a8690fb70] [c00000000011a374] do_group_exit+0x64/0xd0
  [c000201a8690fbb0] [c000000000132c90] get_signal+0x1f0/0x1200
  [c000201a8690fcc0] [c000000000020690] do_notify_resume+0x130/0x3c0
  [c000201a8690fda0] [c000000000038d64] syscall_exit_prepare+0x1a4/0x280
  [c000201a8690fe20] [c00000000000c8f8] system_call_common+0xf8/0x278

  ====
  arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  2 locks held by qemu-kvm/4264:
  #0: c000201ae2d000d8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c000200c9ed0c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]

  ====
  arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by qemu-kvm/4257:
  #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]

  ====
  arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by qemu-kvm/4257:
  #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 | KVM: PPC: Book3S HV: Ignore kmemleak false positives | Qian Cai

[ Upstream commit 0aca8a5575544bd21b3363058afb8f1e81505150 ]

kvmppc_pmd_alloc() and kvmppc_pte_alloc() allocate some memory but then pud_populate() and pmd_populate() will use __pa() to reference the newly allocated memory.

Since kmemleak is unable to track the physical memory resulting in false positives, silence those by using kmemleak_ignore().

  unreferenced object 0xc000201c382a1000 (size 4096):
    comm "qemu-kvm", pid 124828, jiffies 4295733767 (age 341.250s)
    hex dump (first 32 bytes):
      c0 00 20 09 f4 60 03 87 c0 00 20 10 72 a0 03 87  .. ..`.... .r...
      c0 00 20 0e 13 a0 03 87 c0 00 20 1b dc c0 03 87  .. ....... .....
    backtrace:
      [<000000004cc2790f>] kvmppc_create_pte+0x838/0xd20 [kvm_hv]
         kvmppc_pmd_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:366
         (inlined by) kvmppc_create_pte at arch/powerpc/kvm/book3s_64_mmu_radix.c:590
      [<00000000d123c49a>] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
      [<00000000bb549087>] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
      [<0000000086dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
      [<000000005ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
      [<00000000d22162ff>] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [<00000000d6953bc4>] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
      [<000000002543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
      [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
      [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
      [<000000004afc4310>] system_call_exception+0x114/0x1e0
      [<00000000fb70a873>] system_call_common+0xf0/0x278
  unreferenced object 0xc0002001f0c03900 (size 256):
    comm "qemu-kvm", pid 124830, jiffies 4295735235 (age 326.570s)
    hex dump (first 32 bytes):
      c0 00 20 10 fa a0 03 87 c0 00 20 10 fa a1 03 87  .. ....... .....
      c0 00 20 10 fa a2 03 87 c0 00 20 10 fa a3 03 87  .. ....... .....
    backtrace:
      [<0000000023f675b8>] kvmppc_create_pte+0x854/0xd20 [kvm_hv]
         kvmppc_pte_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:356
         (inlined by) kvmppc_create_pte at arch/powerpc/kvm/book3s_64_mmu_radix.c:593
      [<00000000d123c49a>] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
      [<00000000bb549087>] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
      [<0000000086dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
      [<000000005ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
      [<00000000d22162ff>] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [<00000000d6953bc4>] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
      [<000000002543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
      [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
      [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
      [<000000004afc4310>] system_call_exception+0x114/0x1e0
      [<00000000fb70a873>] system_call_common+0xf0/0x278

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
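The pattern of the fix, sketched (illustrative; the pte variant is analogous):

  /* Illustrative: the page tables are referenced via physical addresses
   * (pud_populate()/pmd_populate() store __pa()), which kmemleak cannot
   * track, so tell it to ignore these objects. */
  static pmd_t *kvmppc_pmd_alloc(void)
  {
          pmd_t *pmd = kmem_cache_alloc(kvm_pmd_cache, GFP_KERNEL);

          if (pmd)
                  kmemleak_ignore(pmd);
          return pmd;
  }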
2020-06-24 | powerpc/8xx: Drop CONFIG_8xx_COPYBACK option | Christophe Leroy

[ Upstream commit d3efcd38c0b99162d889e36a30425345a18edb33 ]

CONFIG_8xx_COPYBACK was there to help disabling copyback cache mode for debugging hardware. But nobody will design new boards with 8xx now.

All 8xx platforms select it, so make it the default and remove the option.

Also remove the Mx_RESETVAL values which are pretty useless and hide the real value while reading code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bcc968cda075516eb76e2f25e09821f582c566b4.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 | powerpc/32s: Don't warn when mapping RO data ROX. | Christophe Leroy

[ Upstream commit 4b19f96a81bceaf0bcf44d79c0855c61158065ec ]

Mapping RO data as ROX is not an issue since that data cannot be modified to introduce an exploit.

PPC64 accepts to have RO data mapped ROX, as a trade off between kernel size and strictness of protection.

On PPC32, kernel size is even more critical as amount of memory is usually small.

Depending on the number of available IBATs, the last IBATs might overflow the end of text. Only warn if it crosses the end of RO data.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6499f8eeb2a36330e5c9fc1cee9a79374875bd54.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 | powerpc/kasan: Fix error detection on memory allocation | Christophe Leroy

[ Upstream commit d132443a73d7a131775df46f33000f67ed92de1e ]

In case (k_start & PAGE_MASK) doesn't equal k_start, 'va' will never be NULL although 'block' is NULL.

Check the return of memblock_alloc() directly instead of the resulting address in the loop.

Fixes: 509cd3f2b473 ("powerpc/32: Simplify KASAN init")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cb8ca82042bfc45a5cfe726c921cd7e7eeb12a3.1589866984.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
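A sketch of the corrected loop (illustrative; the mapping details are elided):

  /* Illustrative: test the allocation itself; the derived 'va' below is
   * never NULL even when the allocation failed. */
  void *block = memblock_alloc(k_end - k_start, PAGE_SIZE);

  if (!block)
          return -ENOMEM;

  for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
          void *va = block + k_cur - k_start;   /* offset into block */
          /* ... map 'va' at k_cur ... */
  }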
2020-06-24 | powerpc/64s/pgtable: fix an undefined behaviour | Qian Cai

[ Upstream commit c2e929b18cea6cbf71364f22d742d9aad7f4677a ]

Booting a power9 server with hash MMU could trigger an undefined behaviour because pud_offset(p4d, 0) will do,

  0 >> (PAGE_SHIFT:16 + PTE_INDEX_SIZE:8 + H_PMD_INDEX_SIZE:10)

Fix it by converting pud_index() and friends to static inline functions.

  UBSAN: shift-out-of-bounds in arch/powerpc/mm/ptdump/ptdump.c:282:15
  shift exponent 34 is too large for 32-bit type 'int'
  CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200303+ #13
  Call Trace:
  dump_stack+0xf4/0x164 (unreliable)
  ubsan_epilogue+0x18/0x78
  __ubsan_handle_shift_out_of_bounds+0x160/0x21c
  walk_pagetables+0x2cc/0x700
  walk_pud at arch/powerpc/mm/ptdump/ptdump.c:282
  (inlined by) walk_pagetables at arch/powerpc/mm/ptdump/ptdump.c:311
  ptdump_check_wx+0x8c/0xf0
  mark_rodata_ro+0x48/0x80
  kernel_init+0x74/0x194
  ret_from_kernel_thread+0x5c/0x74

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200306044852.3236-1-cai@lca.pw
Signed-off-by: Sasha Levin <sashal@kernel.org>
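Converting the index macros to static inline functions pins the operand types and makes the shift well-defined; a sketch:

  /* Illustrative: as a macro, pud_index(0) shifted a plain int constant
   * by 34 bits (undefined behaviour for a 32-bit type); with a static
   * inline the operand is always an unsigned long. */
  static inline unsigned long pud_index(unsigned long address)
  {
          return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
  }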
2020-06-24powerpc/powernv: add NULL check after kzallocChen Zhou
[ Upstream commit ceffa63acce7165c442395b7d64a11ab8b5c5dca ]

Fixes coccicheck warning:

  ./arch/powerpc/platforms/powernv/opal.c:813:1-5: alloc with no test, possible model on line 814

Add NULL check after kzalloc.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200509020838.121660-1-chenzhou10@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
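The pattern itself is the stock one; a sketch with a hypothetical struct name, since the exact opal.c types aren't quoted here:

	struct foo *attr;	/* hypothetical; not the actual opal.c type */

	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
	if (!attr)
		return -ENOMEM;	/* bail out rather than dereference NULL */
	/* only touch attr's members after the check */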
2020-06-24powerpc/ps3: Fix kexec shutdown hangGeoff Levand
[ Upstream commit 126554465d93b10662742128918a5fc338cda4aa ]

The ps3_mm_region_destroy() and ps3_mm_vas_destroy() routines are called very late in the shutdown, via kexec's mmu_cleanup_all routine. By the time mmu_cleanup_all runs it is too late to use udbg_printf, and calling it will cause PS3 systems to hang.

Remove all debugging statements from ps3_mm_region_destroy() and ps3_mm_vas_destroy() and replace any error reporting with calls to lv1_panic.

With this change, builds with 'DEBUG' defined will no longer hang on kexec reboot, and all builds will end in lv1_panic if an error is encountered.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7325c4af2b4c989c19d6a26b90b1fec9c0615ddf.1589049250.git.geoff@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
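A rough sketch of the resulting shape of ps3_mm_vas_destroy(); the exact hcall sequence and the lv1_panic argument are assumed from context, not quoted from the patch:

	int result;

	result = lv1_select_virtual_address_space(0);
	result += lv1_destruct_virtual_address_space(map.vas_id);

	if (result)
		/* udbg_printf is unusable this late in kexec shutdown;
		 * stopping in the hypervisor is the only safe report */
		lv1_panic(0);	/* argument choice assumed */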
2020-06-24powerpc/pseries/ras: Fix FWNMI_VALID off by oneNicholas Piggin
[ Upstream commit deb70f7a35a22dffa55b2c3aac71bc6fb0f486ce ]

This was discovered while developing qemu fwnmi sreset support.

This off-by-one bug means the last 16 bytes of the rtas area cannot be used for a 16-byte save area. It's not a serious bug, and the QEMU implementation has to retain a workaround for old kernels, but it's good to tighten it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200508043408.886394-7-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
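The shape of the off-by-one, as a hypothetical bounds check; rtas_start and rtas_end are placeholder names, not the macro actually used in pseries/ras.c:

	/* A 16-byte save area starting at the last 16 bytes of the
	 * region ends exactly at rtas_end, so the upper comparison
	 * must be inclusive: */
	static bool fwnmi_save_area_valid(unsigned long addr)
	{
		return addr >= rtas_start &&
		       addr + 16 <= rtas_end;	/* '<' here rejects the last 16 bytes */
	}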
2020-06-24powerpc/64s/exceptions: Machine check reconcile irq stateNicholas Piggin
[ Upstream commit f0fd9dd3c213c947dfb5bc2cad3ef5e30d3258ec ]

The pseries fwnmi machine check code trips the soft-irq checks in rtas_call (after the next patch removes rtas_token from this call path). Rather than play whack-a-mole with these and forever have fragile code, it seems better to have the early machine check handler perform the same kind of reconcile as the other NMI interrupts do.

  WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343
  CPU: 0 PID: 493 Comm: a Tainted: G        W
  NIP:  c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000
  REGS: c0000001fffd38b0 TRAP: 0700   Tainted: G        W
  MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
  CFAR: c00000000001ec90 IRQMASK: 0
  GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000
  GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef
  GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001
  GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000
  NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
  LR [c000000000042c40] unlock_rtas+0x30/0x90
  Call Trace:
  [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable)
  [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280
  [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70
  [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
  [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
  [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
  --- interrupt: 200 at 0x13f1307c8
      LR = 0x7fff888b8528
  Instruction dump:
  60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
  60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200508043408.886394-5-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
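The reconcile itself is done in asm in the real handler; a rough C equivalent, using the standard powerpc soft-mask fields, would be:

	/* Sketch only (the real change is in exceptions-64s.S): record
	 * that irqs are soft-disabled, and that a hard disable happened,
	 * before calling into C, so arch_local_irq_restore() in
	 * rtas_call() sees a consistent state instead of warning. */
	local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;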
2020-06-24powerpc/64s/exception: Fix machine check no-loss idle wakeupNicholas Piggin
[ Upstream commit 8a5054d8cbbe03c68dcb0957c291c942132e4101 ]

The architecture allows machine check exceptions to cause idle wakeups which resume at the 0x200 address and have to return via the idle wakeup code, but the early machine check handler is run first. The no state-loss sleep case is broken because the early handler uses non-volatile register r1, which is needed for the wakeup protocol, but does not restore it.

Fix this by loading r1 from the MCE exception frame before returning to the idle wakeup code. Also update the comment, which has become stale since the idle rewrite in C.

This crash was found, and the fix confirmed, with a machine check injection test in the qemu powernv model (which is not upstream in qemu yet).

Fixes: 10d91611f426d ("powerpc/64s: Reimplement book3s idle code in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200508043408.886394-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24powerpc/64: Don't initialise init_task->thread.regsMichael Ellerman
[ Upstream commit 7ffa8b7dc11752827329e4e84a574ea6aaf24716 ]

Aneesh increased the size of struct pt_regs by 16 bytes and started seeing this WARN_ON:

  smp: Bringing up secondary CPUs ...
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318
  NIP:  c00000000001a2b4 LR: c00000000001a29c CTR: c0000000031d0000
  REGS: c0000000026d3980 TRAP: 0700   Not tainted  (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+)
  MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48048224  XER: 00000000
  CFAR: c000000000019cc8 IRQMASK: 1
  GPR00: c00000000001a264 c0000000026d3c20 c0000000026d7200 800000000280b033
  GPR04: 0000000000000001 0000000000000000 0000000000000077 30206d7372203164
  GPR08: 0000000000002000 0000000002002000 800000000280b033 3230303030303030
  GPR12: 0000000000008800 c0000000031d0000 0000000000800050 0000000002000066
  GPR16: 000000000309a1a0 000000000309a4b0 000000000309a2d8 000000000309a890
  GPR20: 00000000030d0098 c00000000264da40 00000000fd620000 c0000000ff798080
  GPR24: c00000000264edf0 c0000001007469f0 00000000fd620000 c0000000020e5e90
  GPR28: c00000000264edf0 c00000000264d200 000000001db60000 c00000000264d200
  NIP [c00000000001a2b4] giveup_all+0xb4/0x110
  LR [c00000000001a29c] giveup_all+0x9c/0x110
  Call Trace:
  [c0000000026d3c20] [c00000000001a264] giveup_all+0x64/0x110 (unreliable)
  [c0000000026d3c90] [c00000000001ae34] __switch_to+0x104/0x480
  [c0000000026d3cf0] [c000000000e0b8a0] __schedule+0x320/0x970
  [c0000000026d3dd0] [c000000000e0c518] schedule_idle+0x38/0x70
  [c0000000026d3df0] [c00000000019c7c8] do_idle+0x248/0x3f0
  [c0000000026d3e70] [c00000000019cbb8] cpu_startup_entry+0x38/0x40
  [c0000000026d3ea0] [c000000000011bb0] rest_init+0xe0/0xf8
  [c0000000026d3ed0] [c000000002004820] start_kernel+0x990/0x9e0
  [c0000000026d3f90] [c00000000000c49c] start_here_common+0x1c/0x400

Which was unexpected. The warning is checking the thread.regs->msr value of the task we are switching from:

  usermsr = tsk->thread.regs->msr;
  ...
  WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));

ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set.

Dumping tsk->thread.regs->msr we see that it's: 0x1db60000

Which is not a normal looking MSR; in fact the only valid bit is MSR_VSX, all the other bits are reserved in the current definition of the MSR.

We can see from the oops that it was swapper/0 that we were switching from when we hit the warning, ie. init_task. So its thread.regs points to the base (high addresses) in init_stack.

Dumping the content of init_task->thread.regs, with the members of pt_regs annotated (the 16 bytes larger version), we see:

  0000000000000000 c000000002780080    gpr[0]   gpr[1]
  0000000000000000 c000000002666008    gpr[2]   gpr[3]
  c0000000026d3ed0 0000000000000078    gpr[4]   gpr[5]
  c000000000011b68 c000000002780080    gpr[6]   gpr[7]
  0000000000000000 0000000000000000    gpr[8]   gpr[9]
  c0000000026d3f90 0000800000002200    gpr[10]  gpr[11]
  c000000002004820 c0000000026d7200    gpr[12]  gpr[13]
  000000001db60000 c0000000010aabe8    gpr[14]  gpr[15]
  c0000000010aabe8 c0000000010aabe8    gpr[16]  gpr[17]
  c00000000294d598 0000000000000000    gpr[18]  gpr[19]
  0000000000000000 0000000000001ff8    gpr[20]  gpr[21]
  0000000000000000 c00000000206d608    gpr[22]  gpr[23]
  c00000000278e0cc 0000000000000000    gpr[24]  gpr[25]
  000000002fff0000 c000000000000000    gpr[26]  gpr[27]
  0000000002000000 0000000000000028    gpr[28]  gpr[29]
  000000001db60000 0000000004750000    gpr[30]  gpr[31]
  0000000002000000 000000001db60000    nip      msr
  0000000000000000 0000000000000000    orig_r3  ctr
  c00000000000c49c 0000000000000000    link     xer
  0000000000000000 0000000000000000    ccr      softe
  0000000000000000 0000000000000000    trap     dar
  0000000000000000 0000000000000000    dsisr    result
  0000000000000000 0000000000000000    ppr      kuap
  0000000000000000 0000000000000000    pad[2]   pad[3]

This looks suspiciously like stack frames, not a pt_regs. If we look closely we can see return addresses from the stack trace above, c000000002004820 (start_kernel) and c00000000000c49c (start_here_common).

init_task->thread.regs is setup at build time in processor.h:

  #define INIT_THREAD  { \
      .ksp = INIT_SP, \
      .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \

The early boot code where we setup the initial stack is:

  LOAD_REG_ADDR(r3,init_thread_union)

  /* set up a stack pointer */
  LOAD_REG_IMMEDIATE(r1,THREAD_SIZE)
  add     r1,r3,r1
  li      r0,0
  stdu    r0,-STACK_FRAME_OVERHEAD(r1)

Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD). Which is far too small to contain a pt_regs. So the result is init_task->thread.regs is pointing at some stack frames on the init stack, not at a pt_regs.

We have gotten away with this for so long because with pt_regs at its current size the MSR happens to point into the first frame, at a location that is not written to by the early asm. With the 16 byte expansion the MSR falls into the second frame, which is used by the compiler, and collides with a saved register that tends to be non-zero.

As far as I can see this has been wrong since the original merge of 64-bit ppc support, back in 2002.

Conceptually swapper should have no regs, it never entered from userspace, and in fact that's what we do on 32-bit. It's also presumably what the "bogus" comment is referring to.

So I think the right fix is to just not initialise regs at all. I'm slightly worried this will break some code that isn't prepared for a NULL regs, but we'll have to see.

Remove the comment in head_64.S which refers to us setting up the regs (even though we never did), and is otherwise not really accurate any more.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200428123130.73078-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
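The fix described above amounts to dropping the bogus initialiser. Roughly, with the other INIT_THREAD members elided, so this is a sketch rather than the exact patch:

	/* Leave .regs unset so it is implicitly NULL, matching what
	 * 32-bit already does: swapper never entered from userspace,
	 * so it has no user register frame to point at. */
	#define INIT_THREAD { \
		.ksp = INIT_SP, \
	}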