summaryrefslogtreecommitdiffstats
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2020-02-11KVM: Use vcpu-specific gva->hva translation when querying host page sizeSean Christopherson
[ Upstream commit f9b84e19221efc5f493156ee0329df3142085f28 ] Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11powerpc/44x: Adjust indentation in ibm4xx_denali_fixup_memsizeNathan Chancellor
commit c3aae14e5d468d18dbb5d7c0c8c7e2968cc14aad upstream. Clang warns: ../arch/powerpc/boot/4xx.c:231:3: warning: misleading indentation; statement is not part of the previous 'else' [-Wmisleading-indentation] val = SDRAM0_READ(DDR0_42); ^ ../arch/powerpc/boot/4xx.c:227:2: note: previous statement is here else ^ This is because there is a space at the beginning of this line; remove it so that the indentation is consistent according to the Linux kernel coding style and clang no longer warns. Fixes: d23f5099297c ("[POWERPC] 4xx: Adds decoding of 440SPE memory size to boot wrapper library") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/780 Link: https://lore.kernel.org/r/20191209200338.12546-1-natechancellor@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flushPeter Zijlstra
commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream. Architectures for which we have hardware walkers of Linux page table should flush TLB on mmu gather batch allocation failures and batch flush. Some architectures like POWER supports multiple translation modes (hash and radix) and in the case of POWER only radix translation mode needs the above TLBI. This is because for hash translation mode kernel wants to avoid this extra flush since there are no hardware walkers of linux page table. With radix translation, the hardware also walks linux page table and with that, kernel needs to make sure to TLB invalidate page walk cache before page table pages are freed. More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE") The changes to sparc are to make sure we keep the old behavior since we are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default value for tlb_needs_table_invalidate is to always force an invalidate and sparc can avoid the table invalidate. Hence we define tlb_needs_table_invalidate to false for sparc architecture. Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11KVM: PPC: Book3S PR: Free shared page if mmu initialization failsSean Christopherson
commit cb10bf9194f4d2c5d830eddca861f7ca0fecdbb4 upstream. Explicitly free the shared page if kvmppc_mmu_init() fails during kvmppc_core_vcpu_create(), as the page is freed only in kvmppc_core_vcpu_free(), which is not reached via kvm_vcpu_uninit(). Fixes: 96bc451a15329 ("KVM: PPC: Introduce shared page") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11KVM: PPC: Book3S HV: Uninit vCPU if vcore creation failsSean Christopherson
commit 1a978d9d3e72ddfa40ac60d26301b154247ee0bc upstream. Call kvm_vcpu_uninit() if vcore creation fails to avoid leaking any resources allocated by kvm_vcpu_init(), i.e. the vcpu->run page. Fixes: 371fefd6f2dc4 ("KVM: PPC: Allow book3s_hv guests to use SMT processor modes") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/futex: Fix incorrect user access blockingMichael Ellerman
commit 9dc086f1e9ef39dd823bd27954b884b2062f9e70 upstream. The early versions of our kernel user access prevention (KUAP) were written by Russell and Christophe, and didn't have separate read/write access. At some point I picked up the series and added the read/write access, but I failed to update the usages in futex.h to correctly allow read and write. However we didn't notice because of another bug which was causing the low-level code to always enable read and write. That bug was fixed recently in commit 1d8f739b07bd ("powerpc/kuap: Fix set direction in allow/prevent_user_access()"). futex_atomic_cmpxchg_inatomic() is passed the user address as %3 and does: 1: lwarx %1, 0, %3 cmpw 0, %1, %4 bne- 3f 2: stwcx. %5, 0, %3 Which clearly loads and stores from/to %3. The logic in arch_futex_atomic_op_inuser() is similar, so fix both of them to use allow_read_write_user(). Without this fix, and with PPC_KUAP_DEBUG=y, we see eg: Bug: Read fault blocked by AMR! WARNING: CPU: 94 PID: 149215 at arch/powerpc/include/asm/book3s/64/kup-radix.h:126 __do_page_fault+0x600/0xf30 CPU: 94 PID: 149215 Comm: futex_requeue_p Tainted: G W 5.5.0-rc7-gcc9x-g4c25df5640ae #1 ... NIP [c000000000070680] __do_page_fault+0x600/0xf30 LR [c00000000007067c] __do_page_fault+0x5fc/0xf30 Call Trace: [c00020138e5637e0] [c00000000007067c] __do_page_fault+0x5fc/0xf30 (unreliable) [c00020138e5638c0] [c00000000000ada8] handle_page_fault+0x10/0x30 --- interrupt: 301 at cmpxchg_futex_value_locked+0x68/0xd0 LR = futex_lock_pi_atomic+0xe0/0x1f0 [c00020138e563bc0] [c000000000217b50] futex_lock_pi_atomic+0x80/0x1f0 (unreliable) [c00020138e563c30] [c00000000021b668] futex_requeue+0x438/0xb60 [c00020138e563d60] [c00000000021c6cc] do_futex+0x1ec/0x2b0 [c00020138e563d90] [c00000000021c8b8] sys_futex+0x128/0x200 [c00020138e563e20] [c00000000000b7ac] system_call+0x5c/0x68 Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Cc: stable@vger.kernel.org # v5.2+ Reported-by: syzbot+e808452bad7c375cbee6@syzkaller-ppc64.appspotmail.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200207122145.11928-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpcMichael Ellerman
commit dabf6b36b83a18d57e3d4b9d50544ed040d86255 upstream. There's an OF helper called of_dma_is_coherent(), which checks if a device has a "dma-coherent" property to see if the device is coherent for DMA. But on some platforms devices are coherent by default, and on some platforms it's not possible to update existing device trees to add the "dma-coherent" property. So add a Kconfig symbol to allow arch code to tell of_dma_is_coherent() that devices are coherent by default, regardless of the presence of the property. Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie. when the system has a coherent cache. Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper") Cc: stable@vger.kernel.org # v3.16+ Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/32s: Fix CPU wake-up from sleep modeChristophe Leroy
commit 9933819099c4600b41a042f27a074470a43cf6b9 upstream. Commit f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU") broke the CPU wake-up from sleep mode (i.e. when _TLF_SLEEPING is set) by delaying the tovirt(r2, r2). This is because r2 is not restored by fast_exception_return. It used to work (by chance ?) because CPU wake-up interrupt never comes from user, so r2 is expected to point to 'current' on return. Commit e2fb9f544431 ("powerpc/32: Prepare for Kernel Userspace Access Protection") broke it even more by clobbering r0 which is not restored by fast_exception_return either. Use r6 instead of r0. This is possible because r3-r6 are restored by fast_exception_return and only r3-r5 are used for exception arguments. For r2 it could be converted back to virtual address, but stay on the safe side and restore it from the stack instead. It should be live in the cache at that moment, so loading from the stack should make no difference compared to converting it from phys to virt. Fixes: f7354ccac844 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU") Fixes: e2fb9f544431 ("powerpc/32: Prepare for Kernel Userspace Access Protection") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6d02c3ae6ad77af34392e98117e44c2bf6d13ba1.1580121710.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/32s: Fix bad_kuap_fault()Christophe Leroy
commit 6ec20aa2e510b6297906c45f009aa08b2d97269a upstream. At the moment, bad_kuap_fault() reports a fault only if a bad access to userspace occurred while access to userspace was not granted. But if a fault occurs for a write outside the allowed userspace segment(s) that have been unlocked, bad_kuap_fault() fails to detect it and the kernel loops forever in do_page_fault(). Fix it by checking that the accessed address is within the allowed range. Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f48244e9485ada0a304ed33ccbb8da271180c80d.1579866752.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/pseries: Advance pfn if section is not present in lmb_is_removable()Pingfan Liu
commit fbee6ba2dca30d302efe6bddb3a886f5e964a257 upstream. In lmb_is_removable(), if a section is not present, it should continue to test the rest of the sections in the block. But the current code fails to do so. Fixes: 51925fb3c5c9 ("powerpc/pseries: Implement memory hotplug remove in the kernel") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1578632042-12415-1-git-send-email-kernelfans@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/xmon: don't access ASDR in VMsSukadev Bhattiprolu
commit c2a20711fc181e7f22ee5c16c28cb9578af84729 upstream. ASDR is HV-privileged and must only be accessed in HV-mode. Fixes a Program Check (0x700) when xmon in a VM dumps SPRs. Fixes: d1e1b351f50f ("powerpc/xmon: Add ISA v3.0 SPRs to SPR dump") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200107021633.GB29843@us.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/ptdump: Fix W+X verificationChristophe Leroy
commit d80ae83f1f932ab7af47b54d0d3bef4f4dba489f upstream. Verification cannot rely on simple bit checking because on some platforms PAGE_RW is 0, checking that a page is not W means checking that PAGE_RO is set instead of checking that PAGE_RW is not set. Use pte helpers instead of checking bits. Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0d894839fdbb19070f0e1e4140363be4f2bb62fc.1578989540.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP caseAneesh Kumar K.V
commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream. Patch series "Fixup page directory freeing", v4. This is a repost of patch series from Peter with the arch specific changes except ppc64 dropped. ppc64 changes are added here because we are redoing the patch series on top of ppc64 changes. This makes it easy to backport these changes. Only the first 2 patches need to be backported to stable. The thing is, on anything SMP, freeing page directories should observe the exact same order as normal page freeing: 1) unhook page/directory 2) TLB invalidate 3) free page/directory Without this, any concurrent page-table walk could end up with a Use-after-Free. This is esp. trivial for anything that has software page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware caches partial page-walks (ie. caches page directories). Even on UP this might give issues since mmu_gather is preemptible these days. An interrupt or preempted task accessing user pages might stumble into the free page if the hardware caches page directories. This patch series fixes ppc64 and add generic MMU_GATHER changes to support the conversion of other architectures. I haven't added patches w.r.t other architecture because they are yet to be acked. This patch (of 9): A followup patch is going to make sure we correctly invalidate page walk cache before we free page table pages. In order to keep things simple enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the !SMP case differently in the followup patch !SMP case is right now broken for radix translation w.r.t page walk cache flush. We can get interrupted in between page table free and that would imply we have page walk cache entries pointing to tables which got freed already. Michael said "both our platforms that run on Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a problem for anyone in practice, unless they've hacked their kernel to build it !SMP." Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-05powerpc/fsl/dts: add fsl,erratum-a011043Madalin Bucur
[ Upstream commit 73d527aef68f7644e59f22ce7f9ac75e7b533aea ] Add fsl,erratum-a011043 to internal MDIO buses. Software may get false read error when reading internal PCS registers through MDIO. As a workaround, all internal MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit. Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29powerpc/xive: Discard ESB load value when interrupt is invalidFrederic Barrat
commit 17328f218fb760c9c6accc5b52494889243a6b98 upstream. A load on an ESB page returning all 1's means that the underlying device has invalidated the access to the PQ state of the interrupt through mmio. It may happen, for example when querying a PHB interrupt while the PHB is in an error state. In that case, we should consider the interrupt to be invalid when checking its state in the irq_get_irqchip_state() handler. Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> [clg: wrote a commit log, introduced XIVE_ESB_INVALID ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200113130118.27969-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29powerpc/mm/hash: Fix sharing context ids between kernel & userspaceAneesh Kumar K.V
commit 5d2e5dd5849b4ef5e8ec35e812cdb732c13cd27e upstream. Commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") has a bug in the definition of MIN_USER_CONTEXT. The result is that the context id used for the vmemmap and the lowest context id handed out to userspace are the same. The context id is essentially the process identifier as far as the first stage of the MMU translation is concerned. This can result in multiple SLB entries with the same VSID (Virtual Segment ID), accessible to the kernel and some random userspace process that happens to get the overlapping id, which is not expected eg: 07 c00c000008000000 40066bdea7000500 1T ESID= c00c00 VSID= 66bdea7 LLP:100 12 0002000008000000 40066bdea7000d80 1T ESID= 200 VSID= 66bdea7 LLP:100 Even though the user process and the kernel use the same VSID, the permissions in the hash page table prevent the user process from reading or writing to any kernel mappings. It can also lead to SLB entries with different base page size encodings (LLP), eg: 05 c00c000008000000 00006bde0053b500 256M ESID=c00c00000 VSID= 6bde0053b LLP:100 09 0000000008000000 00006bde0053bc80 256M ESID= 0 VSID= 6bde0053b LLP: 0 Such SLB entries can result in machine checks, eg. as seen on a G5: Oops: Machine check, sig: 7 [#1] BE PAGE SIZE=64K MU-Hash SMP NR_CPUS=4 NUMA Power Mac NIP: c00000000026f248 LR: c000000000295e58 CTR: 0000000000000000 REGS: c0000000erfd3d70 TRAP: 0200 Tainted: G M (5.5.0-rcl-gcc-8.2.0-00010-g228b667d8ea1) MSR: 9000000000109032 <SF,HV,EE,ME,IR,DR,RI> CR: 24282048 XER: 00000000 DAR: c00c000000612c80 DSISR: 00000400 IRQMASK: 0 ... NIP [c00000000026f248] .kmem_cache_free+0x58/0x140 LR [c088000008295e58] .putname 8x88/0xa Call Trace: .putname+0xB8/0xa .filename_lookup.part.76+0xbe/0x160 .do_faccessat+0xe0/0x380 system_call+0x5c/ex68 This happens with 256MB segments and 64K pages, as the duplicate VSID is hit with the first vmemmap segment and the first user segment, and older 32-bit userspace maps things in the first user segment. On other CPUs a machine check is not seen. Instead the userspace process can get stuck continuously faulting, with the fault never properly serviced, due to the kernel not understanding that there is already a HPTE for the address but with inaccessible permissions. On machines with 1T segments we've not seen the bug hit other than by deliberately exercising it. That seems to be just a matter of luck though, due to the typical layout of the user virtual address space and the ranges of vmemmap that are typically populated. To fix it we add 2 to MIN_USER_CONTEXT. This ensures the lowest context given to userspace doesn't overlap with the VMEMMAP context, or with the context for INVALID_REGION_ID. Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Christian Marillat <marillat@debian.org> Reported-by: Romain Dolbeau <romain@dolbeau.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Account for INVALID_REGION_ID, mostly rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200123102547.11623-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-26powerpc/archrandom: fix arch_get_random_seed_int()Ard Biesheuvel
commit b6afd1234cf93aa0d71b4be4788c47534905f0be upstream. Commit 01c9348c7620ec65 powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_* updated arch_get_random_[int|long]() to be NOPs, and moved the hardware RNG backing to arch_get_random_seed_[int|long]() instead. However, it failed to take into account that arch_get_random_int() was implemented in terms of arch_get_random_long(), and so we ended up with a version of the former that is essentially a NOP as well. Fix this by calling arch_get_random_seed_long() from arch_get_random_seed_int() instead. Fixes: 01c9348c7620ec65 ("powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191204115015.18015-1-ardb@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-26powerpc/kasan: Fix boot failure with RELOCATABLE && FSL_BOOKEChristophe Leroy
commit 71eb40fc53371bc247c8066ae76ad9e22ae1e18d upstream. When enabling CONFIG_RELOCATABLE and CONFIG_KASAN on FSL_BOOKE, the kernel doesn't boot. relocate_init() requires KASAN early shadow area to be set up because it needs access to the device tree through generic functions. Call kasan_early_init() before calling relocate_init() Reported-by: Lexi Shao <shaolexi@huawei.com> Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b58426f1664a4b344ff696d18cacf3b3e8962111.1575036985.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-26powerpc/pseries: Enable support for ibm,drc-info propertyTyrel Datwyler
commit 0a87ccd3699983645f54cafd2258514a716b20b8 upstream. Advertise client support for the PAPR architected ibm,drc-info device tree property during CAS handshake. Fixes: c7a3275e0f9e ("powerpc/pseries: Revert support for ibm,drc-info devtree property") Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1573449697-5448-11-git-send-email-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-26powerpc/security: Fix debugfs data leak on 32-bitGeert Uytterhoeven
commit 3b05a1e517e1a8cfda4866ec31d28b2bc4fee4c4 upstream. "powerpc_security_features" is "unsigned long", i.e. 32-bit or 64-bit, depending on the platform (PPC_FSL_BOOK3E or PPC_BOOK3S_64). Hence casting its address to "u64 *", and calling debugfs_create_x64() is wrong, and leaks 32-bit of nearby data to userspace on 32-bit platforms. While all currently defined SEC_FTR_* security feature flags fit in 32-bit, they all have "ULL" suffixes to make them 64-bit constants. Hence fix the leak by changing the type of "powerpc_security_features" (and the parameter types of its accessors) to "u64". This also allows to drop the cast. Fixes: 398af571128fe75f ("powerpc/security: Show powerpc_security_features in debugfs") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191021142309.28105-1-geert+renesas@glider.be Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17powerpc/powernv: Disable native PCIe port managementOliver O'Halloran
commit 9d72dcef891030545f39ad386a30cf91df517fb2 upstream. On PowerNV the PCIe topology is (currently) managed by the powernv platform code in Linux in cooperation with the platform firmware. Linux's native PCIe port service drivers operate independently of both and this can cause problems. The main issue is that the portbus driver will conflict with the platform specific hotplug driver (pnv_php) over ownership of the MSI used to notify the host when a hotplug event occurs. The portbus driver claims this MSI on behalf of the individual port services because the same interrupt is used for hotplug events, PMEs (on root ports), and link bandwidth change notifications. The portbus driver will always claim the interrupt even if the individual port service drivers, such as pciehp, are compiled out. The second, bigger, problem is that the hotplug port service driver fundamentally does not work on PowerNV. The platform assumes that all PCI devices have a corresponding arch-specific handle derived from the DT node for the device (pci_dn) and without one the platform will not allow a PCI device to be enabled. This problem is largely due to historical baggage, but it can't be resolved without significant re-factoring of the platform PCI support. We can fix these problems in the interim by setting the "pcie_ports_disabled" flag during platform initialisation. The flag indicates the platform owns the PCIe ports which stops the portbus driver from being registered. This does have the side effect of disabling all port services drivers that is: AER, PME, BW notifications, hotplug, and DPC. However, this is not a huge disadvantage on PowerNV since these services are either unused or handled through other means. Fixes: 66725152fb9f ("PCI/hotplug: PowerPC PowerNV PCI hotplug driver") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191118065553.30362-1-oohall@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12powerpc/spinlocks: Include correct header for static keyJason A. Donenfeld
commit 6da3eced8c5f3b03340b0c395bacd552c4d52411 upstream. Recently, the spinlock implementation grew a static key optimization, but the jump_label.h header include was left out, leading to build errors: linux/arch/powerpc/include/asm/spinlock.h:44:7: error: implicit declaration of function ‘static_branch_unlikely’ 44 | if (!static_branch_unlikely(&shared_processor)) This commit adds the missing header. mpe: The build break is only seen with CONFIG_JUMP_LABEL=n. Fixes: 656c21d6af5d ("powerpc/shared: Use static key to detect shared processor") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191223133147.129983-1-Jason@zx2c4.com Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12powerpc/vcpu: Assume dedicated processors as non-preemptSrikar Dronamraju
commit 14c73bd344da60abaf7da3ea2e7733ddda35bbac upstream. With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule tasks on wakeup. This leads to wrong choice of CPU, which in-turn leads to larger wakeup latencies. Eventually, it leads to performance regression in latency sensitive benchmarks like soltp, schbench etc. On Powerpc, vcpu_is_preempted() only looks at yield_count. If the yield_count is odd, the vCPU is assumed to be preempted. However yield_count is increased whenever the LPAR enters CEDE state (idle). So any CPU that has entered CEDE state is assumed to be preempted. Even if vCPU of dedicated LPAR is preempted/donated, it should have right of first-use since they are supposed to own the vCPU. On a Power9 System with 32 cores: # lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 Core(s) per socket: 1 Socket(s): 16 NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # perf stat -a -r 5 ./schbench v5.4 v5.4 + patch Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 45 75.0000th: 62 75.0th: 63 90.0000th: 71 90.0th: 74 95.0000th: 77 95.0th: 78 *99.0000th: 91 *99.0th: 82 99.5000th: 707 99.5th: 83 99.9000th: 6920 99.9th: 86 min=0, max=10048 min=0, max=96 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 72 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 691 *99.0th: 83 99.5000th: 3972 99.5th: 85 99.9000th: 8368 99.9th: 91 min=0, max=16606 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 71 90.0th: 75 95.0000th: 77 95.0th: 79 *99.0000th: 106 *99.0th: 83 99.5000th: 2364 99.5th: 84 99.9000th: 7480 99.9th: 90 min=0, max=10001 min=0, max=95 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 47 75.0000th: 62 75.0th: 65 90.0000th: 72 90.0th: 75 95.0000th: 78 95.0th: 79 *99.0000th: 93 *99.0th: 84 99.5000th: 108 99.5th: 85 99.9000th: 6792 99.9th: 90 min=0, max=17681 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 46 50.0th: 45 75.0000th: 62 75.0th: 64 90.0000th: 73 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 113 *99.0th: 82 99.5000th: 2724 99.5th: 83 99.9000th: 6184 99.9th: 93 min=0, max=9887 min=0, max=111 Performance counter stats for 'system wide' (5 runs): context-switches 43,373 ( +- 0.40% ) 44,597 ( +- 0.55% ) cpu-migrations 1,211 ( +- 5.04% ) 220 ( +- 6.23% ) page-faults 15,983 ( +- 5.21% ) 15,360 ( +- 3.38% ) Waiman Long suggested using static_keys. Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs") Cc: stable@vger.kernel.org # v4.18+ Reported-by: Parth Shah <parth@linux.ibm.com> Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Phil Auld <pauld@redhat.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Tested-by: Parth Shah <parth@linux.ibm.com> [mpe: Move the key and setting of the key to pseries/setup.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191213035036.6913-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12powerpc: Ensure that swiotlb buffer is allocated from low memoryMike Rapoport
[ Upstream commit 8fabc623238e68b3ac63c0dd1657bf86c1fa33af ] Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a system has more physical memory than this limit, the swiotlb buffer is not addressable because it is allocated from memblock using top-down mode. Force memblock to bottom-up mode before calling swiotlb_init() to ensure that the swiotlb buffer is DMA-able. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191204123524.22919-1-rppt@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09powerpc/pmem: Fix kernel crash due to wrong range value usage in ↵Aneesh Kumar K.V
flush_dcache_range commit 6f4679b956741d2da6ad3ebb738cbe1264ac8781 upstream. This patch fix the below kernel crash. BUG: Unable to handle kernel data access on read at 0xc000000380000000 Faulting instruction address: 0xc00000000008b6f0 cpu 0x5: Vector: 300 (Data Access) at [c0000000d8587790] pc: c00000000008b6f0: arch_remove_memory+0x150/0x210 lr: c00000000008b720: arch_remove_memory+0x180/0x210 sp: c0000000d8587a20 msr: 800000000280b033 dar: c000000380000000 dsisr: 40000000 current = 0xc0000000d8558600 paca = 0xc00000000fff8f00 irqmask: 0x03 irq_happened: 0x01 pid = 1220, comm = ndctl enter ? for help memunmap_pages+0x33c/0x410 devm_action_release+0x30/0x50 release_nodes+0x30c/0x3a0 device_release_driver_internal+0x178/0x240 unbind_store+0x74/0x190 drv_attr_store+0x44/0x60 sysfs_kf_write+0x74/0xa0 kernfs_fop_write+0x1b0/0x260 __vfs_write+0x3c/0x70 vfs_write+0xe4/0x200 ksys_write+0x7c/0x140 system_call+0x5c/0x68 Fixes: 076265907cf9 ("powerpc: Chunk calls to flush_dcache_range in arch_*_memory") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191204052909.59145-1-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notraceMichael Ellerman
commit 91a063c956084fb21cf2523bce6892514e3f1799 upstream. These slice routines are called from the SLB miss handler, which can lead to warnings from the IRQ code, because we have not reconciled the IRQ state properly: WARNING: CPU: 72 PID: 30150 at arch/powerpc/kernel/irq.c:258 arch_local_irq_restore.part.0+0xcc/0x100 Modules linked in: CPU: 72 PID: 30150 Comm: ftracetest Not tainted 5.5.0-rc2-gcc9x-g7e0165b2f1a9 #1 NIP: c00000000001d83c LR: c00000000029ab90 CTR: c00000000026cf90 REGS: c0000007eee3b960 TRAP: 0700 Not tainted (5.5.0-rc2-gcc9x-g7e0165b2f1a9) MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 22242844 XER: 20000000 CFAR: c00000000001d780 IRQMASK: 0 ... NIP arch_local_irq_restore.part.0+0xcc/0x100 LR trace_graph_entry+0x270/0x340 Call Trace: trace_graph_entry+0x254/0x340 (unreliable) function_graph_enter+0xe4/0x1a0 prepare_ftrace_return+0xa0/0x130 ftrace_graph_caller+0x44/0x94 # (get_slice_psize()) slb_allocate_user+0x7c/0x100 do_slb_fault+0xf8/0x300 instruction_access_slb_common+0x140/0x180 Fixes: 48e7b7695745 ("powerpc/64s/hash: Convert SLB miss handlers to C") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191221121337.4894-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09powerpc: Chunk calls to flush_dcache_range in arch_*_memoryAlastair D'Silva
commit 076265907cf9633bbef861c7c2a1c26a8209f283 upstream. When presented with large amounts of memory being hotplugged (in my test case, ~890GB), the call to flush_dcache_range takes a while (~50 seconds), triggering RCU stalls. This patch breaks up the call into 1GB chunks, calling cond_resched() inbetween to allow the scheduler to run. Fixes: fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191104023305.9581-6-alastair@au1.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand
commit feee6b2989165631b17ac6d4ccdbf6759254e85a upstream. We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04powerpc: Fix __clear_user() with KUAP enabledAndrew Donnellan
commit 61e3acd8c693a14fc69b824cb5b08d02cb90a6e7 upstream. The KUAP implementation adds calls in clear_user() to enable and disable access to userspace memory. However, it doesn't add these to __clear_user(), which is used in the ptrace regset code. As there's only one direct user of __clear_user() (the regset code), and the time taken to set the AMR for KUAP purposes is going to dominate the cost of a quick access_ok(), there's not much point having a separate path. Rename __clear_user() to __arch_clear_user(), and make __clear_user() just call clear_user(). Reported-by: syzbot+f25ecf4b2982d8c7a640@syzkaller-ppc64.appspotmail.com Reported-by: Daniel Axtens <dja@axtens.net> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> [mpe: Use __arch_clear_user() for the asm version like arm64 & nds32] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191209132221.15328-1-ajd@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04Revert "powerpc/vcpu: Assume dedicated processors as non-preempt"Greg Kroah-Hartman
This reverts commit 8332dbe5157a0056d8ab409957dfa89930066d87 which is commit 14c73bd344da60abaf7da3ea2e7733ddda35bbac upstream. It breaks the build. Cc: Guenter Roeck <linux@roeck-us.net> Cc: Parth Shah <parth@linux.ibm.com> Cc: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Phil Auld <pauld@redhat.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Cc: Parth Shah <parth@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.hMasahiro Yamada
[ Upstream commit a8de1304b7df30e3a14f2a8b9709bb4ff31a0385 ] The DTC v1.5.1 added references to (U)INT32_MAX. This is no problem for user-space programs since <stdint.h> defines (U)INT32_MAX along with (u)int32_t. For the kernel space, libfdt_env.h needs to be adjusted before we pull in the changes. In the kernel, we usually use s/u32 instead of (u)int32_t for the fixed-width types. Accordingly, we already have S/U32_MAX for their max values. So, we should not add (U)INT32_MAX to <linux/limits.h> any more. Instead, add them to the in-kernel libfdt_env.h to compile the latest libfdt. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc: Don't add -mabi= flags when building with ClangNathan Chancellor
[ Upstream commit 465bfd9c44dea6b55962b5788a23ac87a467c923 ] When building pseries_defconfig, building vdso32 errors out: error: unknown target ABI 'elfv1' This happens because -m32 in clang changes the target to 32-bit, which does not allow the ABI to be changed. Commit 4dc831aa8813 ("powerpc: Fix compiling a BE kernel with a powerpc64le toolchain") added these flags to fix building big endian kernels with a little endian GCC. Clang doesn't need -mabi because the target triple controls the default value. -mlittle-endian and -mbig-endian manipulate the triple into either powerpc64-* or powerpc64le-*, which properly sets the default ABI. Adding a debug print out in the PPC64TargetInfo constructor after line 383 above shows this: $ echo | ./clang -E --target=powerpc64-linux -mbig-endian -o /dev/null - Default ABI: elfv1 $ echo | ./clang -E --target=powerpc64-linux -mlittle-endian -o /dev/null - Default ABI: elfv2 $ echo | ./clang -E --target=powerpc64le-linux -mbig-endian -o /dev/null - Default ABI: elfv1 $ echo | ./clang -E --target=powerpc64le-linux -mlittle-endian -o /dev/null - Default ABI: elfv2 Don't specify -mabi when building with clang to avoid the build error with -m32 and not change any code generation. -mcall-aixdesc is not an implemented flag in clang so it can be safely excluded as well, see commit 238abecde8ad ("powerpc: Don't use gcc specific options on clang"). pseries_defconfig successfully builds after this patch and powernv_defconfig and ppc44x_defconfig don't regress. Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> [mpe: Trim clang links in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191119045712.39633-2-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()Christophe Leroy
[ Upstream commit 77693a5fb57be4606a6024ec8e3076f9499b906b ] Modify back __set_fixmap() to using __fix_to_virt() instead of fix_to_virt() otherwise the following happens because it seems GCC doesn't see idx as a builtin const. CC mm/early_ioremap.o In file included from ./include/linux/kernel.h:11:0, from mm/early_ioremap.c:11: In function ‘fix_to_virt’, inlined from ‘__set_fixmap’ at ./arch/powerpc/include/asm/fixmap.h:87:2, inlined from ‘__early_ioremap’ at mm/early_ioremap.c:156:4: ./include/linux/compiler.h:350:38: error: call to ‘__compiletime_assert_32’ declared with attribute error: BUILD_BUG_ON failed: idx >= __end_of_fixed_addresses _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ ./include/linux/compiler.h:331:4: note: in definition of macro ‘__compiletime_assert’ prefix ## suffix(); \ ^ ./include/linux/compiler.h:350:2: note: in expansion of macro ‘_compiletime_assert’ _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’ #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^ ./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’ BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^ ./include/asm-generic/fixmap.h:32:2: note: in expansion of macro ‘BUILD_BUG_ON’ BUILD_BUG_ON(idx >= __end_of_fixed_addresses); ^ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Fixes: 4cfac2f9c7f1 ("powerpc/mm: Simplify __set_fixmap()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f4984c615f90caa3277775a68849afeea846850d.1568295907.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/book3s/mm: Update Oops message to print the correct translation in useAneesh Kumar K.V
[ Upstream commit d7e02f7b7991dbe14a2acfb0e53d675cd149001c ] Avoids confusion when printing Oops message like below Faulting instruction address: 0xc00000000008bdb4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV This was because we never clear the MMU_FTR_HPTE_TABLE feature flag even if we run with radix translation. It was discussed that we should look at this feature flag as an indication of the capability to run hash translation and we should not clear the flag even if we run in radix translation. All the code paths check for radix_enabled() check and if found true consider we are running with radix translation. Follow the same sequence for finding the MMU translation string to be used in Oops message. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190711145814.17970-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/eeh: differentiate duplicate detection messageSam Bobroff
[ Upstream commit de84ffc3ccbeec3678f95a3d898fc188efa0d9c5 ] Currently when an EEH error is detected, the system log receives the same (or almost the same) message twice: EEH: PHB#0 failure detected, location: N/A EEH: PHB#0 failure detected, location: N/A or EEH: eeh_dev_check_failure: Frozen PHB#0-PE#0 detected EEH: Frozen PHB#0-PE#0 detected This looks like a bug, but in fact the messages are from different functions and mean slightly different things. So keep both but change one of the messages slightly, so that it's clear they are different: EEH: PHB#0 failure detected, location: N/A EEH: Recovering PHB#0, location: N/A or EEH: eeh_dev_check_failure: Frozen PHB#0-PE#0 detected EEH: Recovering PHB#0-PE#0 Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/43817cb6e6631b0828b9a6e266f60d1f8ca8eb22.1571288375.git.sbobroff@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/security: Fix wrong message when RFI Flush is disableGustavo L. F. Walbon
[ Upstream commit 4e706af3cd8e1d0503c25332b30cad33c97ed442 ] The issue was showing "Mitigation" message via sysfs whatever the state of "RFI Flush", but it should show "Vulnerable" when it is disabled. If you have "L1D private" feature enabled and not "RFI Flush" you are vulnerable to meltdown attacks. "RFI Flush" is the key feature to mitigate the meltdown whatever the "L1D private" state. SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only. So the message should be as the truth table shows: CPU | L1D private | RFI Flush | sysfs ----|-------------|-----------|------------------------------------- P9 | False | False | Vulnerable P9 | False | True | Mitigation: RFI Flush P9 | True | False | Vulnerable: L1D private per thread P9 | True | True | Mitigation: RFI Flush, L1D private per thread P8 | False | False | Vulnerable P8 | False | True | Mitigation: RFI Flush Output before this fix: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: RFI Flush, L1D private per thread # echo 0 > /sys/kernel/debug/powerpc/rfi_flush # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: L1D private per thread Output after fix: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: RFI Flush, L1D private per thread # echo 0 > /sys/kernel/debug/powerpc/rfi_flush # cat /sys/devices/system/cpu/vulnerabilities/meltdown Vulnerable: L1D private per thread Signed-off-by: Gustavo L. F. Walbon <gwalbon@linux.ibm.com> Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190502210907.42375-1-gwalbon@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/pseries/cmm: Implement release() function for sysfs deviceDavid Hildenbrand
[ Upstream commit 7d8212747435c534c8d564fbef4541a463c976ff ] When unloading the module, one gets ------------[ cut here ]------------ Device 'cmm0' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt. WARNING: CPU: 0 PID: 19308 at drivers/base/core.c:1244 .device_release+0xcc/0xf0 ... We only have one static fake device. There is nothing to do when releasing the device (via cmm_exit()). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191031142933.10779-2-david@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warningAneesh Kumar K.V
[ Upstream commit 16f6b67cf03cb43db7104acb2ca877bdc2606c92 ] With large memory (8TB and more) hotplug, we can get soft lockup warnings as below. These were caused by a long loop without any explicit cond_resched which is a problem for !PREEMPT kernels. Avoid this using cond_resched() while inserting hash page table entries. We already do similar cond_resched() in __add_pages(), see commit f64ac5e6e306 ("mm, memory_hotplug: add scheduling point to __add_pages"). rcu: 3-....: (24002 ticks this GP) idle=13e/1/0x4000000000000002 softirq=722/722 fqs=12001 (t=24003 jiffies g=4285 q=2002) NMI backtrace for cpu 3 CPU: 3 PID: 3870 Comm: ndctl Not tainted 5.3.0-197.18-default+ #2 Call Trace: dump_stack+0xb0/0xf4 (unreliable) nmi_cpu_backtrace+0x124/0x130 nmi_trigger_cpumask_backtrace+0x1ac/0x1f0 arch_trigger_cpumask_backtrace+0x28/0x3c rcu_dump_cpu_stacks+0xf8/0x154 rcu_sched_clock_irq+0x878/0xb40 update_process_times+0x48/0x90 tick_sched_handle.isra.16+0x4c/0x80 tick_sched_timer+0x68/0xe0 __hrtimer_run_queues+0x180/0x430 hrtimer_interrupt+0x110/0x300 timer_interrupt+0x108/0x2f0 decrementer_common+0x114/0x120 --- interrupt: 901 at arch_add_memory+0xc0/0x130 LR = arch_add_memory+0x74/0x130 memremap_pages+0x494/0x650 devm_memremap_pages+0x3c/0xa0 pmem_attach_disk+0x188/0x750 nvdimm_bus_probe+0xac/0x2c0 really_probe+0x148/0x570 driver_probe_device+0x19c/0x1d0 device_driver_attach+0xcc/0x100 bind_store+0x134/0x1c0 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1a0/0x270 __vfs_write+0x3c/0x70 vfs_write+0xd0/0x260 ksys_write+0xdc/0x130 system_call+0x5c/0x68 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191001084656.31277-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/security/book3s64: Report L1TF status in sysfsAnthony Steinhauser
[ Upstream commit 8e6b6da91ac9b9ec5a925b6cb13f287a54bd547d ] Some PowerPC CPUs are vulnerable to L1TF to the same extent as to Meltdown. It is also mitigated by flushing the L1D on privilege transition. Currently the sysfs gives a false negative on L1TF on CPUs that I verified to be vulnerable, a Power9 Talos II Boston 004e 1202, PowerNV T2P9D01. Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Just have cpu_show_l1tf() call cpu_show_meltdown() directly] Link: https://lore.kernel.org/r/20191029190759.84821-1-asteinhauser@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/tools: Don't quote $objdump in scriptsMichael Ellerman
[ Upstream commit e44ff9ea8f4c8a90c82f7b85bd4f5e497c841960 ] Some of our scripts are passed $objdump and then call it as "$objdump". This doesn't work if it contains spaces because we're using ccache, for example you get errors such as: ./arch/powerpc/tools/relocs_check.sh: line 48: ccache ppc64le-objdump: No such file or directory ./arch/powerpc/tools/unrel_branch_check.sh: line 26: ccache ppc64le-objdump: No such file or directory Fix it by not quoting the string when we expand it, allowing the shell to do the right thing for us. Fixes: a71aa05e1416 ("powerpc: Convert relocs_check to a shell script using grep") Fixes: 4ea80652dc75 ("powerpc/64s: Tool to flag direct branches from unrelocated interrupt vectors") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024004730.32135-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/pseries: Don't fail hash page table insert for bolted mappingAneesh Kumar K.V
[ Upstream commit 75838a3290cd4ebbd1f567f310ba04b6ef017ce4 ] If the hypervisor returned H_PTEG_FULL for H_ENTER hcall, retry a hash page table insert by removing a random entry from the group. After some runtime, it is very well possible to find all the 8 hash page table entry slot in the hpte group used for mapping. Don't fail a bolted entry insert in that case. With Storage class memory a user can find this error easily since a namespace enable/disable is equivalent to memory add/remove. This results in failures as reported below: $ ndctl create-namespace -r region1 -t pmem -m devdax -a 65536 -s 100M libndctl: ndctl_dax_enable: dax1.3: failed to enable Error: namespace1.2: failed to enable failed to create namespace: No such device or address In kernel log we find the details as below: Unable to create mapping for hot added memory 0xc000042006000000..0xc00004200d000000: -1 dax_pmem: probe of dax1.3 failed with error -14 This indicates that we failed to create a bolted hash table entry for direct-map address backing the namespace. We also observe failures such that not all namespaces will be enabled with ndctl enable-namespace all command. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024093542.29777-2-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/pseries: Mark accumulate_stolen_time() as notraceMichael Ellerman
[ Upstream commit eb8e20f89093b64f48975c74ccb114e6775cee22 ] accumulate_stolen_time() is called prior to interrupt state being reconciled, which can trip the warning in arch_local_irq_restore(): WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130 ... NIP .arch_local_irq_restore+0x9c/0x130 LR .rb_start_commit+0x38/0x80 Call Trace: .ring_buffer_lock_reserve+0xe4/0x620 .trace_function+0x44/0x210 .function_trace_call+0x148/0x170 .ftrace_ops_no_ops+0x180/0x1d0 ftrace_call+0x4/0x8 .accumulate_stolen_time+0x1c/0xb0 decrementer_common+0x124/0x160 For now just mark it as notrace. We may change the ordering to call it after interrupt state has been reconciled, but that is a larger change. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04powerpc/papr_scm: Fix an off-by-one check in papr_scm_meta_{get, set}Vaibhav Jain
[ Upstream commit 612ee81b9461475b5a5612c2e8d71559dd3c7920 ] A validation check to prevent out of bounds read/write inside functions papr_scm_meta_{get,set}() is off-by-one that prevent reads and writes to the last byte of the label area. This bug manifests as a failure to probe a dimm when libnvdimm is unable to read the entire config-area as advertised by ND_CMD_GET_CONFIG_SIZE. This usually happens when there are large number of namespaces created in the region backed by the dimm and the label-index spans max possible config-area. An error of the form below usually reported in the kernel logs: [ 255.293912] nvdimm: probe of nmem0 failed with error -22 The patch fixes these validation checks there by letting libnvdimm access the entire config-area. Fixes: 53e80bd042773('powerpc/nvdimm: Add support for multibyte read/write for metadata') Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190927062002.3169-1-vaibhav@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31powerpc/irq: fix stack overflow verificationChristophe Leroy
commit 099bc4812f09155da77eeb960a983470249c9ce1 upstream. Before commit 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack"), check_stack_overflow() was called by do_IRQ(), before switching to the irq stack. In that commit, do_IRQ() was renamed __do_irq(), and is now executing on the irq stack, so check_stack_overflow() has just become almost useless. Move check_stack_overflow() call in do_IRQ() to do the check while still on the current stack. Fixes: 0366a1c70b89 ("powerpc/irq: Run softirqs off the top of the irq stack") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31powerpc/vcpu: Assume dedicated processors as non-preemptSrikar Dronamraju
commit 14c73bd344da60abaf7da3ea2e7733ddda35bbac upstream. With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule tasks on wakeup. This leads to wrong choice of CPU, which in-turn leads to larger wakeup latencies. Eventually, it leads to performance regression in latency sensitive benchmarks like soltp, schbench etc. On Powerpc, vcpu_is_preempted() only looks at yield_count. If the yield_count is odd, the vCPU is assumed to be preempted. However yield_count is increased whenever the LPAR enters CEDE state (idle). So any CPU that has entered CEDE state is assumed to be preempted. Even if vCPU of dedicated LPAR is preempted/donated, it should have right of first-use since they are supposed to own the vCPU. On a Power9 System with 32 cores: # lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 Core(s) per socket: 1 Socket(s): 16 NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # perf stat -a -r 5 ./schbench v5.4 v5.4 + patch Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 45 75.0000th: 62 75.0th: 63 90.0000th: 71 90.0th: 74 95.0000th: 77 95.0th: 78 *99.0000th: 91 *99.0th: 82 99.5000th: 707 99.5th: 83 99.9000th: 6920 99.9th: 86 min=0, max=10048 min=0, max=96 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 72 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 691 *99.0th: 83 99.5000th: 3972 99.5th: 85 99.9000th: 8368 99.9th: 91 min=0, max=16606 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 71 90.0th: 75 95.0000th: 77 95.0th: 79 *99.0000th: 106 *99.0th: 83 99.5000th: 2364 99.5th: 84 99.9000th: 7480 99.9th: 90 min=0, max=10001 min=0, max=95 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 47 75.0000th: 62 75.0th: 65 90.0000th: 72 90.0th: 75 95.0000th: 78 95.0th: 79 *99.0000th: 93 *99.0th: 84 99.5000th: 108 99.5th: 85 99.9000th: 6792 99.9th: 90 min=0, max=17681 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 46 50.0th: 45 75.0000th: 62 75.0th: 64 90.0000th: 73 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 113 *99.0th: 82 99.5000th: 2724 99.5th: 83 99.9000th: 6184 99.9th: 93 min=0, max=9887 min=0, max=111 Performance counter stats for 'system wide' (5 runs): context-switches 43,373 ( +- 0.40% ) 44,597 ( +- 0.55% ) cpu-migrations 1,211 ( +- 5.04% ) 220 ( +- 6.23% ) page-faults 15,983 ( +- 5.21% ) 15,360 ( +- 3.38% ) Waiman Long suggested using static_keys. Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs") Cc: stable@vger.kernel.org # v4.18+ Reported-by: Parth Shah <parth@linux.ibm.com> Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Phil Auld <pauld@redhat.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Tested-by: Parth Shah <parth@linux.ibm.com> [mpe: Move the key and setting of the key to pseries/setup.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191213035036.6913-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31KVM: PPC: Book3S HV: Fix regression on big endian hostsMarcus Comstedt
commit 228b607d8ea1b7d4561945058d5692709099d432 upstream. VCPU_CR is the offset of arch.regs.ccr in kvm_vcpu. arch/powerpc/include/asm/kvm_host.h defines arch.regs as a struct pt_regs, and arch/powerpc/include/asm/ptrace.h defines the ccr field of pt_regs as "unsigned long ccr". Since unsigned long is 64 bits, a 64-bit load needs to be used to load it, unless an endianness specific correction offset is added to access the desired subpart. In this case there is no reason to _not_ use a 64 bit load though. Fixes: 6c85b7bc637b ("powerpc/kvm: Use UV_RETURN ucall to return to ultravisor") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Marcus Comstedt <marcus@mc.pp.se> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191215094900.46740-1-marcus@mc.pp.se Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17powerpc: Define arch_is_kernel_initmem_freed() for lockdepMichael Ellerman
commit 6f07048c00fd100ed8cab66c225c157e0b6c0a50 upstream. Under certain circumstances, we hit a warning in lockdep_register_key: if (WARN_ON_ONCE(static_obj(key))) return; This occurs when the key falls into initmem that has since been freed and can now be reused. This has been observed on boot, and under memory pressure. Define arch_is_kernel_initmem_freed(), which allows lockdep to correctly identify this memory as dynamic. This fixes a bug picked up by the powerpc64 syzkaller instance where we hit the WARN via alloc_netdev_mqs. Reported-by: Qian Cai <cai@lca.pw> Reported-by: ppc syzbot c/o Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/87lfs4f7d6.fsf@dja-thinkpad.axtens.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17powerpc: Fix vDSO clock_getres()Vincenzo Frascino
[ Upstream commit 552263456215ada7ee8700ce022d12b0cffe4802 ] clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the powerpc vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17powerpc: Avoid clang warnings around setjmp and longjmpNathan Chancellor
[ Upstream commit c9029ef9c95765e7b63c4d9aa780674447db1ec0 ] Commit aea447141c7e ("powerpc: Disable -Wbuiltin-requires-header when setjmp is used") disabled -Wbuiltin-requires-header because of a warning about the setjmp and longjmp declarations. r367387 in clang added another diagnostic around this, complaining that there is no jmp_buf declaration. In file included from ../arch/powerpc/xmon/xmon.c:47: ../arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of built-in function 'setjmp' requires the declaration of the 'jmp_buf' type, commonly provided in the header <setjmp.h>. [-Werror,-Wincomplete-setjmp-declaration] extern long setjmp(long *); ^ ../arch/powerpc/include/asm/setjmp.h:11:13: error: declaration of built-in function 'longjmp' requires the declaration of the 'jmp_buf' type, commonly provided in the header <setjmp.h>. [-Werror,-Wincomplete-setjmp-declaration] extern void longjmp(long *, long); ^ 2 errors generated. We are not using the standard library's longjmp/setjmp implementations for obvious reasons; make this clear to clang by using -ffreestanding on these files. Cc: stable@vger.kernel.org # 4.14+ Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191119045712.39633-3-natechancellor@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17powerpc/xive: Skip ioremap() of ESB pages for LSI interruptsCédric Le Goater
commit b67a95f2abff0c34e5667c15ab8900de73d8d087 upstream. The PCI INTx interrupts and other LSI interrupts are handled differently under a sPAPR platform. When the interrupt source characteristics are queried, the hypervisor returns an H_INT_ESB flag to inform the OS that it should be using the H_INT_ESB hcall for interrupt management and not loads and stores on the interrupt ESB pages. A default -1 value is returned for the addresses of the ESB pages. The driver ignores this condition today and performs a bogus IO mapping. Recent changes and the DEBUG_VM configuration option make the bug visible with : kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1 NIP: c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000 REGS: c0000000fa45f0d0 TRAP: 0700 Not tainted (5.4.0-0.rc6.git0.1.fc32.ppc64le) ... NIP ioremap_page_range+0x4c4/0x6e0 LR ioremap_page_range+0x74/0x6e0 Call Trace: ioremap_page_range+0x74/0x6e0 (unreliable) do_ioremap+0x8c/0x120 __ioremap_caller+0x128/0x140 ioremap+0x30/0x50 xive_spapr_populate_irq_data+0x170/0x260 xive_irq_domain_map+0x8c/0x170 irq_domain_associate+0xb4/0x2d0 irq_create_mapping+0x1e0/0x3b0 irq_create_fwspec_mapping+0x27c/0x3e0 irq_create_of_mapping+0x98/0xb0 of_irq_parse_and_map_pci+0x168/0x230 pcibios_setup_device+0x88/0x250 pcibios_setup_bus_devices+0x54/0x100 __of_scan_bus+0x160/0x310 pcibios_scan_phb+0x330/0x390 pcibios_init+0x8c/0x128 do_one_initcall+0x60/0x2c0 kernel_init_freeable+0x290/0x378 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x80 Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191203163642.2428-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>