aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/mm/book3s64/pgtable.c
AgeCommit message (Collapse)Author
2020-05-28powerpc/64s/radix: Don't prefetch DAR in update_mmu_cacheNicholas Piggin
The idea behind this prefetch was to kick off a page table walk before returning from the fault, getting some pipelining advantage. But this never showed up any noticable performance advantage, and in fact with KUAP the prefetches are actually blocked and cause some kind of micro-architectural fault. Removing this improves page fault microbenchmark performance by about 9%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Keep the early return in update_mmu_cache()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200504122907.49304-1-npiggin@gmail.com
2020-05-05powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault raceAneesh Kumar K.V
MADV_DONTNEED holds mmap_sem in read mode and that implies a parallel page fault is possible and the kernel can end up with a level 1 PTE entry (THP entry) converted to a level 0 PTE entry without flushing the THP TLB entry. Most architectures including POWER have issues with kernel instantiating a level 0 PTE entry while holding level 1 TLB entries. The code sequence I am looking at is down_read(mmap_sem) down_read(mmap_sem) zap_pmd_range() zap_huge_pmd() pmd lock held pmd_cleared table details added to mmu_gather pmd_unlock() insert a level 0 PTE entry() tlb_finish_mmu(). Fix this by forcing a tlb flush before releasing pmd lock if this is not a fullmm invalidate. We can safely skip this invalidate for task exit case (fullmm invalidate) because in that case we are sure there can be no parallel fault handlers. This do change the Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms With patch: munmap start: timer = 196345 ms, PID=6879 munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-23-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/mm/book3s64: Avoid sending IPI on clearing PMDAneesh Kumar K.V
Now that all the lockless page table walk is careful w.r.t the PTE address returned, we can now revert commit: 13bd817bb884 ("powerpc/thp: Serialize pmd clear against a linux page table walk.") We also drop the equivalent IPI from other pte updates routines. We still keep IPI in hash pmdp collapse and that is to take care of parallel hash page table insert. The radix pmdp collapse flush can possibly be removed once I am sure generic code doesn't have the any expectations around parallel gup walk. This speeds up Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 13162 ms, PID=7684 munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms With patch: munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-21-aneesh.kumar@linux.ibm.com
2020-04-10powerpc/mm: thread pgprot_t through create_section_mapping()Logan Gunthorpe
In prepartion to support a pgprot_t argument for arch_add_memory(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Badger <ebadger@gigaio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP caseAneesh Kumar K.V
Patch series "Fixup page directory freeing", v4. This is a repost of patch series from Peter with the arch specific changes except ppc64 dropped. ppc64 changes are added here because we are redoing the patch series on top of ppc64 changes. This makes it easy to backport these changes. Only the first 2 patches need to be backported to stable. The thing is, on anything SMP, freeing page directories should observe the exact same order as normal page freeing: 1) unhook page/directory 2) TLB invalidate 3) free page/directory Without this, any concurrent page-table walk could end up with a Use-after-Free. This is esp. trivial for anything that has software page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware caches partial page-walks (ie. caches page directories). Even on UP this might give issues since mmu_gather is preemptible these days. An interrupt or preempted task accessing user pages might stumble into the free page if the hardware caches page directories. This patch series fixes ppc64 and add generic MMU_GATHER changes to support the conversion of other architectures. I haven't added patches w.r.t other architecture because they are yet to be acked. This patch (of 9): A followup patch is going to make sure we correctly invalidate page walk cache before we free page table pages. In order to keep things simple enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the !SMP case differently in the followup patch !SMP case is right now broken for radix translation w.r.t page walk cache flush. We can get interrupted in between page table free and that would imply we have page walk cache entries pointing to tables which got freed already. Michael said "both our platforms that run on Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a problem for anyone in practice, unless they've hacked their kernel to build it !SMP." Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-05powerpc/64s/radix: introduce options to disable use of the tlbie instructionNicholas Piggin
Introduce two options to control the use of the tlbie instruction. A boot time option which completely disables the kernel using the instruction, this is currently incompatible with HASH MMU, KVM, and coherent accelerators. And a debugfs option can be switched at runtime and avoids using tlbie for invalidating CPU TLBs for normal process and kernel address mappings. Coherent accelerators are still managed with tlbie, as will KVM partition scope translations. Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a basic implementation which does not attempt to make any optimisation beyond the tlbie implementation. This is useful for performance testing among other things. For example in certain situations on large systems, using IPIs may be faster than tlbie as they can be directed rather than broadcast. Later we may also take advantage of the IPIs to do more interesting things such as trim the mm cpumask more aggressively. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-7-npiggin@gmail.com
2019-09-05powerpc/64s: remove unnecessary translation cache flushes at bootNicholas Piggin
The various translation structure invalidations performed in early boot when the MMU is off are not required, because everything is invalidated immediately before a CPU first enables its MMU (see early_init_mmu and early_init_mmu_secondary). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-6-npiggin@gmail.com
2019-09-05powerpc/64s: make mmu_partition_table_set_entry TLB flush optionalNicholas Piggin
No functional change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-4-npiggin@gmail.com
2019-09-05powerpc/64s/radix: tidy up TLB flushing codeNicholas Piggin
There should be no functional changes. - Use calls to existing radix_tlb.c functions in flush_partition. - Rename radix__flush_tlb_lpid to radix__flush_all_lpid and similar, because they flush everything, matching flush_all_mm rather than flush_tlb_mm for the lpid. - Remove some unused radix_tlb.c flush primitives. Signed-off: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-3-npiggin@gmail.com
2019-09-05powerpc/64s: remove register_process_table callbackNicholas Piggin
This callback is only required because the partition table init comes before process table allocation on powernv (aka bare metal aka native). Change the order to allocate the process table first, and remove the callback. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-2-npiggin@gmail.com
2019-08-30Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge our ppc-kvm topic branch to bring in the Ultravisor support patches.
2019-08-30powerpc/mm: Write to PTCR only if ultravisor disabledClaudio Carvalho
In ultravisor enabled systems, PTCR becomes ultravisor privileged only for writing and an attempt to write to it will cause a Hypervisor Emulation Assitance interrupt. This patch uses the set_ptcr_when_no_uv() function to restrict PTCR writing to only when ultravisor is disabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-6-cclaudio@linux.ibm.com
2019-08-30powerpc/mm: Use UV_WRITE_PATE ucall to register a PATEMichael Anderson
When Ultravisor (UV) is enabled, the partition table is stored in secure memory and can only be accessed via the UV. The Hypervisor (HV) however maintains a copy of the partition table in normal memory to allow Nest MMU translations to occur (for normal VMs). The HV copy includes partition table entries (PATE)s for secure VMs which would currently be unused (Nest MMU translations cannot access secure memory) but they would be needed as we add functionality. This patch adds the UV_WRITE_PATE ucall which is used to update the PATE for a VM (both normal and secure) when Ultravisor is enabled. Signed-off-by: Michael Anderson <andmike@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ cclaudio: Write the PATE in HV's table before doing that in UV's ] Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Reviewed-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-5-cclaudio@linux.ibm.com
2019-08-27powerpc/mm: refactor ioremap_range() and use ioremap_page_range()Christophe Leroy
book3s64's ioremap_range() is almost same as fallback ioremap_range(), except that it calls radix__ioremap_range() when radix is enabled. radix__ioremap_range() is also very similar to the other ones, expect that it calls ioremap_page_range when slab is available. PPC32 __ioremap_caller() have a loop doing the same thing as ioremap_range() so use it on PPC32 as well. Lets keep only one version of ioremap_range() which calls ioremap_page_range() on all platforms when slab is available. At the same time, drop the nid parameter which is not used. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4b1dca7096b01823b101be7338983578641547f1.1566309263.git.christophe.leroy@c-s.fr
2019-07-13Merge tag 'powerpc-5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well as some other functions only used by drivers that haven't (yet?) made it upstream. - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e mem: ...) which could lead to register corruption and kernel crashes. - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc when using the Radix MMU. - A large but incremental rewrite of our exception handling code to use gas macros rather than multiple levels of nested CPP macros. And the usual small fixes, cleanups and improvements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung Bauermann, YueHaibing" * tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits) powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state. powerpc/eeh: Handle hugepages in ioremap space ocxl: Update for AFU descriptor template version 1.1 powerpc/boot: pass CONFIG options in a simpler and more robust way powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h powerpc/irq: Don't WARN continuously in arch_local_irq_restore() powerpc/module64: Use symbolic instructions names. powerpc/module32: Use symbolic instructions names. powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Add lzo support for uImage powerpc/boot: Add lzma support for uImage powerpc/boot: don't force gzipped uImage powerpc/8xx: Add microcode patch to move SMC parameter RAM. powerpc/8xx: Use IO accessors in microcode programming. powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c powerpc/8xx: refactor programming of microcode CPM params. powerpc/8xx: refactor printing of microcode patch name. powerpc/8xx: Refactor microcode write powerpc/8xx: refactor writing of CPM microcode arrays ...
2019-07-05powerpc/mm: pmd_devmap implies pmd_large().Aneesh Kumar K.V
large devmap usage is dependent on THP. Hence once check is sufficient. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch into next, this brings in a number of commits that fix bugs we don't want to hit in next, in particular the fix for CVE-2019-12817.
2019-06-19powerpc/64s/radix: ioremap use ioremap_page_rangeNicholas Piggin
Radix can use ioremap_page_range for ioremap, after slab is available. This makes it possible to enable huge ioremap mapping support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-15Merge tag 'powerpc-5.2-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for a regression introduced by our 32-bit KASAN support, which broke booting on machines with "bootx" early debugging enabled. A fix for a bug which broke kexec on 32-bit, introduced by changes to the 32-bit STRICT_KERNEL_RWX support in v5.1. Finally two fixes going to stable for our THP split/collapse handling, discovered by Nick. The first fixes random crashes and/or corruption in guests under sufficient load. Thanks to: Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu Malaterre" * tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate() powerpc/64s: Fix THP PMD collapse serialisation powerpc: Fix kexec failure on book3s/32
2019-06-07powerpc/64s: Fix THP PMD collapse serialisationNicholas Piggin
Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") changed the actual bitwise tests in pte_access_permitted by using pte_write() and pte_present() helpers rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits. The pte_present() change now returns true for PTEs which are !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by pmdp_invalidate() to synchronize access from lock-free lookups. pte_access_permitted() is used by pmd_access_permitted(), so allowing GUP lock free access to proceed with such PTEs breaks this synchronisation. This bug has been observed on a host using the hash page table MMU, with random crashes and corruption in guests, usually together with bad PMD messages in the host. Fix this by adding an explicit check in pmd_access_permitted(), and documenting the condition explicitly. The pte_write() change should be okay, and would prevent GUP from falling back to the slow path when encountering savedwrite PTEs, which matches what x86 (that does not implement savedwrite) does. Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-03powerpc/mm: Move book3s64 specifics in subdirectory mm/book3s64Christophe Leroy
Many files in arch/powerpc/mm are only for book3S64. This patch creates a subdirectory for them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Update the selftest sym links, shorten new filenames, cleanup some whitespace and formatting in the new files.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>