aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include/asm/page.h
AgeCommit message (Collapse)Author
2018-05-03powerpc/fadump: Do not use hugepages when fadump is activeHari Bathini
FADump capture kernel boots in restricted memory environment preserving the context of previous kernel to save vmcore. Supporting hugepages in such environment makes things unnecessarily complicated, as hugepages need memory set aside for them. This means most of the capture kernel's memory is used in supporting hugepages. In most cases, this results in out-of-memory issues while booting FADump capture kernel. But hugepages are not of much use in capture kernel whose only job is to save vmcore. So, disabling hugepages support, when fadump is active, is a reliable solution for the out of memory issues. Introducing a flag variable to disable HugeTLB support when fadump is active. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid()Mathieu Malaterre
Rewrite comparison since all values compared are of type `unsigned long`. Instead of using unsigned properties and rewriting the original code as: (originally suggested by Segher Boessenkool <segher@kernel.crashing.org>) #define pfn_valid(pfn) \ (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET)) Prefer a static inline function to make code as readable as possible. Fix a warning (treated as error in W=1): arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] #define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr) ^ Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06powerpc/mm/slice: create header files dedicated to slicesChristophe Leroy
In preparation for the following patch which will enhance 'slices' for supporting PPC32 in order to fix an issue on hugepages on 8xx, this patch takes out of page*.h all bits related to 'slices' and put them into newly created slice.h header files. While common parts go into asm/slice.h, subarch specific parts go into respective books3s/64/slice.c and nohash/64/slice.c 'slices' Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-19powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hashMichael Ellerman
virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on an address. What this means in practice is that it should only return true for addresses in the linear mapping which are backed by a valid PFN. We are failing to properly check that the address is in the linear mapping, because virt_to_pfn() will return a valid looking PFN for more or less any address. That bug is actually caused by __pa(), used in virt_to_pfn(). eg: __pa(0xc000000000010000) = 0x10000 # Good __pa(0xd000000000010000) = 0x10000 # Bad! __pa(0x0000000000010000) = 0x10000 # Bad! This started happening after commit bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give you something bogus back. Until we can verify if that GCC bug is no longer an issue, or come up with another solution, this commit does the minimal fix to make virt_addr_valid() work, by explicitly checking that the address is in the linear mapping region. Fixes: bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Breno Leitao <breno.leitao@gmail.com>
2017-02-22powerpc: do not make the entire heap executableDenys Vlasenko
On 32-bit powerpc the ELF PLT sections of binaries (built with --bss-plt, or with a toolchain which defaults to it) look like this: [17] .sbss NOBITS 0002aff8 01aff8 000014 00 WA 0 0 4 [18] .plt NOBITS 0002b00c 01aff8 000084 00 WAX 0 0 4 [19] .bss NOBITS 0002b090 01aff8 0000a4 00 WA 0 0 4 Which results in an ELF load header: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000 This is all correct, the load region containing the PLT is marked as executable. Note that the PLT starts at 0002b00c but the file mapping ends at 0002aff8, so the PLT falls in the 0 fill section described by the load header, and after a page boundary. Unfortunately the generic ELF loader ignores the X bit in the load headers when it creates the 0 filled non-file backed mappings. It assumes all of these mappings are RW BSS sections, which is not the case for PPC. gcc/ld has an option (--secure-plt) to not do this, this is said to incur a small performance penalty. Currently, to support 32-bit binaries with PLT in BSS kernel maps *entire brk area* with executable rights for all binaries, even --secure-plt ones. Stop doing that. Teach the ELF loader to check the X bit in the relevant load header and create 0 filled anonymous mappings that are executable if the load header requests that. Test program showing the difference in /proc/$PID/maps: int main() { char buf[16*1024]; char *p = malloc(123); /* make "[heap]" mapping appear */ int fd = open("/proc/self/maps", O_RDONLY); int len = read(fd, buf, sizeof(buf)); write(1, buf, len); printf("%p\n", p); return 0; } Compiled using: gcc -mbss-plt -m32 -Os test.c -otest Unpatched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10690000-106c0000 rwxp 00000000 00:00 0 [heap] f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ffa90000-ffac0000 rw-p 00000000 00:00 0 [stack] 0x10690008 Patched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10180000-101b0000 rw-p 00000000 00:00 0 [heap] ^^^^ this has changed f7c60000-f7c90000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7c90000-f7ca0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ff860000-ff890000 rw-p 00000000 00:00 0 [stack] 0x10180008 The patch was originally posted in 2012 by Jason Gunthorpe and apparently ignored: https://lkml.org/lkml/2012/9/30/138 Lightly run-tested. Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-18powerpc/mm: Fix little-endian 4K hugetlbAneesh Kumar K.V
When we switched to big endian page table, we never updated the hugepd format such that it can work for both big endian and little endian config. This patch series update hugepd format such that it is looked at as __be64 value in big endian page table config. This patch also switch hugepd_t.pd from signed long to unsigned long. I did update the FSL hugepd_ok check to check for the top bit instead of checking > 0. Fixes: 5dc1ef858c12 ("powerpc/mm: Use big endian Linux page tables for book3s 64") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19powerpc/32: Remove RELOCATABLE_PPC32Kevin Hao
It is seldom used in the kernel code and can be easily replaced by either RELOCATABLE or PPC32. So there is no reason to keep a separate kernel option for this. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11powerpc/mm: Make 4K and 64K use pte_t for pgtable_tAneesh Kumar K.V
This patch switches 4K Linux page size config to use pte_t * type instead of struct page * for pgtable_t. This simplifies the code a lot and helps in consolidating both 64K and 4K page allocator routines. The changes should not have any impact, because we already store physical address in the upper level page table tree and that implies we already do struct page * to physical address conversion. One change to note here is we move the pgtable_page_dtor() call for nohash to pte_fragment_free_mm(). The nohash related change is due to the related changes in pgtable_64.c. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01powerpc/mm: Use big endian Linux page tables for book3s 64Aneesh Kumar K.V
Traditionally Power server machines have used the Hashed Page Table MMU mode. In this mode Linux manages its own tree of nested page tables, aka. "the Linux page tables", which are not used by the hardware directly, and software loads translations into the hash page table for use by the hardware. Power ISA 3.0 defines a new MMU mode, known as Radix Tree Translation, where the hardware can directly operate on the Linux page tables. However the hardware requires that the page tables be in big endian format. To accommodate this, switch the pgtable types to __be64 and add appropriate endian conversions. Because we will be supporting a single kernel binary that boots using either radix or hash mode, we always store the Linux page tables big endian, even in hash mode where they are not actually used by the hardware. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix sparse errors, flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03powerpc/mm: Split pgtable types to separate headerAneesh Kumar K.V
We move the page table accessors into a separate header. We will later add a big endian variant of the table which is needed for radix. No functionality change only code movement. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Use physical addresses in upper page table tree levelsPaul Mackerras
This changes the Linux page tables to store physical addresses rather than kernel virtual addresses in the upper levels of the tree (pgd, pud and pmd) for 64-bit Book 3S machines. This also changes the hugepd pointers used to implement hugepages when the base page size is 4k to store physical addresses rather than virtual addresses (again just for 64-bit Book3S machines). This frees up some high order bits, and will be needed with PowerISA v3.0 machines which read the page table tree in hardware in radix mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14powerpc/mm: Move hugetlb related headersAneesh Kumar K.V
W.r.t hugetlb, we support two format for pmd. With book3s_64 and 64K linux page size, we can have pte at the pmd level. Hence we don't need to support hugepd there. For everything else hugepd is supported and pmd_huge is (0). Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14powerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalueAneesh Kumar K.V
We convert them static inline function here as we did with pte_val in the previous patch Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14powerpc/mm: Don't use pte_val as lvalueAneesh Kumar K.V
We also convert few #define to static inline in this patch for better type checking Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-27powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32Scott Wood
The way VIRT_PHYS_OFFSET is not correct on book3e-64, because it does not account for CONFIG_RELOCATABLE other than via the 32-bit-only virt_phys_offset. book3e-64 can (and if the comment about a GCC miscompilation is still relevant, should) use the normal ppc64 __va/__pa. At this point, only booke-32 will use VIRT_PHYS_OFFSET, so given the issues with its calculation, restrict its definition to booke-32. Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-12powerpc/mm: Disable hugepd for 64K page size.Aneesh Kumar K.V
After commit e2b3d202d1dba8f3546ed28224ce485bc50010be ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format"), we don't need to support is_hugepd() for 64K page size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09powerpc: Fix _ALIGN_* errors due to type difference.Aneesh Kumar K.V
This avoid errors like unsigned int usize = 1 << 30; int size = 1 << 30; unsigned long addr = 64UL << 30 ; value = _ALIGN_DOWN(addr, usize); -> 0 value = _ALIGN_DOWN(addr, size); -> 0x1000000000 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01powerpc/mm: Add virt_to_pfn and use this instead of opencodingAneesh Kumar K.V
This add helper virt_to_pfn and remove the opencoded usage of the same. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11powerpc: Make STRICT_MM_TYPECHECKS a config optionMichael Ellerman
The STRICT_MM_TYPECHECKS code has bit-rotted over the years. To make it possible to easily build test it, make it a CONFIG option. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/mm: Switch to generic RCU get_user_pages_fastAneesh Kumar K.V
This patch switch the ppc arch to use the generic RCU based gup implementation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14mm: Update generic gup implementation to handle hugepage directoryAneesh Kumar K.V
Update generic gup implementation with powerpc specific details. On powerpc at pmd level we can have hugepte, normal pmd pointer or a pointer to the hugepage directory. Tested-by: Steve Capper <steve.capper@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-08arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate areaAndy Lutomirski
The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle] Acked-by: Richard Weinberger <richard@nod.at> [for um] Acked-by: Will Deacon <will.deacon@arm.com> [for arm64] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nathan Lynch <Nathan_Lynch@mentor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30powerpc: Fix 64K page size support for PPC44xAlistair Popple
PPC44x supports page sizes other than 4K however when 64K page sizes are selected compilation fails. This is due to a change in the definition of pgtable_t introduced by the following patch: commit 5c1f6ee9a31cbdac90bbb8ae1ba4475031ac74b4 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> powerpc: Reduce PTE table memory wastage The above patch only implements the new layout for PPC64 so it doesn't compile for PPC32 with a 64K page size. Ideally we should implement the same layout for PPC32 however for the meantime this patch reverts the definition of pgtable_t for PPC32. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-30powerpc: Fix a typo in comments of va to pa conversionVaishnavi Bhat
This patch fixes typo in comments virtual to physical address conversion. Signed-off-by: Vaishnavi Bhat <vaishnavi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27powerpc: Work around gcc miscompilation of __pa() on 64-bitPaul Mackerras
On 64-bit, __pa(&static_var) gets miscompiled by recent versions of gcc as something like: addis 3,2,.LANCHOR1+4611686018427387904@toc@ha addi 3,3,.LANCHOR1+4611686018427387904@toc@l This ends up effectively ignoring the offset, since its bottom 32 bits are zero, and means that the result of __pa() still has 0xC in the top nibble. This happens with gcc 4.8.1, at least. To work around this, for 64-bit we make __pa() use an AND operator, and for symmetry, we make __va() use an OR operator. Using an AND operator rather than a subtraction ends up with slightly shorter code since it can be done with a single clrldi instruction, whereas it takes three instructions to form the constant (-PAGE_OFFSET) and add it on. (Note that MEMORY_START is always 0 on 64-bit.) CC: <stable@vger.kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Reduce PTE table memory wastageAneesh Kumar K.V
We allocate one page for the last level of linux page table. With THP and large page size of 16MB, that would mean we are wasting large part of that page. To map 16MB area, we only need a PTE space of 2K with 64K page size. This patch reduce the space wastage by sharing the page allocated for the last level of linux page table with multiple pmd entries. We call these smaller chunks PTE page fragments and allocated page, PTE page. In order to support systems which doesn't have 64K HPTE support, we also add another 2K to PTE page fragment. The second half of the PTE fragments is used for storing slot and secondary bit information of an HPTE. With this we now have a 4K PTE fragment. We use a simple approach to share the PTE page. On allocation, we bump the PTE page refcount to 16 and share the PTE page with the next 16 pte alloc request. This should help in the node locality of the PTE page fragment, assuming that the immediate pte alloc request will mostly come from the same NUMA node. We don't try to reuse the freed PTE page fragment. Hence we could be waisting some space. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Switch 16GB and 16MB explicit hugepages to a different page table ↵Aneesh Kumar K.V
format We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation. With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level. That means with 32 bit process we cannot allocate normal pages at all, because we cover the entire address space with one pgd entry. Fix this by switching to a new page table format for hugepages. With the new page table format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead we encode the PTE information directly at the directory level. This forces 16MB hugepage at PMD level. This will also make the page take walk much simpler later when we add the THP support. With the new table format we have 4 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (3) leaf pte for huge page, bottom two bits != 00 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: New hugepage directory formatAneesh Kumar K.V
Change the hugepage directory format so that we can have leaf ptes directly at page directory avoiding the allocation of hugepage directory. With the new table format we have 3 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Instead of storing shift value in hugepd pointer we use mmu_psize_def index so that we can fit all the supported hugepage size in 4 bits Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-20powerpc: Define virtual-physical translations for RELOCATABLESuzuki Poulose
We find the runtime address of _stext and relocate ourselves based on the following calculation. virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) + MODULO(_stext.run,KERNEL_TLB_PIN_SIZE) relocate() is called with the Effective Virtual Base Address (as shown below) | Phys. Addr| Virt. Addr | Page |------------------------| Boundary | | | | | | | | | Kernel Load |___________|_ __ _ _ _ _|<- Effective Addr(_stext)| | ^ |Virt. Base Addr | | | | | | | | | |reloc_offset| | | | | | | | | | |______v_____|<-(KERNELBASE)%TLB_SIZE | | | | | | | | | Page |-----------|------------| Boundary | | | On BookE, we need __va() & __pa() early in the boot process to access the device tree. Currently this has been defined as : #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + KERNELBASE) where: PHYSICAL_START is kernstart_addr - a variable updated at runtime. KERNELBASE is the compile time Virtual base address of kernel. This won't work for us, as kernstart_addr is dynamic and will yield different results for __va()/__pa() for same mapping. e.g., Let the kernel be loaded at 64MB and KERNELBASE be 0xc0000000 (same as PAGE_OFFSET). In this case, we would be mapping 0 to 0xc0000000, and kernstart_addr = 64M Now __va(1MB) = (0x100000) - (0x4000000) + 0xc0000000 = 0xbc100000 , which is wrong. it should be : 0xc0000000 + 0x100000 = 0xc0100000 On platforms which support AMP, like PPC_47x (based on 44x), the kernel could be loaded at highmem. Hence we cannot always depend on the compile time constants for mapping. Here are the possible solutions: 1) Update kernstart_addr(PHSYICAL_START) to match the Physical address of compile time KERNELBASE value, instead of the actual Physical_Address(_stext). The disadvantage is that we may break other users of PHYSICAL_START. They could be replaced with __pa(_stext). 2) Redefine __va() & __pa() with relocation offset #ifdef CONFIG_RELOCATABLE_PPC32 #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) - PHYSICAL_START + (KERNELBASE + RELOC_OFFSET))) #define __pa(x) ((unsigned long)(x) + PHYSICAL_START - (KERNELBASE + RELOC_OFFSET)) #endif where, RELOC_OFFSET could be a) A variable, say relocation_offset (like kernstart_addr), updated at boot time. This impacts performance, as we have to load an additional variable from memory. OR b) #define RELOC_OFFSET ((PHYSICAL_START & PPC_PIN_SIZE_OFFSET_MASK) - \ (KERNELBASE & PPC_PIN_SIZE_OFFSET_MASK)) This introduces more calculations for doing the translation. 3) Redefine __va() & __pa() with a new variable i.e, #define __va(x) ((void *)(unsigned long)((phys_addr_t)(x) + VIRT_PHYS_OFFSET)) where VIRT_PHYS_OFFSET : #ifdef CONFIG_RELOCATABLE_PPC32 #define VIRT_PHYS_OFFSET virt_phys_offset #else #define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START) #endif /* CONFIG_RELOCATABLE_PPC32 */ where virt_phy_offset is updated at runtime to : Effective KERNELBASE - kernstart_addr. Taking our example, above: virt_phys_offset = effective_kernelstart_vaddr - kernstart_addr = 0xc0400000 - 0x400000 = 0xc0000000 and __va(0x100000) = 0xc0000000 + 0x100000 = 0xc0100000 which is what we want. I have implemented (3) in the following patch which has same cost of operation as the existing one. I have tested the patches on 440x platforms only. However this should work fine for PPC_47x also, as we only depend on the runtime address and the current TLB XLAT entry for the startup code, which is available in r25. I don't have access to a 47x board yet. So, it would be great if somebody could test this on 47x. Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org> Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-20powerpc: Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookESuzuki Poulose
The current implementation of CONFIG_RELOCATABLE in BookE is based on mapping the page aligned kernel load address to KERNELBASE. This approach however is not enough for platforms, where the TLB page size is large (e.g, 256M on 44x). So we are renaming the RELOCATABLE used currently in BookE to DYNAMIC_MEMSTART to reflect the actual method. The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the dynamic relocations will be introduced in the later in the patch series. This change would allow the use of the old method of RELOCATABLE for platforms which can afford to enforce the page alignment (platforms with smaller TLB size). Changes since v3: * Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is either a RELOCATABLE or DYNAMIC_MEMSTART(Suggested by: Josh Boyer) Suggested-by: Scott Wood <scottwood@freescale.com> Tested-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linux ppc dev <linuxppc-dev@lists.ozlabs.org> Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-11-28powerpc: Implement CONFIG_STRICT_DEVMEMsukadev@linux.vnet.ibm.com
As described in the help text in the patch, this token restricts general access to /dev/mem as a way of increasing the security. Specifically, access to exclusive IOMEM and kernel RAM is denied unless CONFIG_STRICT_DEVMEM is set to 'n'. Implement the 'devmem_is_allowed()' interface for Powerpc. It will be called from range_is_allowed() when userpsace attempts to access /dev/mem. This patch is based on an earlier patch from Steve Best and with input from Paul Mackerras and Scott Wood. [BenH] Fixed a typo or two and removed the generic change which should be submitted as a separate patch Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20powerpc: Hugetlb for BookEBecky Bruce
Enable hugepages on Freescale BookE processors. This allows the kernel to use huge TLB entries to map pages, which can greatly reduce the number of TLB misses and the amount of TLB thrashing experienced by applications with large memory footprints. Care should be taken when using this on FSL processors, as the number of large TLB entries supported by the core is low (16-64) on current processors. The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g. Page sizes larger than the max zone size are called "gigantic" pages and must be allocated on the command line (and cannot be deallocated). This is currently only fully implemented for Freescale 32-bit BookE processors, but there is some infrastructure in the code for 64-bit BooKE. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-30powerpc: ARCH_PFN_OFFSET should be unsigned longScott Wood
pfns are unsigned long, but MEMORY_START is phys_addr_t. This leads to page_to_pfn() returning phys_addr_t, and thus type mismatches in a few print statements. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07powerpc: Fix pfn_valid() when memory starts at a non-zero addressScott Wood
max_mapnr is a pfn, not an index innto mem_map[]. So don't add ARCH_PFN_OFFSET a second time. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-04-26powerpc/fsl-booke: Fix CONFIG_RELOCATABLE support on FSL Book-E ppc32Kumar Gala
The following commit broke CONFIG_RELOCATABLE support on FSL Book-E parts: commit 549e8152de8039506f69c677a4546e5427aa6ae7 Author: Paul Mackerras <paulus@samba.org> Date: Sat Aug 30 11:43:47 2008 +1000 powerpc: Make the 64-bit kernel as a position-independent executable The change to __va and __pa to use PAGE_OFFSET & MEMORY_START causes problems on the Book-E parts because we don't know MEMORY_START until after we parse the device tree. We need __va to work properly to even parse the device tree so we have a chicken an egg. So go back to using he other definition of __va/__pa on CONFIG_BOOKE and use the PAGE_OFFSET/MEMORY_START version on "Classic" PPC64. Also updated casts to handle phys_addr_t being a different size from unsigned long (ie 36-bit physical on PPC32). Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-10-30powerpc/mm: Allow more flexible layouts for hugepage pagetablesDavid Gibson
Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Add memory management headers for new 64-bit BookEBenjamin Herrenschmidt
This adds the PTE and pgtable format definitions, along with changes to the kernel memory map and other definitions related to implementing support for 64-bit Book3E. This also shields some asm-offset bits that are currently only relevant on 32-bit We also move the definition of the "linux" page size constants to the common mmu.h file and add a few sizes that are relevant to embedded processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21powerpc/pseries: CMO unused page hintingRobert Jennings
Adds support for the "unused" page hint which can be used in shared memory partitions to flag pages not in use, which will then be stolen before active pages by the hypervisor when memory needs to be moved to LPARs in need of additional memory. Failure to mark pages as 'unused' makes the LPAR slower to give up unused memory to other partitions. This adds the kernel parameter 'cmo_free_hint' to disable this functionality. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-14powerpc/44x: Support for 256KB PAGE_SIZEYuri Tikhonov
This patch adds support for 256KB pages on ppc44x-based boards. For simplification of implementation with 256KB pages we still assume 2-level paging. As a side effect this leads to wasting extra memory space reserved for PTE tables: only 1/4 of pages allocated for PTEs are actually used. But this may be an acceptable trade-off to achieve the high performance we have with big PAGE_SIZEs in some applications (e.g. RAID). Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize the risk of stack overflows in the cases of on-stack arrays, which size depends on the page size (e.g. multipage BIOs, NTFS, etc.). With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be occupied by PKMAP addresses leaving no place for vmalloc. We do not separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that value of 10 in support for 16K/64K had been selected rather intuitively. Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB, one) we have 512 pages for PKMAP. Because ELF standard supports only page sizes up to 64K, then you should use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K for building applications, which are to be run with the 256KB-page sized kernel. If using the older binutils, then you should patch them like follows: --- binutils/bfd/elf32-ppc.c.orig +++ binutils/bfd/elf32-ppc.c -#define ELF_MAXPAGESIZE 0x10000 +#define ELF_MAXPAGESIZE 0x40000 One more restriction we currently have with 256KB page sizes is inability to use shmem safely, so, for now, the 256KB is available only if you turn the CONFIG_SHMEM option off (another variant is to use BROKEN). Though, if you need shmem with 256KB pages, you can always remove the !SHMEM dependency in 'config PPC_256K_PAGES', and use the workaround available here: http://lkml.org/lkml/2008/12/19/20 Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-12-29powerpc/44x: Support 16K/64K base page sizes on 44xIlya Yanok
This adds support for 16k and 64k page sizes on PowerPC 44x processors. The PGDIR table is much smaller than a page when using 16k or 64k pages (512 and 32 bytes respectively) so we allocate the PGDIR with kzalloc() instead of __get_free_pages(). One PTE table covers rather a large memory area when using 16k or 64k pages (32MB or 512MB respectively), so we can easily put FIXMAP and PKMAP in the area covered by one PTE table. Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Vladimir Panfilov <pvr@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-10-21Merge commit 'origin' into masterBenjamin Herrenschmidt
Manual merge of: arch/powerpc/Kconfig arch/powerpc/include/asm/page.h
2008-10-21powerpc: Fix build issue with CONFIG_RELOCATABLE=yKumar Gala
There are two issues when we enable CONFIG_RELOCATABLE. The first is due to the fact that phys_addr_t is now defined in linux/types.h. The second is due to the fact that the DMA code changes expose memstart_addr to prom_init.c Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-10-20Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: m32r: fix build due to notify_cpu_starting() change powerpc: fix linux-next build failure
2008-10-16powerpc: fix linux-next build failureStephen Rothwell
Today's linux-next build (powerpc allyesconfig) failed like this: In file included from arch/powerpc/include/asm/mmu-hash64.h:17, from arch/powerpc/include/asm/mmu.h:8, from arch/powerpc/include/asm/pgtable.h:8, from arch/powerpc/mm/slb.c:20: arch/powerpc/include/asm/page.h:76: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'memstart_addr' arch/powerpc/include/asm/page.h:77: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'kernstart_addr' Caused by commit 600715dcdf567c86f8b2c6173fcfb4b873e25a19 ("generic: add phys_addr_t for holding physical addresses") from the tip-core tree. This only fails if CONFIG_RELOCATABLE is set. So include that instead of asm/types.h in asm/page.h for the CONFIG_RELOCATABLE case. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: ppc-dev <linuxppc-dev@ozlabs.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-15powerpc: Make the 64-bit kernel as a position-independent executablePaul Mackerras
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as a position-independent executable (PIE) when it is set. This involves processing the dynamic relocations in the image in the early stages of booting, even if the kernel is being run at the address it is linked at, since the linker does not necessarily fill in words in the image for which there are dynamic relocations. (In fact the linker does fill in such words for 64-bit executables, though not for 32-bit executables, so in principle we could avoid calling relocate() entirely when we're running a 64-bit kernel at the linked address.) The dynamic relocations are processed by a new function relocate(addr), where the addr parameter is the virtual address where the image will be run. In fact we call it twice; once before calling prom_init, and again when starting the main kernel. This means that reloc_offset() returns 0 in prom_init (since it has been relocated to the address it is running at), which necessitated a few adjustments. This also changes __va and __pa to use an equivalent definition that is simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are constants (for 64-bit) whereas PHYSICAL_START is a variable (and KERNELBASE ideally should be too, but isn't yet). With this, relocatable kernels still copy themselves down to physical address 0 and run there. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04powerpc: Move include files to arch/powerpc/include/asmStephen Rothwell
from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>