aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/vdso.c
AgeCommit message (Collapse)Author
2020-09-04powerpc/vdso: Fix vdso cpu truncationMilton Miller
commit a9f675f950a07d5c1dbcbb97aabac56f5ed085e3 upstream. The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrctly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number. Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing. Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Anton Blanchard <anton@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05powerpc/vdso: don't clear PG_reservedDavid Hildenbrand
The VDSO is part of the kernel image and therefore the struct pages are marked as reserved during boot. As we install a special mapping, the actual struct pages will never be exposed to MM via the page tables. We can therefore leave the pages marked as reserved. Link: http://lkml.kernel.org/r/20190114125903.24845-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21powerpc: split compat syscall table out from native tableFiroz Khan
PowerPC uses a syscall table with native and compat calls interleaved, which is a slightly simpler way to define two matching tables. As we move to having the tables generated, that advantage is no longer important, but the interleaved table gets in the way of using the same scripts as on the other archit- ectures. Split out a new compat_sys_call_table symbol that contains all the compat calls, and leave the main table for the nat- ive calls, to more closely match the method we use every- where else. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Firoz Khan <firoz.khan@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30powerpc: remove unneeded inclusions of cpu_has_feature.hChristophe Leroy
Files not using cpu_has_feature() don't need cpu_has_feature.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-03-24powerpc: Use feature bit for RTC presence rather than timebase presencePaul Mackerras
All PowerPC CPUs other than the original PPC601 have a timebase register rather than the "real-time clock" (RTC) register that the PPC601 (and the original POWER and POWER2 CPUs) had. Currently we have a CPU feature bit to indicate the presence of the timebase, but it makes more sense to use a bit to indicate the unusual situation rather than the common situation. This therefore defines a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and arranges for it to be set on PPC601 systems. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06powerpc/64: Clean up ppc64_caches using a struct per cacheBenjamin Herrenschmidt
We have two set of identical struct members for the I and D sides and mostly identical bunches of code to parse the device-tree to populate them. Instead make a ppc_cache_info structure with one copy for I and one for D Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06powerpc/64: Fix naming of cache block vs. cache lineBenjamin Herrenschmidt
In a number of places we called "cache line size" what is actually the cache block size, which in the powerpc architecture, means the effective size to use with cache management instructions (it can be different from the actual cache line size). We fix the naming across the board and properly retrieve both pieces of information when available in the device-tree. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/vdso: Add missing include fileGuenter Roeck
Some powerpc builds fail with the following buld error. In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0, from arch/powerpc/kernel/vdso.c:28: arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr': arch/powerpc/include/asm/cputhreads.h:101:2: error: implicit declaration of function 'cpu_has_feature' Fixes: b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-23vdso: make arch_setup_additional_pages wait for mmap_sem for write killableMichal Hocko
most architectures are relying on mmap_sem for write in their arch_setup_additional_pages. If the waiting task gets killed by the oom killer it would block oom_reaper from asynchronous address space reclaim and reduce the chances of timely OOM resolving. Wait for the lock in the killable mode and return with EINTR if the task got killed while waiting. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Andy Lutomirski <luto@amacapital.net> [x86 vdso] Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-26powerpc: Standardise on NR_syscalls rather than __NR_syscalls.Rashmica Gupta
Most architectures use NR_syscalls as the #define for the number of syscalls. We use __NR_syscalls, and then define NR_syscalls as __NR_syscalls. __NR_syscalls is not used outside arch code, whereas NR_syscalls is. So as NR_syscalls must be defined and __NR_syscalls does not, replace __NR_syscalls with NR_syscalls. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11powerpc/vdso: Disable building the 32-bit VDSO on little endianMichael Ellerman
The only little endian configuration we support is ppc64le. As such if we're building little endian we don't need a 32-bit VDSO, because there is no 32-bit userspace. This patch is a fairly ugly mess of #ifdefs, but is the minimal logic required to disable the 32-bit VDSO. We can hopefully clean up the result in future with some further refactoring. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11powerpc/vdso: Combine start/size variablesMichael Ellerman
In vdso_fixup_features() we have start64/start32 and size64/size32, but they have the same types, ie. void * and unsigned long. They're only used to save the return value from find_sectionXX() for the subsequent call to do_feature_fixups(), so there's no overlap in their usage either. So we can just consolidate them into start/size and avoid the duplication. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11powerpc/vdso: Remove unused debug codeMichael Ellerman
It's in the git history if we ever need it back. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-10powerpc: Remove superfluous bootmem includesAnton Blanchard
Lots of places included bootmem.h even when not using bootmem. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-08arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate areaAndy Lutomirski
The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle] Acked-by: Richard Weinberger <richard@nod.at> [for um] Acked-by: Will Deacon <will.deacon@arm.com> [for arm64] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nathan Lynch <Nathan_Lynch@mentor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-19powerpc/booke64: Use SPRG7 for VDSOScott Wood
Previously SPRG3 was marked for use by both VDSO and critical interrupts (though critical interrupts were not fully implemented). In commit 8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad ("powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman made an attempt to resolve this conflict by restoring the VDSO value early in the critical interrupt, but this has some issues: - It's incompatible with EXCEPTION_COMMON which restores r13 from the by-then-overwritten scratch (this cost me some debugging time). - It forces critical exceptions to be a special case handled differently from even machine check and debug level exceptions. - It didn't occur to me that it was possible to make this work at all (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after I made (most of) this patch. :-) It might be worth investigating using a load rather than SPRG on return from all exceptions (except TLB misses where the scratch never leaves the SPRG) -- it could save a few cycles. Until then, let's stick with SPRG for all exceptions. Since we cannot use SPRG4-7 for scratch without corrupting the state of a KVM guest, move VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor critical interrupts exist on book3s, SPRG3 is still used for VDSO there. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm-ppc@vger.kernel.org
2013-10-30powerpc: Move local setup.h declarations to arch includesRobert Jennings
Move the few declarations from arch/powerpc/kernel/setup.h into arch/powerpc/include/asm/setup.h. This resolves a sparse warning for arch/powerpc/mm/numa.c which defines do_init_bootmem() but can't include the setup.h header in the prior path. Resolves: arch/powerpc/mm/numa.c:998:13: warning: symbol 'do_init_bootmem' was not declared. Should it be static? Signed-off-by: Robert C Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01powerpc: Delete __cpuinit usage from all usersPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23powerpc: Add VDSO version of timeAdhemerval Zanella
On 04/18/2013 07:38 PM, Anton Blanchard wrote: > Since you are only reading one long you shouldn't need to check the > update count and loop, you will always see a consistent value. The > system call version of time() just does an unprotected load for example. Fixed. > With the above change and with Michael's comments covered (decent > changelog entry and Signed-off-by): > > Acked-by: Anton Blanchard <anton@samba.org> Thanks for the review, below the updated patch: From: Adhemerval Zanella <azanella@linux.vnet.ibm.com> This patch implement the time syscall as vDSO. The performance speedups are: Baseline PPC32: 380 nsec Baseline PPC64: 350 nsec vdso PPC32: 20 nsec vsdo PPC64: 20 nsec Tested on 64 bit build with both 32 bit and 64 bit userland. Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Adhemerval Zanella <azanella@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-07powerpc: Restore VDSO information on critical exception om BookEMihai Caraman
Critical exception on 64-bit booke uses user-visible SPRG3 as scratch. Restore VDSO information in SPRG3 on exception prolog. Use a common sprg3 field in PACA for all powerpc64 architectures. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11powerpc: Add VDSO version of getcpuAnton Blanchard
We have a request for a fast method of getting CPU and NUMA node IDs from userspace. This patch implements a getcpu VDSO function, similar to x86. Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be modified by a KVM guest, so we save the SPRG3 value in the paca and restore it when transitioning from the guest to the host. I have a glibc patch that implements sched_getcpu on top of this. Testing on a POWER7: baseline: 538 cycles vdso: 30 cycles Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Disintegrate asm/system.h for PowerPCDavid Howells
Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org
2012-03-28powerpc: Random little legacy iSeries removal tidy upsStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-23coredump: remove VM_ALWAYSDUMP flagJason Baron
The motivation for this patchset was that I was looking at a way for a qemu-kvm process, to exclude the guest memory from its core dump, which can be quite large. There are already a number of filter flags in /proc/<pid>/coredump_filter, however, these allow one to specify 'types' of kernel memory, not specific address ranges (which is needed in this case). Since there are no more vma flags available, the first patch eliminates the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by the kernel to mark vdso and vsyscall pages. However, it is simple enough to check if a vma covers a vdso or vsyscall page without the need for this flag. The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new 'VM_NODUMP' flag, which can be set by userspace using new madvise flags: 'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters continue to work the same as before unless 'MADV_DONTDUMP' is set on the region. The qemu code which implements this features is at: http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch In my testing the qemu core dump shrunk from 383MB -> 13MB with this patch. I also believe that the 'MADV_DONTDUMP' flag might be useful for security sensitive apps, which might want to select which areas are dumped. This patch: The VM_ALWAYSDUMP flag is currently used by the coredump code to indicate that a vma is part of a vsyscall or vdso section. However, we can determine if a vma is in one these sections by checking it against the gate_vma and checking for a non-NULL return value from arch_vma_name(). Thus, freeing a valuable vma bit. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31powerpc: remove non-required uses of include <linux/module.h>Paul Gortmaker
None of the files touched here are modules, and they are not exporting any symbols either -- so there is no need to be including the module.h. Builds of all the files remains successful. Even kernel/module.c does not need to include it, since it includes linux/moduleloader.h instead. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-23mm: arch: rename in_gate_area_no_task to in_gate_area_no_mmStephen Wilson
Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-23mm: arch: make in_gate_area take an mm_struct instead of a task_structStephen Wilson
Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-23mm: arch: make get_gate_vma take an mm_struct instead of a task_structStephen Wilson
Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-09-02powerpc: Use is_32bit_task() helper to test 32-bit binaryDenis Kirjanov
This patch removes all explicit tests for the TIF_32BIT flag Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14lmb: rename to memblockYinghai Lu
via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-09tree-wide: fix a very frequent spelling mistakeDirk Hohndel
something-bility is spelled as something-blity so a grep for 'blit' would find these lines this is so trivial that I didn't split it by subsystem / copy additional maintainers - all changes are to comments The only purpose is to get fewer false positives when grepping around the kernel sources. Signed-off-by: Dirk Hohndel <hohndel@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-10-27powerpc: Align vDSO base addressAndreas Schwab
The ABI specifies a 64K alignment, we need to map the vDSO accordingly Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/perf_counter: Fix vdso detectionAnton Blanchard
perf_counter uses arch_vma_name() to detect a vdso region which in turn uses current->mm->context.vdso_base. We need to initialise this before doing the mmap or else we fail to detect the vdso. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-21Use macros for .data.page_aligned section.Tim Abbott
This patch changes the remaining direct references to .data.page_aligned in C and assembly code to use the macros in include/linux/linkage.h. Signed-off-by: Tim Abbott <tabbott@ksplice.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-08-20powerpc: Move 64bit VDSO to improve context switch performanceAnton Blanchard
On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO is position independent we can remove the hint and let get_unmapped_area pick an area. This will mean the vdso will be near other mmaps and will share an SLB entry: 10000000-10001000 r-xp 00000000 08:06 5778459 /root/context_switch_64 10010000-10011000 r--p 00000000 08:06 5778459 /root/context_switch_64 10011000-10012000 rw-p 00001000 08:06 5778459 /root/context_switch_64 fffa92ae000-fffa92b0000 rw-p 00000000 00:00 0 fffa92b0000-fffa9453000 r-xp 00000000 08:06 4334051 /lib64/power6/libc-2.9.so fffa9453000-fffa9462000 ---p 001a3000 08:06 4334051 /lib64/power6/libc-2.9.so fffa9462000-fffa9466000 r--p 001a2000 08:06 4334051 /lib64/power6/libc-2.9.so fffa9466000-fffa947c000 rw-p 001a6000 08:06 4334051 /lib64/power6/libc-2.9.so fffa947c000-fffa9480000 rw-p 00000000 00:00 0 fffa9480000-fffa94a8000 r-xp 00000000 08:06 4333852 /lib64/ld-2.9.so fffa94b3000-fffa94b4000 rw-p 00000000 00:00 0 fffa94b4000-fffa94b7000 r-xp 00000000 00:00 0 [vdso] <----- here I am fffa94b7000-fffa94b8000 r--p 00027000 08:06 4333852 /lib64/ld-2.9.so fffa94b8000-fffa94bb000 rw-p 00028000 08:06 4333852 /lib64/ld-2.9.so fffa94bb000-fffa94bc000 rw-p 00000000 00:00 0 fffe4c10000-fffe4c25000 rw-p 00000000 00:00 0 [stack] On a microbenchmark that bounces a token between two 64bit processes over pipes and calls gettimeofday each iteration (to access the VDSO), our context switch rate goes from 268k to 277k ctx switches/sec (tested on a 4GHz POWER6). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-12-28Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-25[S390] arch_setup_additional_pages argumentsMartin Schwidefsky
arch_setup_additional_pages currently gets two arguments, the binary format descripton and an indication if the process uses an executable stack or not. The second argument is not used by anybody, it could be removed without replacement. What actually does make sense is to pass an indication if the process uses the elf interpreter or not. The glibc code will not use anything from the vdso if the process does not use the dynamic linker, so for statically linked binaries the architecture backend can choose not to map the vdso. Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-21powerpc/mm: Introduce MMU featuresBenjamin Herrenschmidt
We're soon running out of CPU features and I need to add some new ones for various MMU related bits, so this patch separates the MMU features from the CPU features. I moved over the 32-bit MMU related ones, added base features for MMU type families, but didn't move over any 64-bit only feature yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04powerpc: Remove use of CONFIG_PPC_MERGEKumar Gala
Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc and include/asm-powerpc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03powerpc: Fixup lwsync at runtimeKumar Gala
To allow for a single kernel image on e500 v1/v2/mc we need to fixup lwsync at runtime. On e500v1/v2 lwsync causes an illop so we need to patch up the code. We default to 'sync' since that is always safe and if the cpu is capable we will replace 'sync' with 'lwsync'. We introduce CPU_FTR_LWSYNC as a way to determine at runtime if this is needed. This flag could be moved elsewhere since we dont really use it for the normal CPU_FTR purpose. Finally we only store the relative offset in the fixup section to keep it as small as possible rather than using a full fixup_entry. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-20Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIPLinus Torvalds
KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit 557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") removed the ZERO_PAGE from the VM mappings, any users of get_user_pages() will generally now populate the VM with real empty pages needlessly. We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but since fault handling no longer uses ZERO_PAGE for new anonymous pages, we now need to handle that special case in follow_page() instead. In particular, the removal of ZERO_PAGE effectively removed the core file writing optimization where we would skip writing pages that had not been populated at all, and increased memory pressure a lot by allocating all those useless newly zeroed pages. This reinstates the optimization by making the unmapped PTE case the same as for a non-existent page table, which already did this correctly. While at it, this also fixes the XIP case for follow_page(), where the caller could not differentiate between the case of a page that simply could not be used (because it had no "struct page" associated with it) and a page that just wasn't mapped. We do that by simply returning an error pointer for pages that could not be turned into a "struct page *". The error is arbitrarily picked to be EFAULT, since that was what get_user_pages() already used for the equivalent IO-mapped page case. [ Also removed an impossible test for pte_offset_map_lock() failing: that's not how that function works ] Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-26Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/lmb-2.6Paul Mackerras
2008-02-14[POWERPC] vdso_do_func_patch{32,64}() must be __initAdrian Bunk
This fixes the following section mismatches: <-- snip --> ... WARNING: vmlinux.o(.text+0xe49c): Section mismatch in reference from the function .vdso_do_func_patch64() to the function .init.text:.find_symbol64() WARNING: vmlinux.o(.text+0xe4d0): Section mismatch in reference from the function .vdso_do_func_patch64() to the function .init.text:.find_symbol64() WARNING: vmlinux.o(.text+0xe56c): Section mismatch in reference from the function .vdso_do_func_patch32() to the function .init.text:.find_symbol32() WARNING: vmlinux.o(.text+0xe5a0): Section mismatch in reference from the function .vdso_do_func_patch32() to the function .init.text:.find_symbol32() ... <-- snip --> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-13[LIB]: Make PowerPC LMB code generic so sparc64 can use it too.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-20[POWERPC] vdso: Fixes for cache block sizesOlof Johansson
The current VDSO implementation is hardcoded to 128 byte cache blocks, which are only used on IBM's 64-bit processors. Convert it to get the cache block sizes out of vdso_data instead, similar to how the ppc64 in-kernel cache flush does it. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-11[POWERPC] Disable vDSO support for ARCH=ppc where it's not implementedWolfgang Denk
Signed-off-by: Wolfgang Denk <wd@denx.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-19[POWERPC] Don't expose clock vDSO functions when CPU has no timebaseBenjamin Herrenschmidt
We forgot to remove the clock_gettime, clock_getres and get_tbfreq vDSO calls on CPUs that have no timebase such as 601 or 403 (old CPUs that have different mechanisms and for which the vDSO code will not work properly). This fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>