aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64/kernel/signal.c
AgeCommit message (Collapse)Author
2020-01-14arm64: signal: nofpsimd: Handle fp/simd context for signal framesSuzuki K Poulose
Make sure we try to save/restore the vfp/fpsimd context for signal handling only when the fp/simd support is available. Otherwise, skip the frames. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-29arm64: fpsimd: Always set TIF_FOREIGN_FPSTATE on task state flushDave Martin
This patch updates fpsimd_flush_task_state() to mirror the new semantics of fpsimd_flush_cpu_state() introduced by commit d8ad71fa38a9 ("arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs"). Both functions now implicitly set TIF_FOREIGN_FPSTATE to indicate that the task's FPSIMD state is not loaded into the cpu. As a side-effect, fpsimd_flush_task_state() now sets TIF_FOREIGN_FPSTATE even for non-running tasks. In the case of non-running tasks this is not useful but also harmless, because the flag is live only while the corresponding task is running. This function is not called from fast paths, so special-casing this for the task == current case is not really worth it. Compiler barriers previously present in restore_sve_fpsimd_context() are pulled into fpsimd_flush_task_state() so that it can be safely called with preemption enabled if necessary. Explicit calls to set TIF_FOREIGN_FPSTATE that accompany fpsimd_flush_task_state() calls and are now redundant are removed as appropriate. fpsimd_flush_task_state() is used to get exclusive access to the representation of the task's state via task_struct, for the purpose of replacing the state. Thus, the call to this function should happen before manipulating fpsimd_state or sve_state etc. in task_struct. Anomalous cases are reordered appropriately in order to make the code more consistent, although there should be no functional difference since these cases are protected by local_bh_disable() anyway. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-12arm64: use {COMPAT,}SYSCALL_DEFINE0 for sigreturnMark Rutland
We don't currently annotate our various sigreturn functions as syscalls, as we need to do to use pt_regs syscall wrappers. Let's mark them as real syscalls. For compat_sys_sigreturn and compat_sys_rt_sigreturn, this changes the return type from int to long, matching the prototypes in sys32.c. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-12arm64: remove sigreturn wrappersMark Rutland
The arm64 sigreturn* syscall handlers are non-standard. Rather than taking a number of user parameters in registers as per the AAPCS, they expect the pt_regs as their sole argument. To make this work, we override the syscall definitions to invoke wrappers written in assembly, which mov the SP into x0, and branch to their respective C functions. On other architectures (such as x86), the sigreturn* functions take no argument and instead use current_pt_regs() to acquire the user registers. This requires less boilerplate code, and allows for other features such as interposing C code in this path. This patch takes the same approach for arm64. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tentatively-reviewed-by: Dave Martin <dave.martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-12arm64: consistently use unsigned long for thread flagsMark Rutland
In do_notify_resume, we manipulate thread_flags as a 32-bit unsigned int, whereas thread_info::flags is a 64-bit unsigned long, and elsewhere (e.g. in the entry assembly) we manipulate the flags as a 64-bit quantity. For consistency, and to avoid problems if we end up with more than 32 flags, let's make do_notify_resume take the flags as a 64-bit unsigned long. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dave Martin <dave.martin@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-11arm64: rseq: Implement backend rseq calls and select HAVE_RSEQWill Deacon
Implement calls to rseq_signal_deliver, rseq_handle_notify_resume and rseq_syscall so that we can select HAVE_RSEQ on arm64. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-06-08arm64: Fix syscall restarting around signal suppressed by tracerDave Martin
Commit 17c2895 ("arm64: Abstract syscallno manipulation") abstracts out the pt_regs.syscallno value for a syscall cancelled by a tracer as NO_SYSCALL, and provides helpers to set and check for this condition. However, the way this was implemented has the unintended side-effect of disabling part of the syscall restart logic. This comes about because the second in_syscall() check in do_signal() re-evaluates the "in a syscall" condition based on the updated pt_regs instead of the original pt_regs. forget_syscall() is explicitly called prior to the second check in order to prevent restart logic in the ret_to_user path being spuriously triggered, which means that the second in_syscall() check always yields false. This triggers a failure in tools/testing/selftests/seccomp/seccomp_bpf.c, when using ptrace to suppress a signal that interrups a nanosleep() syscall. Misbehaviour of this type is only expected in the case where a tracer suppresses a signal and the target process is either being single-stepped or the interrupted syscall attempts to restart via -ERESTARTBLOCK. This patch restores the old behaviour by performing the in_syscall() check only once at the start of the function. Fixes: 17c289586009 ("arm64: Abstract syscallno manipulation") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reported-by: Sumit Semwal <sumit.semwal@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> # 4.14.x- Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-06-01arm64: signal: Report signal frame size to userspace via auxvDave Martin
Stateful CPU architecture extensions may require the signal frame to grow to a size that exceeds the arch's MINSIGSTKSZ #define. However, changing this #define is an ABI break. To allow userspace the option of determining the signal frame size in a more forwards-compatible way, this patch adds a new auxv entry tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame size that the process can observe during its lifetime. If AT_MINSIGSTKSZ is absent from the aux vector, the caller can assume that the MINSIGSTKSZ #define is sufficient. This allows for a consistent interface with older kernels that do not provide AT_MINSIGSTKSZ. The idea is that libc could expose this via sysconf() or some similar mechanism. There is deliberately no AT_SIGSTKSZ. The kernel knows nothing about userspace's own stack overheads and should not pretend to know. For arm64: The primary motivation for this interface is the Scalable Vector Extension, which can require at least 4KB or so of extra space in the signal frame for the largest hardware implementations. To determine the correct value, a "Christmas tree" mode (via the add_all argument) is added to setup_sigframe_layout(), to simulate addition of all possible records to the signal frame at maximum possible size. If this procedure goes wrong somehow, resulting in a stupidly large frame layout and hence failure of sigframe_alloc() to allocate a record to the frame, then this is indicative of a kernel bug. In this case, we WARN() and no attempt is made to populate AT_MINSIGSTKSZ for userspace. For arm64 SVE: The SVE context block in the signal frame needs to be considered too when computing the maximum possible signal frame size. Because the size of this block depends on the vector length, this patch computes the size based not on the thread's current vector length but instead on the maximum possible vector length: this determines the maximum size of SVE context block that can be observed in any signal frame for the lifetime of the process. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-03-28arm64: uaccess: Fix omissions from usercopy whitelistDave Martin
When the hardend usercopy support was added for arm64, it was concluded that all cases of usercopy into and out of thread_struct were statically sized and so didn't require explicit whitelisting of the appropriate fields in thread_struct. Testing with usercopy hardening enabled has revealed that this is not the case for certain ptrace regset manipulation calls on arm64. This occurs because the sizes of usercopies associated with the regset API are dynamic by construction, and because arm64 does not always stage such copies via the stack: indeed the regset API is designed to avoid the need for that by adding some bounds checking. This is currently believed to affect only the fpsimd and TLS registers. Because the whitelisted fields in thread_struct must be contiguous, this patch groups them together in a nested struct. It is also necessary to be able to determine the location and size of that struct, so rather than making the struct anonymous (which would save on edits elsewhere) or adding an anonymous union containing named and unnamed instances of the same struct (gross), this patch gives the struct a name and makes the necessary edits to code that references it (noisy but simple). Care is needed to ensure that the new struct does not contain padding (which the usercopy hardening would fail to protect). For this reason, the presence of tp2_value is made unconditional, since a padding field would be needed there in any case. This pads up to the 16-byte alignment required by struct user_fpsimd_state. Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: 9e8084d3f761 ("arm64: Implement thread_struct whitelist for hardened usercopy") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-28arm64: fpsimd: Split cpu field out from struct fpsimd_stateDave Martin
In preparation for using a common representation of the FPSIMD state for tasks and KVM vcpus, this patch separates out the "cpu" field that is used to track the cpu on which the state was most recently loaded. This will allow common code to operate on task and vcpu contexts without requiring the cpu field to be stored at the same offset from the FPSIMD register data in both cases. This should avoid the need for messing with the definition of those parts of struct vcpu_arch that are exposed in the KVM user ABI. The resulting change is also convenient for grouping and defining the set of thread_struct fields that are supposed to be accessible to copy_{to,from}_user(), which includes user_fpsimd_state but should exclude the cpu field. This patch does not amend the usercopy whitelist to match: that will be addressed in a subsequent patch. Signed-off-by: Dave Martin <Dave.Martin@arm.com> [will: inline fpsimd_flush_state for now] Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06arm64: signal: Call arm64_notify_segfault when failing to deliver signalWill Deacon
If we fail to deliver a signal due to taking an unhandled fault on the stackframe, we can call arm64_notify_segfault to deliver a SEGV can deal with printing any unhandled signal messages for us, rather than roll our own printing code. A side-effect of this change is that we now deliver the frame address in si_addr along with an si_code of SEGV_{ACC,MAP}ERR, rather than an si_addr of 0 and an si_code of SI_KERNEL as before. Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-01-16arm64: fpsimd: Fix state leakage when migrating after sigreturnDave Martin
When refactoring the sigreturn code to handle SVE, I changed the sigreturn implementation to store the new FPSIMD state from the user sigframe into task_struct before reloading the state into the CPU regs. This makes it easier to convert the data for SVE when needed. However, it turns out that the fpsimd_state structure passed into fpsimd_update_current_state is not fully initialised, so assigning the structure as a whole corrupts current->thread.fpsimd_state.cpu with uninitialised data. This means that if the garbage data written to .cpu happens to be a valid cpu number, and the task is subsequently migrated to the cpu identified by the that number, and then tries to enter userspace, the CPU FPSIMD regs will be assumed to be correct for the task and not reloaded as they should be. This can result in returning to userspace with the FPSIMD registers containing data that is stale or that belongs to another task or to the kernel. Knowingly handing around a kernel structure that is incompletely initialised with user data is a potential source of mistakes, especially across source file boundaries. To help avoid a repeat of this issue, this patch adapts the relevant internal API to hand around the user-accessible subset only: struct user_fpsimd_state. To avoid future surprises, this patch also converts all uses of struct fpsimd_state that really only access the user subset, to use struct user_fpsimd_state. A few missing consts are added to function prototypes for good measure. Thanks to Will for spotting the cause of the bug here. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-11-03arm64/sve: Signal handling supportDave Martin
This patch implements support for saving and restoring the SVE registers around signals. A fixed-size header struct sve_context is always included in the signal frame encoding the thread's vector length at the time of signal delivery, optionally followed by a variable-layout structure encoding the SVE registers. Because of the need to preserve backwards compatibility, the FPSIMD view of the SVE registers is always dumped as a struct fpsimd_context in the usual way, in addition to any sve_context. The SVE vector registers are dumped in full, including bits 127:0 of each register which alias the corresponding FPSIMD vector registers in the hardware. To avoid any ambiguity about which alias to restore during sigreturn, the kernel always restores bits 127:0 of each SVE vector register from the fpsimd_context in the signal frame (which must be present): userspace needs to take this into account if it wants to modify the SVE vector register contents on return from a signal. FPSR and FPCR, which are used by both FPSIMD and SVE, are not included in sve_context because they are always present in fpsimd_context anyway. For signal delivery, a new helper fpsimd_signal_preserve_current_state() is added to update _both_ the FPSIMD and SVE views in the task struct, to make it easier to populate this information into the signal frame. Because of the redundancy between the two views of the state, only one is updated otherwise. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-03arm64: signal: Verify extra data is user-readable in sys_rt_sigreturnDave Martin
Currently sys_rt_sigreturn() verifies that the base sigframe is readable, but no similar check is performed on the extra data to which an extra_context record points. This matters because the extra data will be read with the unprotected user accessors. However, this is not a problem at present because the extra data base address is required to be exactly at the end of the base sigframe. So, there would need to be a non-user-readable kernel address within about 59K (SIGFRAME_MAXSZ - sizeof(struct rt_sigframe)) of some address for which access_ok(VERIFY_READ) returns true, in order for sigreturn to be able to read kernel memory that should be inaccessible to the user task. This is currently impossible due to the untranslatable address hole between the TTBR0 and TTBR1 address ranges. Disappearance of the hole between the TTBR0 and TTBR1 mapping ranges would require the VA size for TTBR0 and TTBR1 to grow to at least 55 bits, and either the disabling of tagged pointers for userspace or enabling of tagged pointers for kernel space; none of which is currently envisaged. Even so, it is wrong to use the unprotected user accessors without an accompanying access_ok() check. To avoid the potential for future surprises, this patch does an explicit access_ok() check on the extra data space when parsing an extra_context record. Fixes: 33f082614c34 ("arm64: signal: Allow expansion of the signal frame") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-02arm64: Mask all exceptions during kernel_exitJames Morse
To take RAS Exceptions as quickly as possible we need to keep SError unmasked as much as possible. We need to mask it during kernel_exit as taking an error from this code will overwrite the exception-registers. Adding a naked 'disable_daif' to kernel_exit causes a performance problem for micro-benchmarks that do no real work, (e.g. calling getpid() in a loop). This is because the ret_to_user loop has already masked IRQs so that the TIF_WORK_MASK thread flags can't change underneath it, adding disable_daif is an additional self-synchronising operation. In the future, the RAS APEI code may need to modify the TIF_WORK_MASK flags from an SError, in which case the ret_to_user loop must mask SError while it examines the flags. Disable all exceptions for return to EL1. For return to EL0 get the ret_to_user loop to leave all exceptions masked once it has done its work, this avoids an extra pstate-write. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-09-17arm64/syscalls: Move address limit check in loopThomas Garnier
A bug was reported on ARM where set_fs might be called after it was checked on the work pending function. ARM64 is not affected by this bug but has a similar construct. In order to avoid any similar problems in the future, the addr_limit_user_check function is moved at the beginning of the loop. Fixes: cf7de27ab351 ("arm64/syscalls: Check address limit on user-mode return") Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pratyush Anand <panand@redhat.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Will Drewry <wad@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: David Howells <dhowells@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Yonghong Song <yhs@fb.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1504798247-48833-5-git-send-email-keescook@chromium.org
2017-09-05Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - VMAP_STACK support, allowing the kernel stacks to be allocated in the vmalloc space with a guard page for trapping stack overflows. One of the patches introduces THREAD_ALIGN and changes the generic alloc_thread_stack_node() to use this instead of THREAD_SIZE (no functional change for other architectures) - Contiguous PTE hugetlb support re-enabled (after being reverted a couple of times). We now have the semantics agreed in the generic mm layer together with API improvements so that the architecture code can detect between contiguous and non-contiguous huge PTEs - Initial support for persistent memory on ARM: DC CVAP instruction exposed to user space (HWCAP) and the in-kernel pmem API implemented - raid6 improvements for arm64: faster algorithm for the delta syndrome and implementation of the recovery routines using Neon - FP/SIMD refactoring and removal of support for Neon in interrupt context. This is in preparation for full SVE support - PTE accessors converted from inline asm to cmpxchg so that we can use LSE atomics if available (ARMv8.1) - Perf support for Cortex-A35 and A73 - Non-urgent fixes and cleanups * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: cleanup {COMPAT_,}SET_PERSONALITY() macro arm64: introduce separated bits for mm_context_t flags arm64: hugetlb: Cleanup setup_hugepagesz arm64: Re-enable support for contiguous hugepages arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages arm64: hugetlb: Add break-before-make logic for contiguous entries arm64: hugetlb: Spring clean huge pte accessors arm64: hugetlb: Introduce pte_pgprot helper arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores arm64: dma-mapping: Mark atomic_pool as __ro_after_init arm64: dma-mapping: Do not pass data to gen_pool_set_algo() arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect() arm64: Move PTE_RDONLY bit handling out of set_pte_at() kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg() arm64: Convert pte handling from inline asm to using (cmp)xchg arm64: neon/efi: Make EFI fpsimd save/restore variables static ...
2017-09-04Merge branch 'x86-syscall-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull syscall updates from Ingo Molnar: "Improve the security of set_fs(): we now check the address limit on a number of key platforms (x86, arm, arm64) before returning to user-space - without adding overhead to the typical system call fast path" * 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/syscalls: Check address limit on user-mode return arm/syscalls: Check address limit on user-mode return x86/syscalls: Check address limit on user-mode return
2017-08-07arm64: Abstract syscallno manipulationDave Martin
The -1 "no syscall" value is written in various ways, shared with the user ABI in some places, and generally obscure. This patch attempts to make things a little more consistent and readable by replacing all these uses with a single #define. A couple of symbolic helpers are provided to clarify the intent further. Because the in-syscall check in do_signal() is changed from >= 0 to != NO_SYSCALL by this patch, different behaviour may be observable if syscallno is set to values less than -1 by a tracer. However, this is not different from the behaviour that is already observable if a tracer sets syscallno to a value >= __NR_(compat_)syscalls. It appears that this can cause spurious syscall restarting, but that is not a new behaviour either, and does not appear harmful. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-07arm64: syscallno is secretly an int, make it officialDave Martin
The upper 32 bits of the syscallno field in thread_struct are handled inconsistently, being sometimes zero extended and sometimes sign-extended. In fact, only the lower 32 bits seem to have any real significance for the behaviour of the code: it's been OK to handle the upper bits inconsistently because they don't matter. Currently, the only place I can find where those bits are significant is in calling trace_sys_enter(), which may be unintentional: for example, if a compat tracer attempts to cancel a syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the syscall-enter-stop, it will be traced as syscall 4294967295 rather than -1 as might be expected (and as occurs for a native tracer doing the same thing). Elsewhere, reads of syscallno cast it to an int or truncate it. There's also a conspicuous amount of code and casting to bodge around the fact that although semantically an int, syscallno is stored as a u64. Let's not pretend any more. In order to preserve the stp x instruction that stores the syscall number in entry.S, this patch special-cases the layout of struct pt_regs for big endian so that the newly 32-bit syscallno field maps onto the low bits of the stored value. This is not beautiful, but benchmarking of the getpid syscall on Juno suggests indicates a minor slowdown if the stp is split into an stp x and stp w. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-07-08arm64/syscalls: Check address limit on user-mode returnThomas Garnier
Ensure the address limit is a user-mode segment before returning to user-mode. Otherwise a process can corrupt kernel-mode memory and elevate privileges [1]. The set_fs function sets the TIF_SETFS flag to force a slow path on return. In the slow path, the address limit is checked to be USER_DS if needed. [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 Signed-off-by: Thomas Garnier <thgarnie@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kernel-hardening@lists.openwall.com Cc: Will Deacon <will.deacon@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Petr Mladek <pmladek@suse.com> Cc: Rik van Riel <riel@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Will Drewry <wad@chromium.org> Cc: linux-api@vger.kernel.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170615011203.144108-3-thgarnie@google.com
2017-06-23arm64: signal: Allow expansion of the signal frameDave Martin
This patch defines an extra_context signal frame record that can be used to describe an expanded signal frame, and modifies the context block allocator and signal frame setup and parsing code to create, populate, parse and decode this block as necessary. To avoid abuse by userspace, parse_user_sigframe() attempts to ensure that: * no more than one extra_context is accepted; * the extra context data is a sensible size, and properly placed and aligned. The extra_context data is required to start at the first 16-byte aligned address immediately after the dummy terminator record following extra_context in rt_sigframe.__reserved[] (as ensured during signal delivery). This serves as a sanity-check that the signal frame has not been moved or copied without taking the extra data into account. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [will: add __force annotation when casting extra_datap to __user pointer] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20arm64: signal: factor out signal frame record allocationDave Martin
This patch factors out the allocator for signal frame optional records into a separate function, to ensure consistency and facilitate later expansion. No overrun checking is currently done, because the allocation is in user memory and anyway the kernel never tries to allocate enough space in the signal frame yet for an overrun to occur. This behaviour will be refined in future patches. The approach taken in this patch to allocation of the terminator record is not very clean: this will also be replaced in subsequent patches. For future extension, a comment is added in sigcontext.h documenting the current static allocations in __reserved[]. This will be important for determining under what circumstances userspace may or may not see an expanded signal frame. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20arm64: signal: factor frame layout and population into separate passesDave Martin
In preparation for expanding the signal frame, this patch refactors the signal frame setup code in setup_sigframe() into two separate passes. The first pass, setup_sigframe_layout(), determines the size of the signal frame and its internal layout, including the presence and location of optional records. The resulting knowledge is used to allocate and locate the user stack space required for the signal frame and to determine which optional records to include. The second pass, setup_sigframe(), is called once the stack frame is allocated in order to populate it with the necessary context information. As a result of these changes, it becomes more natural to represent locations in the signal frame by a base pointer and an offset, since the absolute address of each location is not known during the layout pass. To be more consistent with this logic, parse_user_sigframe() is refactored to describe signal frame locations in a similar way. This change has no effect on the signal ABI, but will make it easier to expand the signal frame in future patches. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20arm64: signal: Refactor sigcontext parsing in rt_sigreturnDave Martin
Currently, rt_sigreturn does very limited checking on the sigcontext coming from userspace. Future additions to the sigcontext data will increase the potential for surprises. Also, it is not clear whether the sigcontext extension records are supposed to occur in a particular order. To allow the parsing code to be extended more easily, this patch factors out the sigcontext parsing into a separate function, and adds extra checks to validate the well-formedness of the sigcontext structure. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20arm64: signal: split frame link record from sigcontext structureDave Martin
In order to be able to increase the amount of the data currently written to the __reserved[] array in the signal frame, it is necessary to overwrite the locations currently occupied by the {fp,lr} frame link record pushed at the top of the signal stack. In order for this to work, this patch detaches the frame link record from struct rt_sigframe and places it separately at the top of the signal stack. This will allow subsequent patches to insert data between it and __reserved[]. This change relies on the non-ABI status of the placement of the frame record with respect to struct sigframe: this status is undocumented, but the placement is not declared or described in the user headers, and known unwinder implementations (libgcc, libunwind, gdb) appear not to rely on it. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-07arm64: Add uprobe supportPratyush Anand
This patch adds support for uprobe on ARM64 architecture. Unit tests for following have been done so far and they have been found working 1. Step-able instructions, like sub, ldr, add etc. 2. Simulation-able like ret, cbnz, cbz etc. 3. uretprobe 4. Reject-able instructions like sev, wfe etc. 5. trapped and abort xol path 6. probe at unaligned user address. 7. longjump test cases Currently it does not support aarch32 instruction probing. Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-08-22arm64: factor work_pending state machine to CChris Metcalf
Currently ret_fast_syscall, work_pending, and ret_to_user form an ad-hoc state machine that can be difficult to reason about due to duplicated code and a large number of branch targets. This patch factors the common logic out into the existing do_notify_resume function, converting the code to C in the process, making the code more legible. This patch tries to closely mirror the existing behaviour while using the usual C control flow primitives. As local_irq_{disable,enable} may be instrumented, we balance exception entry (where we will almost most likely enable IRQs) with a call to trace_hardirqs_on just before the return to userspace. Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-02arm64: Rework valid_user_regsMark Rutland
We validate pstate using PSR_MODE32_BIT, which is part of the user-provided pstate (and cannot be trusted). Also, we conflate validation of AArch32 and AArch64 pstate values, making the code difficult to reason about. Instead, validate the pstate value based on the associated task. The task may or may not be current (e.g. when using ptrace), so this must be passed explicitly by callers. To avoid circular header dependencies via sched.h, is_compat_task is pulled out of asm/ptrace.h. To make the code possible to reason about, the AArch64 and AArch32 validation is split into separate functions. Software must respect the RES0 policy for SPSR bits, and thus the kernel mirrors the hardware policy (RAZ/WI) for bits as-yet unallocated. When these acquire an architected meaning writes may be permitted (potentially with additional validation). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Dave Martin <dave.martin@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-04-13arm64: Removed unused variableRichard Weinberger
arch/arm64/kernel/signal.c: In function ‘handle_signal’: arch/arm64/kernel/signal.c:290:22: warning: unused variable ‘thread’ [-Wunused-variable] Fixes: arm64: Remove signal translation and exec_domain Reported-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2015-04-12arm64: Remove signal translation and exec_domainRichard Weinberger
As execution domain support is gone we can remove signal translation from the signal code and remove exec_domain from thread_info. Signed-off-by: Richard Weinberger <richard@nod.at>
2015-02-12all arches, signal: move restart_block to struct task_structAndy Lutomirski
If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate. Note that there are other fields in thread_info that are also easy targets, at least on some architectures. It's also a decent simplification, since the restart code is more or less identical on all architectures. [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Acked-by: Richard Weinberger <richard@nod.at> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Steven Miao <realmz6@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06arm64: Use sigsp()Richard Weinberger
Use sigsp() instead of the open coded variant. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-08-06arm64: Use get_signal() signal_setup_done()Richard Weinberger
Use the more generic functions get_signal() signal_setup_done() for signal delivery. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-05-16Merge tag 'for-3.16' of git://git.linaro.org/people/ard.biesheuvel/linux-arm ↵Catalin Marinas
into upstream FPSIMD register bank context switching and crypto algorithms optimisations for arm64 from Ard Biesheuvel. * tag 'for-3.16' of git://git.linaro.org/people/ard.biesheuvel/linux-arm: arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions arm64: pull in <asm/simd.h> from asm-generic arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions arm64/crypto: AES using ARMv8 Crypto Extensions arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions arm64/crypto: SHA-1 using ARMv8 Crypto Extensions arm64: add support for kernel mode NEON in interrupt context arm64: defer reloading a task's FPSIMD state to userland resume arm64: add abstractions for FPSIMD state manipulation asm-generic: allow generic unaligned access if the arch supports it Conflicts: arch/arm64/include/asm/thread_info.h
2014-05-12arm64: is_compat_task is defined both in asm/compat.h and linux/compat.hAKASHI Takahiro
Some kernel files may include both linux/compat.h and asm/compat.h directly or indirectly. Since both header files contain is_compat_task() under !CONFIG_COMPAT, compiling them with !CONFIG_COMPAT will eventually fail. Such files include kernel/auditsc.c, kernel/seccomp.c and init/do_mountfs.c (do_mountfs.c may read asm/compat.h via asm/ftrace.h once ftrace is implemented). So this patch proactively 1) removes is_compat_task() under !CONFIG_COMPAT from asm/compat.h 2) replaces asm/compat.h to linux/compat.h in kernel/*.c, but asm/compat.h is still necessary in ptrace.c and process.c because they use is_compat_thread(). Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09arm64: Expose ESR_EL1 information to user when SIGSEGV/SIGBUSCatalin Marinas
This information is useful for instruction emulators to detect read/write and access size without having to decode the faulting instruction. The current patch exports it via sigcontext (struct esr_context) and is only valid for SIGSEGV and SIGBUS. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09arm64: Remove the aux_context structureCatalin Marinas
This patch removes the aux_context structure (and the containing file) to allow the placement of the _aarch64_ctx end magic based on the context stored on the signal stack. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-08arm64: defer reloading a task's FPSIMD state to userland resumeArd Biesheuvel
If a task gets scheduled out and back in again and nothing has touched its FPSIMD state in the mean time, there is really no reason to reload it from memory. Similarly, repeated calls to kernel_neon_begin() and kernel_neon_end() will preserve and restore the FPSIMD state every time. This patch defers the FPSIMD state restore to the last possible moment, i.e., right before the task returns to userland. If a task does not return to userland at all (for any reason), the existing FPSIMD state is preserved and may be reused by the owning task if it gets scheduled in again on the same CPU. This patch adds two more functions to abstract away from straight FPSIMD register file saves and restores: - fpsimd_restore_current_state -> ensure current's FPSIMD state is loaded - fpsimd_flush_task_state -> invalidate live copies of a task's FPSIMD state Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-05-08arm64: add abstractions for FPSIMD state manipulationArd Biesheuvel
There are two tacit assumptions in the FPSIMD handling code that will no longer hold after the next patch that optimizes away some FPSIMD state restores: . the FPSIMD registers of this CPU contain the userland FPSIMD state of task 'current'; . when switching to a task, its FPSIMD state will always be restored from memory. This patch adds the following functions to abstract away from straight FPSIMD register file saves and restores: - fpsimd_preserve_current_state -> ensure current's FPSIMD state is saved - fpsimd_update_current_state -> replace current's FPSIMD state Where necessary, the signal handling and fork code are updated to use the above wrappers instead of poking into the FPSIMD registers directly. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2013-02-14arm64: switch to generic sigaltstackAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-23arm64: signal: return struct rt_sigframe from get_sigframeWill Deacon
We only have one type of frame (rt_sigframe) for arm64, so just return that type directly and dispense with the framesize argument, which is presumably a hangover from code copied from arch/arm/. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2012-11-23arm64: signal: push the unwinding prologue on the signal stackWill Deacon
To allow debuggers to unwind through signal frames, we create a fake stack unwinding prologue containing the link register and frame pointer of the interrupted context. The signal frame is then offset by 16 bytes to make room for the two saved registers which are pushed onto the frame of the *interrupted* context, rather than placed directly above the signal stack. This doesn't work when an alternative signal stack is set up for a SEGV handler, which is raised in response to RLIMIT_STACK being reached. In this case, we try to push the unwinding prologue onto the full stack and subsequently take a fault which we fail to resolve, causing setup_return to return -EFAULT and handle_signal to force_sigsegv on the current task. This patch fixes the problem by including the unwinding prologue as part of the rt_sigframe definition, which is populated during setup_sigframe, ensuring that it always ends up on the signal stack. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org>
2012-09-17arm64: Signal handling supportCatalin Marinas
This patch adds support for signal handling. The sigreturn is done via VDSO, introduced by a previous patch. The SA_RESTORER is still defined as it is required for 32-bit (compat) support but it is not to be used for 64-bit applications. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>