aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/entry
AgeCommit message (Collapse)Author
2021-02-02x86/entry: Emit a symbol for register restoring thunkNick Desaulniers
Arnd found a randconfig that produces the warning: arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at offset 0x3e when building with LLVM_IAS=1 (Clang's integrated assembler). Josh notes: With the LLVM assembler not generating section symbols, objtool has no way to reference this code when it generates ORC unwinder entries, because this code is outside of any ELF function. The limitation now being imposed by objtool is that all code must be contained in an ELF symbol. And .L symbols don't create such symbols. So basically, you can use an .L symbol *inside* a function or a code segment, you just can't use the .L symbol to contain the code using a SYM_*_START/END annotation pair. Fangrui notes that this optimization is helpful for reducing image size when compiling with -ffunction-sections and -fdata-sections. I have observed on the order of tens of thousands of symbols for the kernel images built with those flags. A patch has been authored against GNU binutils to match this behavior of not generating unused section symbols ([1]), so this will also become a problem for users of GNU binutils once they upgrade to 2.36. Omit the .L prefix on a label so that the assembler will emit an entry into the symbol table for the label, with STB_LOCAL binding. This enables objtool to generate proper unwind info here with LLVM_IAS=1 or GNU binutils 2.36+. [ bp: Massage commit message. ] Reported-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Suggested-by: Borislav Petkov <bp@alien8.de> Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20210112194625.4181814-1-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/1209 Link: https://reviews.llvm.org/D93783 Link: https://sourceware.org/binutils/docs/as/Symbol-Names.html Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1408485ce69f844dcd7ded093a8 [1]
2021-01-17x86/asm/32: Add ENDs to some functions and relabel with SYM_CODE_*Jiri Slaby
commit 78762b0e79bc1dd01347be061abdf505202152c9 upstream. All these are functions which are invoked from elsewhere but they are not typical C functions. So annotate them using the new SYM_CODE_START. All these were not balanced with any END, so mark their ends by SYM_CODE_END, appropriately. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits] Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate] Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Len Brown <len.brown@intel.com> Cc: linux-arch@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191011115108.12392-26-jslaby@suse.cz Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14x86/unwind/orc: Fix premature unwind stoppage due to IRET framesJosh Poimboeuf
commit 81b67439d147677d844d492fcbd03712ea438f42 upstream. The following execution path is possible: fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only IRET regs saved ] common_interrupt() interrupt_entry() <NMI> [ full pt_regs saved ] ... [ unwind stack ] When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely. However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI. Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values. Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/entry/64: Fix unwind hints in rewind_stack_do_exit()Jann Horn
commit f977df7b7ca45a4ac4b66d30a8931d0434c394b1 upstream. The LEAQ instruction in rewind_stack_do_exit() moves the stack pointer directly below the pt_regs at the top of the task stack before calling do_exit(). Tell the unwinder to expect pt_regs. Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/68c33e17ae5963854916a46f522624f8e1d264f2.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/entry/64: Fix unwind hints in kernel exit pathJosh Poimboeuf
commit 1fb143634a38095b641a3a21220774799772dc4c upstream. In swapgs_restore_regs_and_return_to_usermode, after the stack is switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is no longer valid, which can result in the following ORC unwinder warning: WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0 For full correctness, we could try to add complicated unwind hints so the unwinder could continue to find the registers, but when when it's this close to kernel exit, unwind hints aren't really needed anymore and it's fine to just use an empty hint which tells the unwinder to stop. For consistency, also move the UNWIND_HINT_EMPTY in entry_SYSCALL_64_after_hwframe to a similar location. Fixes: 3e3b9293d392 ("x86/entry/64: Return to userspace from the trampoline stack") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Dave Jones <dsj@fb.com> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Reported-by: Jann Horn <jannh@google.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/60ea8f562987ed2d9ace2977502fe481c0d7c9a0.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14x86/entry/64: Fix unwind hints in register clearing codeJosh Poimboeuf
commit 06a9750edcffa808494d56da939085c35904e618 upstream. The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after pushing it. If an NMI or exception hits after a register is cleared, but before the UNWIND_HINT_REGS annotation, the ORC unwinder will wrongly think the previous value of the register was zero. This can confuse the unwinding process and cause it to exit early. Because ORC is simpler than DWARF, there are a limited number of unwind annotation states, so it's not possible to add an individual unwind hint after each push/clear combination. Instead, the register clearing instructions need to be consolidated and moved to after the UNWIND_HINT_REGS annotation. Fixes: 3f01daecd545 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/68fd3d0bc92ae2d62ff7879d15d3684217d51f08.1587808742.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17x86/entry/32: Add missing ASM_CLAC to general_protection entryThomas Gleixner
commit 3d51507f29f2153a658df4a0674ec5b592b62085 upstream. All exception entry points must have ASM_CLAC right at the beginning. The general_protection entry is missing one. Fixes: e59d1b0a2419 ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200225220216.219537887@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24x86/vdso: Provide missing include fileValdis Klētnieks
[ Upstream commit bff47c2302cc249bcd550b17067f8dddbd4b6f77 ] When building with C=1, sparse issues a warning: CHECK arch/x86/entry/vdso/vdso32-setup.c arch/x86/entry/vdso/vdso32-setup.c:28:28: warning: symbol 'vdso32_enabled' was not declared. Should it be static? Provide the missing header file. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/36224.1575599767@turing-police Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-17syscalls/x86: Use the correct function type for sys_ni_syscallSami Tolvanen
commit f48f01a92cca09e86d46c91d8edf9d5a71c61727 upstream. Use the correct function type for sys_ni_syscall() in system call tables to fix indirect call mismatches with Control-Flow Integrity (CFI) checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191008224049.115427-5-samitolvanen@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17syscalls/x86: Use COMPAT_SYSCALL_DEFINE0 for IA32 (rt_)sigreturnSami Tolvanen
commit 00198a6eaf66609de5e4de9163bb42c7ca9dd7b7 upstream. Use COMPAT_SYSCALL_DEFINE0 to define (rt_)sigreturn() syscalls to replace sys32_sigreturn() and sys32_rt_sigreturn(). This fixes indirect call mismatches with Control-Flow Integrity (CFI) checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191008224049.115427-4-samitolvanen@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3Andy Lutomirski
commit 4a13b0e3e10996b9aa0b45a764ecfe49f6fcd360 upstream. UNWIND_ESPFIX_STACK needs to read the GDT, and the GDT mapping that can be accessed via %fs is not mapped in the user pagetables. Use SGDT to find the cpu_entry_area mapping and read the espfix offset from that instead. Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/entry/32: Fix NMI vs ESPFIXPeter Zijlstra
commit 895429076512e9d1cf5428181076299c90713159 upstream. When the NMI lands on an ESPFIX_SS, we are on the entry stack and must swizzle, otherwise we'll run do_nmi() on the entry stack, which is BAD. Also, similar to the normal exception path, we need to correct the ESPFIX magic before leaving the entry stack, otherwise pt_regs will present a non-flat stack pointer. Tested by running sigreturn_32 concurrent with perf-record. Fixes: e5862d0515ad ("x86/entry/32: Leave the kernel via trampoline stack") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/entry/32: Unwind the ESPFIX stack earlier on exception entryAndy Lutomirski
commit a1a338e5b6fe9e0a39c57c232dc96c198bb53e47 upstream. Right now, we do some fancy parts of the exception entry path while SS might have a nonzero base: we fill in regs->ss and regs->sp, and we consider switching to the kernel stack. This results in regs->ss and regs->sp referring to a non-flat stack and it may result in overflowing the entry stack. The former issue means that we can try to call iret_exc on a non-flat stack, which doesn't work. Tested with selftests/x86/sigreturn_32. Fixes: 45d7b255747c ("x86/entry/32: Enter the kernel via trampoline stack") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALLAndy Lutomirski
commit 82cb8a0b1d8d07817b5d59f7fa1438e1fceafab2 upstream. This will allow us to get percpu access working before FIXUP_FRAME, which will allow us to unwind ESPFIX earlier. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/entry/32: Use %ss segment where requiredAndy Lutomirski
commit 4c4fd55d3d59a41ddfa6ecba7e76928921759f43 upstream. When re-building the IRET frame we use %eax as an destination %esp, make sure to then also match the segment for when there is a nonzero SS base (ESPFIX). [peterz: Changelog and minor edits] Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/entry/32: Fix IRET exceptionPeter Zijlstra
commit 40ad2199580e248dce2a2ebb722854180c334b9e upstream. As reported by Lai, the commit 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") wrecked the IRET EXTABLE entry by making .Lirq_return not point at IRET. Fix this by placing IRET_FRAME in RESTORE_REGS, to mirror how FIXUP_FRAME is part of SAVE_ALL. Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") Reported-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/xen/32: Make xen_iret_crit_fixup() independent of frame layoutJan Beulich
commit 29b810f5a5ec127d3143770098e05981baa3eb77 upstream. Now that SS:ESP always get saved by SAVE_ALL, this also needs to be accounted for in xen_iret_crit_fixup(). Otherwise the old_ax value gets interpreted as EFLAGS, and hence VM86 mode appears to be active all the time, leading to random "vm86_32: no user_vm86: BAD" log messages alongside processes randomly crashing. Since following the previous model (sitting after SAVE_ALL) would further complicate the code _and_ retain the dependency of xen_iret_crit_fixup() on frame manipulations done by entry_32.S, switch things around and do the adjustment ahead of SAVE_ALL. Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Stable Team <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/32d8713d-25a7-84ab-b74b-aa3e88abce6b@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29x86/stackframe/32: Repair 32-bit Xen PVJan Beulich
commit 81ff2c37f9e5d77593928df0536d86443195fd64 upstream. Once again RPL checks have been introduced which don't account for a 32-bit kernel living in ring 1 when running in a PV Xen domain. The case in FIXUP_FRAME has been preventing boot. Adjust BUG_IF_WRONG_CR3 as well to guard against future uses of the macro on a code path reachable when running in PV mode under Xen; I have to admit that I stopped at a certain point trying to figure out whether there are present ones. Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stable Team <stable@vger.kernel.org> Link: https://lore.kernel.org/r/0fad341f-b7f5-f859-d55d-f0084ee7087e@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-20Merge tag 'kbuild-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - add modpost warn exported symbols marked as 'static' because 'static' and EXPORT_SYMBOL is an odd combination - break the build early if gold linker is used - optimize the Bison rule to produce .c and .h files by a single pattern rule - handle PREEMPT_RT in the module vermagic and UTS_VERSION - warn CONFIG options leaked to the user-space except existing ones - make single targets work properly - rebuild modules when module linker scripts are updated - split the module final link stage into scripts/Makefile.modfinal - fix the missed error code in merge_config.sh - improve the error message displayed on the attempt of the O= build in unclean source tree - remove 'clean-dirs' syntax - disable -Wimplicit-fallthrough warning for Clang - add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC - remove ARCH_{CPP,A,C}FLAGS variables - add $(BASH) to run bash scripts - change *CFLAGS_<basetarget>.o to take the relative path to $(obj) instead of the basename - stop suppressing Clang's -Wunused-function warnings when W=1 - fix linux/export.h to avoid genksyms calculating CRC of trimmed exported symbols - misc cleanups * tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (63 commits) genksyms: convert to SPDX License Identifier for lex.l and parse.y modpost: use __section in the output to *.mod.c modpost: use MODULE_INFO() for __module_depends export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols export.h: remove defined(__KERNEL__), which is no longer needed kbuild: allow Clang to find unused static inline functions for W=1 build kbuild: rename KBUILD_ENABLE_EXTRA_GCC_CHECKS to KBUILD_EXTRA_WARN kbuild: refactor scripts/Makefile.extrawarn merge_config.sh: ignore unwanted grep errors kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj) modpost: add NOFAIL to strndup modpost: add guid_t type definition kbuild: add $(BASH) to run scripts with bash-extension kbuild: remove ARCH_{CPP,A,C}FLAGS kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC kbuild: Do not enable -Wimplicit-fallthrough for clang for now kbuild: clean up subdir-ymn calculation in Makefile.clean kbuild: remove unneeded '+' marker from cmd_clean kbuild: remove clean-dirs syntax kbuild: check clean srctree even earlier ...
2019-09-17Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer updates from Thomas Gleixner: "Timers and timekeeping updates: - A large overhaul of the posix CPU timer code which is a preparation for moving the CPU timer expiry out into task work so it can be properly accounted on the task/process. An update to the bogus permission checks will come later during the merge window as feedback was not complete before heading of for travel. - Switch the timerqueue code to use cached rbtrees and get rid of the homebrewn caching of the leftmost node. - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a single function - Implement the separation of hrtimers to be forced to expire in hard interrupt context even when PREEMPT_RT is enabled and mark the affected timers accordingly. - Implement a mechanism for hrtimers and the timer wheel to protect RT against priority inversion and live lock issues when a (hr)timer which should be canceled is currently executing the callback. Instead of infinitely spinning, the task which tries to cancel the timer blocks on a per cpu base expiry lock which is held and released by the (hr)timer expiry code. - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests resulting in faster access to timekeeping functions. - Updates to various clocksource/clockevent drivers and their device tree bindings. - The usual small improvements all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) posix-cpu-timers: Fix permission check regression posix-cpu-timers: Always clear head pointer on dequeue hrtimer: Add a missing bracket and hide `migration_base' on !SMP posix-cpu-timers: Make expiry_active check actually work correctly posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build tick: Mark sched_timer to expire in hard interrupt context hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n posix-cpu-timers: Utilize timerqueue for storage posix-cpu-timers: Move state tracking to struct posix_cputimers posix-cpu-timers: Deduplicate rlimit handling posix-cpu-timers: Remove pointless comparisons posix-cpu-timers: Get rid of 64bit divisions posix-cpu-timers: Consolidate timer expiry further posix-cpu-timers: Get rid of zero checks rlimit: Rewrite non-sensical RLIMIT_CPU comment posix-cpu-timers: Respect INFINITY for hard RTTIME limit posix-cpu-timers: Switch thread group sampling to array posix-cpu-timers: Restructure expiry array posix-cpu-timers: Remove cputime_expires ...
2019-09-16Merge branch 'x86-entry-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Ingo Molnar: "This contains x32 and compat syscall improvements, the biggest one of which splits x32 syscalls into their own table, which allows new syscalls to share the x32 and x86-64 number - which turns the 512-547 special syscall numbers range into a legacy wart that won't be extended going forward" * 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls: Split the x32 syscalls into their own table x86/syscalls: Disallow compat entries for all types of 64-bit syscalls x86/syscalls: Use the compat versions of rt_sigsuspend() and rt_sigprocmask() x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long
2019-09-16Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: - Add UMIP emulation/spoofing for 64-bit processes as well, because of Wine based gaming. - Clean up symbols/labels in low level asm code - Add an assembly optimized mul_u64_u32_div() implementation on x86-64. * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/umip: Add emulation (spoofing) for UMIP covered instructions in 64-bit processes as well x86/asm: Make some functions local labels x86/asm/suspend: Get rid of bogus_64_magic x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64
2019-09-16Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...
2019-09-06x86/asm: Make some functions local labelsJiri Slaby
Boris suggests to make a local label (prepend ".L") to these functions to eliminate them from the symbol table. These are functions with very local names and really should not be visible anywhere. Note that objtool won't see these functions anymore (to generate ORC debug info). But all the functions are not annotated with ENDPROC, so they won't have objtool's attention anyway. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Winslow <swinslow@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com> Link: https://lkml.kernel.org/r/20190906075550.23435-2-jslaby@suse.cz
2019-09-04kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)Masahiro Yamada
Kbuild provides per-file compiler flag addition/removal: CFLAGS_<basetarget>.o CFLAGS_REMOVE_<basetarget>.o AFLAGS_<basetarget>.o AFLAGS_REMOVE_<basetarget>.o CPPFLAGS_<basetarget>.lds HOSTCFLAGS_<basetarget>.o HOSTCXXFLAGS_<basetarget>.o The <basetarget> is the filename of the target with its directory and suffix stripped. This syntax comes into a trouble when two files with the same basename appear in one Makefile, for example: obj-y += foo.o obj-y += dir/foo.o CFLAGS_foo.o := <some-flags> Here, the <some-flags> applies to both foo.o and dir/foo.o The real world problem is: scripts/kconfig/util.c scripts/kconfig/lxdialog/util.c Both files are compiled into scripts/kconfig/mconf, but only the latter should be given with the ncurses flags. It is more sensible to use the relative path to the Makefile, like this: obj-y += foo.o CFLAGS_foo.o := <some-flags> obj-y += dir/foo.o CFLAGS_dir/foo.o := <other-flags> At first, I attempted to replace $(basetarget) with $*. The $* variable is replaced with the stem ('%') part in a pattern rule. This works with most of cases, but does not for explicit rules. For example, arch/ia64/lib/Makefile reuses rule_as_o_S in its own explicit rules, so $* will be empty, resulting in ignoring the per-file AFLAGS. I introduced a new variable, target-stem, which can be used also from explicit rules. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Marc Zyngier <maz@kernel.org>
2019-08-23clocksource/drivers/hyperv: Allocate Hyper-V TSC page staticallyTianyu Lan
Prepare to add Hyper-V sched clock callback and move Hyper-V Reference TSC initialization much earlier in the boot process. Earlier initialization is needed so that it happens while the timestamp value is still 0 and no discontinuity in the timestamp will occur when pv_ops.time.sched_clock calculates its offset. The earlier initialization requires that the Hyper-V TSC page be allocated statically instead of with vmalloc(), so fixup the references to the TSC page and the method of getting its physical address. Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lkml.kernel.org/r/20190814123216.32245-2-Tianyu.Lan@microsoft.com
2019-07-31x86: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the entry code, preempt and kprobes conditionals over to CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.608488448@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-28Merge branch master from ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Pick up the spectre documentation so the Grand Schemozzle can be added.
2019-07-24x86/entry/32: Pass cr2 to do_async_page_fault()Matt Mullins
Commit a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") added the address parameter to do_async_page_fault(), but does not pass it from the 32-bit entry point. To plumb it through, factor-out common_exception_read_cr2 in the same fashion as common_exception, and uses it from both page_fault and async_page_fault. For a 32-bit KVM guest, this fixes: Run /sbin/init as init process Starting init: /sbin/init exists but couldn't execute it (error -14) Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") Signed-off-by: Matt Mullins <mmullins@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190724042058.24506-1-mmullins@fb.com
2019-07-22x86/syscalls: Split the x32 syscalls into their own tableAndy Lutomirski
For unfortunate historical reasons, the x32 syscalls and the x86_64 syscalls are not all numbered the same. As an example, ioctl() is nr 16 on x86_64 but 514 on x32. This has potentially nasty consequences, since it means that there are two valid RAX values to do ioctl(2) and two invalid RAX values. The valid values are 16 (i.e. ioctl(2) using the x86_64 ABI) and (514 | 0x40000000) (i.e. ioctl(2) using the x32 ABI). The invalid values are 514 and (16 | 0x40000000). 514 will enter the "COMPAT_SYSCALL_DEFINE3(ioctl, ...)" entry point with in_compat_syscall() and in_x32_syscall() returning false, whereas (16 | 0x40000000) will enter the native entry point with in_compat_syscall() and in_x32_syscall() returning true. Both are bogus, and both will exercise code paths in the kernel and in any running seccomp filters that really ought to be unreachable. Splitting out the x32 syscalls into their own tables, allows both bogus invocations to return -ENOSYS. I've checked glibc, musl, and Bionic, and all of them appear to call syscalls with their correct numbers, so this change should have no effect on them. There is an added benefit going forward: new syscalls that need special handling on x32 can share the same number on x32 and x86_64. This means that the special syscall range 512-547 can be treated as a legacy wart instead of something that may need to be extended in the future. Also add a selftest to verify the new behavior. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/208024256b764312598f014ebfb0a42472c19354.1562185330.git.luto@kernel.org
2019-07-22x86/syscalls: Disallow compat entries for all types of 64-bit syscallsAndy Lutomirski
A "compat" entry in the syscall tables means to use a different entry on 32-bit and 64-bit builds. This only makes sense for syscalls that exist in the first place in 32-bit builds, so disallow it for anything other than i386. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/4b7565954c5a06530ac01d98cb1592538fd8ae51.1562185330.git.luto@kernel.org
2019-07-22x86/syscalls: Use the compat versions of rt_sigsuspend() and rt_sigprocmask()Andy Lutomirski
I'm working on some code that detects at build time if there's a COMPAT_SYSCALL_DEFINE() that is not referenced in the x86 syscall tables. It catches three offenders: rt_sigsuspend(), rt_sigprocmask(), and sendfile64(). For rt_sigsuspend() and rt_sigprocmask(), the only potential difference between the native and compat versions is that the compat version converts the sigset_t, but, on little endian architectures, the conversion is a no-op. This is why they both currently work on x86. To make the code more consistent, and to make the upcoming patches work, rewire x86 to use the compat vesions. sendfile64() is more complicated, and will be addressed separately. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/51643ac3157b5921eae0e172a8a0b1d953e68ebb.1562185330.git.luto@kernel.org
2019-07-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 specific fixes and updates: - The CR2 corruption fixes which store CR2 early in the entry code and hand the stored address to the fault handlers. - Revert a forgotten leftover of the dropped FSGSBASE series. - Plug a memory leak in the boot code. - Make the Hyper-V assist functionality robust by zeroing the shadow page. - Remove a useless check for dead processes with LDT - Update paravirt and VMware maintainers entries. - A few cleanup patches addressing various compiler warnings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/64: Prevent clobbering of saved CR2 value x86/hyper-v: Zero out the VP ASSIST PAGE on allocation x86, boot: Remove multiple copy of static function sanitize_boot_params() x86/boot/compressed/64: Remove unused variable x86/boot/efi: Remove unused variables x86/mm, tracing: Fix CR2 corruption x86/entry/64: Update comments and sanity tests for create_gap x86/entry/64: Simplify idtentry a little x86/entry/32: Simplify common_exception x86/paravirt: Make read_cr2() CALLEE_SAVE MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE x86/process: Delete useless check for dead process with LDT x86: math-emu: Hide clang warnings for 16-bit overflow x86/e820: Use proper booleans instead of 0/1 x86/apic: Silence -Wtype-limits compiler warnings x86/mm: Free sme_early_buffer after init x86/boot: Fix memory leak in default_get_smp_config() Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: - A collection of objtool fixes which address recent fallout partially exposed by newer toolchains, clang, BPF and general code changes. - Force USER_DS for user stack traces [ Note: the "objtool fixes" are not all to objtool itself, but for kernel code that triggers objtool warnings. Things like missing function size annotations, or code that confuses the unwinder etc. - Linus] * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) objtool: Support conditional retpolines objtool: Convert insn type to enum objtool: Fix seg fault on bad switch table entry objtool: Support repeated uses of the same C jump table objtool: Refactor jump table code objtool: Refactor sibling call detection logic objtool: Do frame pointer check before dead end check objtool: Change dead_end_function() to return boolean objtool: Warn on zero-length functions objtool: Refactor function alias logic objtool: Track original function across branches objtool: Add mcsafe_handle_tail() to the uaccess safe list bpf: Disable GCC -fgcse optimization for ___bpf_prog_run() x86/uaccess: Remove redundant CLACs in getuser/putuser error paths x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail() x86/uaccess: Remove ELF function annotation from copy_user_handle_tail() x86/head/64: Annotate start_cpu0() as non-callable x86/entry: Fix thunk function ELF sizes x86/kvm: Don't call kvm_spurious_fault() from .fixup x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2 ...
2019-07-20Merge tag 'kbuild-v5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - match the directory structure of the linux-libc-dev package to that of Debian-based distributions - fix incorrect include/config/auto.conf generation when Kconfig creates it along with the .config file - remove misleading $(AS) from documents - clean up precious tag files by distclean instead of mrproper - add a new coccinelle patch for devm_platform_ioremap_resource migration - refactor module-related scripts to read modules.order instead of $(MODVERDIR)/*.mod files to get the list of created modules - remove MODVERDIR - update list of header compile-test - add -fcf-protection=none flag to avoid conflict with the retpoline flags when CONFIG_RETPOLINE=y - misc cleanups * tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) kbuild: add -fcf-protection=none when using retpoline flags kbuild: update compile-test header list for v5.3-rc1 kbuild: split out *.mod out of {single,multi}-used-m rules kbuild: remove 'prepare1' target kbuild: remove the first line of *.mod files kbuild: create *.mod with full directory path and remove MODVERDIR kbuild: export_report: read modules.order instead of .tmp_versions/*.mod kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver kbuild: remove duplication from modules.order in sub-directories kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin} kbuild: do not create empty modules.order in the prepare stage coccinelle: api: add devm_platform_ioremap_resource script kbuild: compile-test headers listed in header-test-m as well kbuild: remove unused hostcc-option kbuild: remove tag files by distclean instead of mrproper kbuild: add --hash-style= and --build-id unconditionally kbuild: get rid of misleading $(AS) from documents ...
2019-07-20x86/entry/64: Prevent clobbering of saved CR2 valueThomas Gleixner
The recent fix for CR2 corruption introduced a new way to reliably corrupt the saved CR2 value. CR2 is saved early in the entry code in RDX, which is the third argument to the fault handling functions. But it missed that between saving and invoking the fault handler enter_from_user_mode() can be called. RDX is a caller saved register so the invoked function can freely clobber it with the obvious consequences. The TRACE_IRQS_OFF call is safe as it calls through the thunk which preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into C-code outside of the thunk. Store CR2 in R12 instead which is a callee saved register and move R12 to RDX just before calling the fault handler. Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
2019-07-19Merge tag 'for-linus-5.3a-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Fixes and features: - A series to introduce a common command line parameter for disabling paravirtual extensions when running as a guest in virtualized environment - A fix for int3 handling in Xen pv guests - Removal of the Xen-specific tmem driver as support of tmem in Xen has been dropped (and it was experimental only) - A security fix for running as Xen dom0 (XSA-300) - A fix for IRQ handling when offlining cpus in Xen guests - Some small cleanups" * tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: let alloc_xenballooned_pages() fail if not enough memory free xen/pv: Fix a boot up hang revealed by int3 self test x86/xen: Add "nopv" support for HVM guest x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete x86: Add "nopv" parameter to disable PV extensions x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init xen: remove tmem driver Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized" xen/events: fix binding user event channels to cpus
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18x86/entry: Fix thunk function ELF sizesJosh Poimboeuf
Fix the following warnings: arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com
2019-07-17x86/mm, tracing: Fix CR2 corruptionPeter Zijlstra
Despite the current efforts to read CR2 before tracing happens there still exist a number of possible holes: idtentry page_fault do_page_fault has_error_code=1 call error_entry TRACE_IRQS_OFF call trace_hardirqs_off* #PF // modifies CR2 CALL_enter_from_user_mode __context_tracking_exit() trace_user_exit(0) #PF // modifies CR2 call do_page_fault address = read_cr2(); /* whoopsie */ And similar for i386. Fix it by pulling the CR2 read into the entry code, before any of that stuff gets a chance to run and ruin things. Reported-by: He Zhe <zhe.he@windriver.com> Reported-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: joel@joelfernandes.org Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org Debugged-by: Steven Rostedt <rostedt@goodmis.org>
2019-07-17x86/entry/64: Update comments and sanity tests for create_gapPeter Zijlstra
Commit 2700fefdb2d9 ("x86_64: Add gap to int3 to allow for call emulation") forgot to update the comment, do so now. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114336.059780563@infradead.org
2019-07-17x86/entry/64: Simplify idtentry a littlePeter Zijlstra
There's a bunch of duplication in idtentry, namely the .Lfrom_usermode_switch_stack is a paranoid=0 copy of the normal flow. Make this explicit by creating a idtentry_part helper macro. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114336.002429503@infradead.org
2019-07-17x86/entry/32: Simplify common_exceptionPeter Zijlstra
Adding one more option to SAVE_ALL can be used in common_exception to simplify things. This also saves duplication later where page_fault will no longer use common_exception. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: jgross@suse.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.945136187@infradead.org
2019-07-17x86/paravirt: Make read_cr2() CALLEE_SAVEPeter Zijlstra
The one paravirt read_cr2() implementation (Xen) is actually quite trivial and doesn't need to clobber anything other than the return register. Making read_cr2() CALLEE_SAVE avoids all the PUSH/POP nonsense and allows more convenient use from assembly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Cc: torvalds@linux-foundation.org Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: zhe.he@windriver.com Cc: joel@joelfernandes.org Cc: devel@etsukata.com Link: https://lkml.kernel.org/r/20190711114335.887392493@infradead.org
2019-07-17kbuild: add --hash-style= and --build-id unconditionallyMasahiro Yamada
As commit 1e0221374e30 ("mips: vdso: drop unnecessary cc-ldoption") explained, these flags are supported by the minimal required version of binutils. They are supported by ld.lld too. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com>
2019-07-17x86/entry/64: Use JMP instead of JMPQJosh Poimboeuf
Somehow the swapgs mitigation entry code patch ended up with a JMPQ instruction instead of JMP, where only the short jump is needed. Some assembler versions apparently fail to optimize JMPQ into a two-byte JMP when possible, instead always using a 7-byte JMP with relocation. For some reason that makes the entry code explode with a #GP during boot. Change it back to "JMP" as originally intended. Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-17xen/pv: Fix a boot up hang revealed by int3 self testZhenzhong Duan
Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call() selftest") is used to ensure there is a gap setup in int3 exception stack which could be used for inserting call return address. This gap is missed in XEN PV int3 exception entry path, then below panic triggered: [ 0.772876] general protection fault: 0000 [#1] SMP NOPTI [ 0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11 [ 0.772893] RIP: e030:int3_magic+0x0/0x7 [ 0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246 [ 0.773334] Call Trace: [ 0.773334] alternative_instructions+0x3d/0x12e [ 0.773334] check_bugs+0x7c9/0x887 [ 0.773334] ? __get_locked_pte+0x178/0x1f0 [ 0.773334] start_kernel+0x4ff/0x535 [ 0.773334] ? set_init_arg+0x55/0x55 [ 0.773334] xen_start_kernel+0x571/0x57a For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with %rcx/%r11 on the stack. To convert back to "normal" looking exceptions, the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'. E.g. Extracting 'xen_pv_trap xenint3' we have: xen_xenint3: pop %rcx; pop %r11; jmp xenint3 As xenint3 and int3 entry code are same except xenint3 doesn't generate a gap, we can fix it by using int3 and drop useless xenint3. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-12x86/vdso: Fix flip/flop vdso build bugNaohiro Aota
Two consecutive "make" on an already compiled kernel tree will show different behavior: $ make CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool CHK include/generated/compile.h VDSOCHK arch/x86/entry/vdso/vdso64.so.dbg VDSOCHK arch/x86/entry/vdso/vdso32.so.dbg Kernel: arch/x86/boot/bzImage is ready (#3) Building modules, stage 2. MODPOST 12 modules $ make make CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool CHK include/generated/compile.h VDSO arch/x86/entry/vdso/vdso64.so.dbg OBJCOPY arch/x86/entry/vdso/vdso64.so VDSO2C arch/x86/entry/vdso/vdso-image-64.c CC arch/x86/entry/vdso/vdso-image-64.o VDSO arch/x86/entry/vdso/vdso32.so.dbg OBJCOPY arch/x86/entry/vdso/vdso32.so VDSO2C arch/x86/entry/vdso/vdso-image-32.c CC arch/x86/entry/vdso/vdso-image-32.o AR arch/x86/entry/vdso/built-in.a AR arch/x86/entry/built-in.a AR arch/x86/built-in.a GEN .version CHK include/generated/compile.h UPD include/generated/compile.h CC init/version.o AR init/built-in.a LD vmlinux.o <snip> This is causing "LD vmlinux" once every two times even without any modifications. This is the same bug fixed in commit 92a4728608a8 ("x86/boot: Fix if_changed build flip/flop bug"). Two "if_changed" cannot be used in one target. Fix this merging two commands into one function. Fixes: 7ac870747988 ("x86/vdso: Switch to generic vDSO implementation") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Link: https://lkml.kernel.org/r/20190712101556.17833-1-naohiro.aota@wdc.com
2019-07-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A collection of assorted fixes: - Fix for the pinned cr0/4 fallout which escaped all testing efforts because the kvm-intel module was never loaded when the kernel was compiled with CONFIG_PARAVIRT=n. The cr0/4 accessors are moved out of line and static key is now solely used in the core code and therefore can stay in the RO after init section. So the kvm-intel and other modules do not longer reference the (read only) static key which the module loader tried to update. - Prevent an infinite loop in arch_stack_walk_user() by breaking out of the loop once the return address is detected to be 0. - Prevent the int3_emulate_call() selftest from corrupting the stack when KASAN is enabled. KASASN clobbers more registers than covered by the emulated call implementation. Convert the int3_magic() selftest to a ASM function so the compiler cannot KASANify it. - Unbreak the build with old GCC versions and with the Gold linker by reverting the 'Move of _etext to the actual end of .text'. In both cases the build fails with 'Invalid absolute R_X86_64_32S relocation: _etext' - Initialize the context lock for init_mm, which was never an issue until the alternatives code started to use a temporary mm for patching. - Fix a build warning vs. the LOWMEM_PAGES constant where clang complains rightfully about a signed integer overflow in the shift operation by converting the operand to an ULL. - Adjust the misnamed ENDPROC() of common_spurious in the 32bit entry code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/stacktrace: Prevent infinite loop in arch_stack_walk_user() x86/asm: Move native_write_cr0/4() out of line x86/pgtable/32: Fix LOWMEM_PAGES constant x86/alternatives: Fix int3_emulate_call() selftest stack corruption x86/entry/32: Fix ENDPROC of common_spurious Revert "x86/build: Move _etext to actual end of .text" x86/ldt: Initialize the context lock for init_mm
2019-07-11Merge tag 'clone3-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull clone3 system call from Christian Brauner: "This adds the clone3 syscall which is an extensible successor to clone after we snagged the last flag with CLONE_PIDFD during the 5.2 merge window for clone(). It cleanly supports all of the flags from clone() and thus all legacy workloads. There are few user visible differences between clone3 and clone. First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse this flag. Second, the CSIGNAL flag is deprecated and will cause EINVAL to be reported. It is superseeded by a dedicated "exit_signal" argument in struct clone_args thus freeing up even more flags. And third, clone3 gives CLONE_PIDFD a dedicated return argument in struct clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr argument. The clone3 uapi is designed to be easy to handle on 32- and 64 bit: /* uapi */ struct clone_args { __aligned_u64 flags; __aligned_u64 pidfd; __aligned_u64 child_tid; __aligned_u64 parent_tid; __aligned_u64 exit_signal; __aligned_u64 stack; __aligned_u64 stack_size; __aligned_u64 tls; }; and a separate kernel struct is used that uses proper kernel typing: /* kernel internal */ struct kernel_clone_args { u64 flags; int __user *pidfd; int __user *child_tid; int __user *parent_tid; int exit_signal; unsigned long stack; unsigned long stack_size; unsigned long tls; }; The system call comes with a size argument which enables the kernel to detect what version of clone_args userspace is passing in. clone3 validates that any additional bytes a given kernel does not know about are set to zero and that the size never exceeds a page. A nice feature is that this patchset allowed us to cleanup and simplify various core kernel codepaths in kernel/fork.c by making the internal _do_fork() function take struct kernel_clone_args even for legacy clone(). This patch also unblocks the time namespace patchset which wants to introduce a new CLONE_TIMENS flag. Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and xtensa. These were the architectures that did not require special massaging. Other architectures treat fork-like system calls individually and after some back and forth neither Arnd nor I felt confident that we dared to add clone3 unconditionally to all architectures. We agreed to leave this up to individual architecture maintainers. This is why there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3 which any architecture can set once it has implemented support for clone3. The patch also adds a cond_syscall(clone3) for architectures such as nios2 or h8300 that generate their syscall table by simply including asm-generic/unistd.h. The hope is to get rid of __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon" * tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: arch: handle arches who do not yet define clone3 arch: wire-up clone3() syscall fork: add clone3