aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/include/asm/alternative.h
AgeCommit message (Collapse)Author
2019-09-15x86: alternative.h: use asm_inline for all alternative variantsRasmus Villemoes
Most, if not all, uses of the alternative* family just provide one or two instructions in .text, but the string literal can be quite large, causing gcc to overestimate the size of the generated code. That in turn affects its decisions about inlining of the function containing the alternative() asm statement. New enough versions of gcc allow one to overrule the estimated size by using "asm inline" instead of just "asm". So replace asm by the helper asm_inline, which for older gccs just expands to asm. Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-04-03x86/nospec, objtool: Introduce ANNOTATE_IGNORE_ALTERNATIVEPeter Zijlstra
To facillitate other usage of ignoring alternatives; rename ANNOTATE_NOSPEC_IGNORE to ANNOTATE_IGNORE_ALTERNATIVE. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-16x86/alternatives: Add an ALTERNATIVE_3() macroBorislav Petkov
Similar to ALTERNATIVE_2(), ALTERNATIVE_3() selects between 3 possible variants. Will be used for adding RDTSCP to the rdtsc_ordered() alternatives. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181211222326.14581-4-bp@alien8.de
2019-01-16x86/alternatives: Add macro commentsBorislav Petkov
... so that when one stares at the .s output, one can find her way around the resulting asm magic. With it, ALTERNATIVE looks like this now: # ALT: oldnstr 661: ... 662: # ALT: padding .skip ... 663: .pushsection .altinstructions,"a" ... .popsection .pushsection .altinstr_replacement, "ax" # ALT: replacement 1 6641: ... 6651: .popsection Merge __OLDINSTR() into OLDINSTR(), while at it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181211222326.14581-2-bp@alien8.de
2018-12-26Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Remove trampoline_handler() prototype x86/kernel: Fix more -Wmissing-prototypes warnings x86: Fix various typos in comments x86/headers: Fix -Wmissing-prototypes warning x86/process: Avoid unnecessary NULL check in get_wchan() x86/traps: Complete prototype declarations x86/mce: Fix -Wmissing-prototypes warnings x86/gart: Rewrite early_gart_iommu_check() comment
2018-12-19Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining ↵Ingo Molnar
bugs" This reverts commit 77f48ec28e4ccff94d2e5f4260a83ac27a7f3099. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03x86: Fix various typos in commentsIngo Molnar
Go over arch/x86/ and fix common typos in comments, and a typo in an actual function argument name. No change in functionality intended. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugsNadav Amit
As described in: 77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block - i.e. to macrify the affected block. As a result GCC considers the inline assembly block as a single instruction. This patch handles the LOCK prefix, allowing more aggresive inlining: text data bss dec hex filename 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before 18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845) This is the reduction in non-inlined functions: Before: 40286 After: 40218 (-68) Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27x86/alternatives: Fixup alternative_call_2Alexey Dobriyan
The following pattern fails to compile while the same pattern with alternative_call() does: if (...) alternative_call_2(...); else alternative_call_2(...); as it expands into if (...) { }; <=== else { }; Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20180114120504.GA11368@avx2
2018-01-05x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asmDavid Woodhouse
Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile. Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: ak@linux.intel.com Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-09x86/alternatives: Fix alt_max_short macro to really be a max()Mathias Krause
The alt_max_short() macro in asm/alternative.h does not work as intended, leading to nasty bugs. E.g. alt_max_short("1", "3") evaluates to 3, but alt_max_short("3", "1") evaluates to 1 -- not exactly the maximum of 1 and 3. In fact, I had to learn it the hard way by crashing my kernel in not so funny ways by attempting to make use of the ALTENATIVE_2 macro with alternatives where the first one was larger than the second one. According to [1] and commit dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly") the right handed side should read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the macro work as intended. While at it, fix up the comments regarding the additional "-", too. It's not about gas' usage of s32 but brain dead logic of having a "true" value of -1 for the < operator ... *sigh* Btw., the one in asm/alternative-asm.h is correct. And, apparently, all current users of ALTERNATIVE_2() pass same sized alternatives, avoiding to hit the bug. [1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax Reviewed-and-tested-by: Borislav Petkov <bp@suse.de> Fixes: dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1507228213-13095-1-git-send-email-minipli@googlemail.com
2017-09-23x86/asm: Fix inline asm call constraints for ClangJosh Poimboeuf
For inline asm statements which have a CALL instruction, we list the stack pointer as a constraint to convince GCC to ensure the frame pointer is set up first: static inline void foo() { register void *__sp asm(_ASM_SP); asm("call bar" : "+r" (__sp)) } Unfortunately, that pattern causes Clang to corrupt the stack pointer. The fix is easy: convert the stack pointer register variable to a global variable. It should be noted that the end result is different based on the GCC version. With GCC 6.4, this patch has exactly the same result as before: defconfig defconfig-nofp distro distro-nofp before 9820389 9491555 8816046 8516940 after 9820389 9491555 8816046 8516940 With GCC 7.2, however, GCC's behavior has changed. It now changes its behavior based on the conversion of the register variable to a global. That somehow convinces it to *always* set up the frame pointer before inserting *any* inline asm. (Therefore, listing the variable as an output constraint is a no-op and is no longer necessary.) It's a bit overkill, but the performance impact should be negligible. And in fact, there's a nice improvement with frame pointers disabled: defconfig defconfig-nofp distro distro-nofp before 9796316 9468236 9076191 8790305 after 9796957 9464267 9076381 8785949 So in summary, while listing the stack pointer as an output constraint is no longer necessary for newer versions of GCC, it's still needed for older versions. Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-24x86/alternatives: Add stack frame dependency to alternative_call_2()Josh Poimboeuf
Linus reported the following objtool warning: kernel/signal.o: warning: objtool: .altinstr_replacement+0x54: call without frame pointer save/setup The warning is valid. It's caused by the fact that gcc placed the call instruction in alternative_call_2()'s inline asm before the frame pointer setup, which breaks frame pointer convention and can result in a bad stack trace. Force a stack frame to be created before the call instruction by listing the stack pointer as an output operand in the inline asm statement. Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160923214939.j5o7c67nhepzmh3t@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/asm: Make asm/alternative.h safe from assemblyAndy Lutomirski
asm/alternative.h isn't directly useful from assembly, but it shouldn't break the build. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e5b693fcef99fe6e80341c9e97a002fb23871e91.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/asm: Stop depending on ptrace.h in alternative.hAndy Lutomirski
alternative.h pulls in ptrace.h, which means that alternatives can't be used in anything referenced from ptrace.h, which is a mess. Break the dependency by pulling text patching helpers into their own header. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/99b93b13f2c9eb671f5c98bba4c2cbdc061293a2.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it ↵Ingo Molnar
in xsave_state() We'd like to use xsave_state() earlier, but its SYSTEM_BOOTING check is too imprecise. The real condition that xsave_state() would like to check is whether alternative XSAVE instructions were patched into the kernel image already. Add such a (read-mostly) debug flag and use it in xsave_state(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-04x86/alternatives: Fix ALTERNATIVE_2 padding generation properlyBorislav Petkov
Quentin caught a corner case with the generation of instruction padding in the ALTERNATIVE_2 macro: if len(orig_insn) < len(alt1) < len(alt2), then not enough padding gets added and that is not good(tm) as we could overwrite the beginning of the next instruction. Luckily, at the time of this writing, we don't have ALTERNATIVE_2() invocations which have that problem and even if we did, a simple fix would be to prepend the instructions with enough prefixes so that that corner case doesn't happen. However, best it would be if we fixed it properly. See below for a simple, abstracted example of what we're doing. So what we ended up doing is, we compute the max(len(alt1), len(alt2)) - len(orig_insn) and feed that value to the .skip gas directive. The max() cannot have conditionals due to gas limitations, thus the fancy integer math. With this patch, all ALTERNATIVE_2 sites get padded correctly; generating obscure test cases pass too: #define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b))))) #define gen_skip(orig, alt1, alt2, marker) \ .skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \ (alt_max_short(alt1, alt2) - (orig)),marker .pushsection .text, "ax" .globl main main: gen_skip(1, 2, 4, 0x09) gen_skip(4, 1, 2, 0x10) ... .popsection Thanks to Quentin for catching it and double-checking the fix! Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-23x86/alternatives: Add instruction paddingBorislav Petkov
Up until now we have always paid attention to make sure the length of the new instruction replacing the old one is at least less or equal to the length of the old instruction. If the new instruction is longer, at the time it replaces the old instruction it will overwrite the beginning of the next instruction in the kernel image and cause your pants to catch fire. So instead of having to pay attention, teach the alternatives framework to pad shorter old instructions with NOPs at buildtime - but only in the case when len(old instruction(s)) < len(new instruction(s)) and add nothing in the >= case. (In that case we do add_nops() when patching). This way the alternatives user shouldn't have to care about instruction sizes and simply use the macros. Add asm ALTERNATIVE* flavor macros too, while at it. Also, we need to save the pad length in a separate struct alt_instr member for NOP optimization and the way to do that reliably is to carry the pad length instead of trying to detect whether we're looking at single-byte NOPs or at pathological instruction offsets like e9 90 90 90 90, for example, which is a valid instruction. Thanks to Michael Matz for the great help with toolchain questions. Signed-off-by: Borislav Petkov <bp@suse.de>
2014-05-29x86/alternative: Add alternative_input_2 to support alternative with two ↵Fenghua Yu
features and input alternative_input_2() replaces old instruction with new instructions with input based on two features. In alternative_input_2(oldinstr, newinstr1, feature1, newinstr2, feature2, input...), feature2 has higher priority to replace oldinstr than feature1. If CPU has feature2, newinstr2 replaces oldinstr and newinstr2 is executed during run time. If CPU doesn't have feature2, but it has feature1, newinstr1 replaces oldinstr and newinstr1 is executed during run time. If CPU doesn't have feature2 and feature1, oldinstr is executed during run time. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-5-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-07-23kprobes/x86: Call out into INT3 handler directly instead of using notifierJiri Kosina
In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction patching"), the mechanism that was introduced for notifying alternatives code from int3 exception handler that and exception occured was die_notifier. This is however problematic, as early code might be using jump labels even before the notifier registration has been performed, which will then lead to an oops due to unhandled exception. One of such occurences has been encountered by Fengguang: int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8 task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000 RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225 RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006 RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002 R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0 R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8 FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0 Stack: ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48 ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68 ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98 Call Trace: <IRQ> [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79 [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84 [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0 [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41 [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80 <EOI> [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1 [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16 [<ffffffff81015f10>] default_idle+0x147/0x282 [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84 [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5 [...] Fix this by directly calling poke_int3_handler() from the int3 exception handler (analogically to what ftrace has been doing already), instead of relying on notifier, registration of which might not have yet been finalized by the time of the first trap. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-19kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functionsMasami Hiramatsu
Since introducing the text_poke_bp() for all text_poke_smp*() callers, text_poke_smp*() are now unused. This patch basically reverts: 3d55cc8a058e ("x86: Add text_poke_smp for SMP cross modifying code") 7deb18dcf047 ("x86: Introduce text_poke_smp_batch() for batch-code modifying") and related commits. This patch also fixes a Kconfig dependency issue on STOP_MACHINE in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-16x86: Introduce int3 (breakpoint)-based instruction patchingJiri Kosina
Introduce a method for run-time instruction patching on a live SMP kernel based on int3 breakpoint, completely avoiding the need for stop_machine(). The way this is achieved: - add a int3 trap to the address that will be patched - sync cores - update all but the first byte of the patched range - sync cores - replace the first byte (int3) by the first byte of replacing opcode - sync cores According to http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html synchronization after replacing "all but first" instructions should not be necessary (on Intel hardware), as the syncing after the subsequent patching of the first byte provides enough safety. But there's not only Intel HW out there, and we'd rather be on a safe side. If any CPU instruction execution would collide with the patching, it'd be trapped by the int3 breakpoint and redirected to the provided "handler" (which would typically mean just skipping over the patched region, acting as "nop" has been there, in case we are doing nop -> jump and jump -> nop transitions). Ftrace has been using this very technique since 08d636b ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine") for ages already, and jump labels are another obvious potential user of this. Based on activities of Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> a few years ago. Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307121102440.29788@pobox.suse.cz Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-01Merge branch 'x86-smap-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/smap support from Ingo Molnar: "This adds support for the SMAP (Supervisor Mode Access Prevention) CPU feature on Intel CPUs: a hardware feature that prevents unintended user-space data access from kernel privileged code. It's turned on automatically when possible. This, in combination with SMEP, makes it even harder to exploit kernel bugs such as NULL pointer dereferences." Fix up trivial conflict in arch/x86/kernel/entry_64.S due to newly added includes right next to each other. * 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, smep, smap: Make the switching functions one-way x86, suspend: On wakeup always initialize cr4 and EFER x86-32: Start out eflags and cr4 clean x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry x86, smap: Reduce the SMAP overhead for signal handling x86, smap: A page fault due to SMAP is an oops x86, smap: Turn on Supervisor Mode Access Prevention x86, smap: Add STAC and CLAC instructions to control user space access x86, uaccess: Merge prototypes for clear_user/__clear_user x86, smap: Add a header file with macros for STAC/CLAC x86, alternative: Add header guards to <asm/alternative-asm.h> x86, alternative: Use .pushsection/.popsection x86, smap: Add CR4 bit for SMAP x86-32, mm: The WP test should be done on a kernel page
2012-09-21x86, alternative: Use .pushsection/.popsectionH. Peter Anvin
.section/.previous doesn't nest. Use .pushsection/.popsection in <asm/alternative.h> so that they can be properly nested. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-5-git-send-email-hpa@linux.intel.com
2012-08-23x86/smp: Don't ever patch back to UP if we unplug cpusRusty Russell
We still patch SMP instructions to UP variants if we boot with a single CPU, but not at any other time. In particular, not if we unplug CPUs to return to a single cpu. Paul McKenney points out: mean offline overhead is 6251/48=130.2 milliseconds. If I remove the alternatives_smp_switch() from the offline path [...] the mean offline overhead is 550/42=13.1 milliseconds Basically, we're never going to get those 120ms back, and the code is pretty messy. We get rid of: 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the documentation is wrong. It's now the default. 2) The skip_smp_alternatives flag used by suspend. 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end() which were only used to set this one flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paul.mckenney@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-29x86/copy_user_generic: Optimize copy_user_generic with CPU erms featureFenghua Yu
According to Intel 64 and IA-32 SDM and Optimization Reference Manual, beginning with Ivybridge, REG string operation using MOVSB and STOSB can provide both flexible and high-performance REG string operations in cases like memory copy. Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/ STOSB). If CPU erms feature is detected, patch copy_user_generic with enhanced fast string version of copy_user_generic. A few new macros are defined to reduce duplicate code in ALTERNATIVE and ALTERNATIVE_2. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1337908785-14015-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-01-20x86: Adjust asm constraints in atomic64 wrappersJan Beulich
Eric pointed out overly restrictive constraints in atomic64_set(), but there are issues throughout the file. In the cited case, %ebx and %ecx are inputs only (don't get changed by either of the two low level implementations). This was also the case elsewhere. Further in many cases early-clobber indicators were missing. Finally, the previous implementation rolled a custom alternative instruction macro from scratch, rather than using alternative_call() (which was introduced with the commit that the description of the change in question actually refers to). Adjusting has the benefit of not hiding referenced symbols from the compiler, which however requires them to be declared not just in the exporting source file (which, as a desirable side effect, in turn allows that exporting file to become a real 5-line stub). This patch does not eliminate the overly restrictive memory clobbers, however: Doing so would occasionally make the compiler set up a second register for accessing the memory object (to satisfy the added "m" constraint), and it's not clear which of the two non-optimal alternatives is better. v2: Re-do the declaration and exporting of the internal symbols. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4F19A2A5020000780006E0D9@nat28.tlf.novell.com Cc: Luca Barbieri <luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-09-15asm alternatives: remove incorrect alignment notesLinus Torvalds
On x86-64, they were just wasteful: with the explicitly added (now unnecessary) padding, the size of the alternatives structure was 16 bytes, and an alignment of 8 bytes didn't hurt much. However, it was still silly, since the natural size and alignment for the structure is actually just 12 bytes, 4-byte aligned since commit 59e97e4d6fbc ("x86: Make alternative instruction pointers relative"). So removing the padding, and removing the extra alignment is just a good idea. On x86-32, the alignment of 4 bytes was correct, but was incorrectly hardcoded as 8 bytes in <asm/alternative-asm.h>. That header file had used to be an x86-64 only header file, but various unification efforts have made it be used for x86-32 too (ie the unification of rwlock and rwsem). That in turn caused x86-32 boot failures, because the extra alignment would result in random zero-filled words in the altinstructions section, causing oopses early at boot when doing alternative instruction replacement. So just remove all the alignment noise entirely. It's wrong, and it's unnecessary. The section itself is already properly aligned by the linker scripts, and all additions to the section had better be of the proper 12-byte format, keeping it aligned. So if the align directive were to ever make a difference, that would be an indication of a serious bug to begin with. Reported-by: Werner Landgraf <w.landgraf@ru.r> Acked-by: Andrew Lutomirski <luto@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-13x86: Make alternative instruction pointers relativeAndy Lutomirski
This save a few bytes on x86-64 and means that future patches can apply alternatives to unrelocated code. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/ff64a6b9a1a3860ca4a7b8b6dc7b4754f9491cd7.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-19Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cpu: Fix detection of Celeron Covington stepping A1 and B0 Documentation, ABI: Update L3 cache index disable text x86, AMD, cacheinfo: Fix L3 cache index disable checks x86, AMD, cacheinfo: Fix fallout caused by max3 conversion x86, cpu: Change NOP selection for certain Intel CPUs x86, cpu: Clean up and unify the NOP selection infrastructure x86, percpu: Use ASM_NOP4 instead of hardcoding P6_NOP4 x86, cpu: Move AMD Elan Kconfig under "Processor family" Fix up trivial conflicts in alternative handling (commit dc326fca2b64 "x86, cpu: Clean up and unify the NOP selection infrastructure" removed some hacky 5-byte instruction stuff, while commit d430d3d7e646 "jump label: Introduce static_branch() interface" renamed HAVE_JUMP_LABEL to CONFIG_JUMP_LABEL in the code that went away)
2011-04-18x86, cpu: Clean up and unify the NOP selection infrastructureH. Peter Anvin
Clean up and unify the NOP selection infrastructure: - Make the atomic 5-byte NOP a part of the selection system. - Pick NOPs once during early boot and then be done with it. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/r/1303166160-10315-3-git-send-email-hpa@linux.intel.com
2011-04-04jump label: Introduce static_branch() interfaceJason Baron
Introduce: static __always_inline bool static_branch(struct jump_label_key *key); instead of the old JUMP_LABEL(key, label) macro. In this way, jump labels become really easy to use: Define: struct jump_label_key jump_key; Can be used as: if (static_branch(&jump_key)) do unlikely code enable/disale via: jump_label_inc(&jump_key); jump_label_dec(&jump_key); that's it! For the jump labels disabled case, the static_branch() becomes an atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(), atomic_dec() operations. We show testing results for this change below. Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct. Since we now require a 'struct jump_label_key *key', we can store a pointer into the jump table addresses. In this way, we can enable/disable jump labels, in basically constant time. This change allows us to completely remove the previous hashtable scheme. Thanks to Peter Zijlstra for this re-write. Testing: I ran a series of 'tbench 20' runs 5 times (with reboots) for 3 configurations, where tracepoints were disabled. jump label configured in avg: 815.6 jump label *not* configured in (using atomic reads) avg: 800.1 jump label *not* configured in (regular reads) avg: 803.4 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110316212947.GA8792@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> Suggested-by: H. Peter Anvin <hpa@linux.intel.com> Tested-by: David Daney <ddaney@caviumnetworks.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-01-06Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', ↵Linus Torvalds
'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, asm: Use fxsaveq/fxrestorq in more places * 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hwmon: Add core threshold notification to therm_throt.c * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Use native_halt on a halt, not native_safe_halt * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: locking, lockdep: Convert sprintf_symbol to %pS * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Better struct irqaction layout
2010-12-13x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resumeSuresh Siddha
During suspend, we disable all the non boot cpus. And during resume we bring them all back again. So no need to do alternatives_smp_switch() in between. On my core 2 based laptop, this speeds up the suspend path by 15msec and the resume path by 5 msec (suspend/resume speed up differences can be attributed to the different P-states that the cpu is in during suspend/resume). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-06x86: Introduce text_poke_smp_batch() for batch-code modifyingMasami Hiramatsu
Introduce text_poke_smp_batch(). This function modifies several text areas with one stop_machine() on SMP. Because calling stop_machine() is heavy task, it is better to aggregate text_poke requests. ( Note: I've talked with Rusty about this interface, and he would not like to expand stop_machine() interface, since it is not for generic use. ) Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-22jump label: Base patch for jump labelJason Baron
base patch to implement 'jump labeling'. Based on a new 'asm goto' inline assembly gcc mechanism, we can now branch to labels from an 'asm goto' statment. This allows us to create a 'no-op' fastpath, which can subsequently be patched with a jump to the slowpath code. This is useful for code which might be rarely used, but which we'd like to be able to call, if needed. Tracepoints are the current usecase that these are being implemented for. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com> [ cleaned up some formating ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-20jump label: Make text_poke_early() globally visibleJason Baron
Make text_poke_early available outside of alternative.c. The jump label patchset wants to make use of it in order to set up the optimal no-op sequences at run-time. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <04cfddf2ba77bcabfc3e524f1849d871d6a1cf9d.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-20jump label: Make dynamic no-op selection available outside of ftraceJason Baron
Move Steve's code for finding the best 5-byte no-op from ftrace.c to alternative.c. The idea is that other consumers (in this case jump label) want to make use of that code. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <96259ae74172dcac99c0020c249743c523a92e18.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-07x86, alternatives: Use 16-bit numbers for cpufeature indexH. Peter Anvin
We already have cpufeature indicies above 255, so use a 16-bit number for the alternatives index. This consumes a padding field and so doesn't add any size, but it means that abusing the padding field to create assembly errors on overflow no longer works. We can retain the test simply by redirecting it to the .discard section, however. [ v3: updated to include open-coded locations ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-18Merge branch 'core-hweight-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hweight: Use a 32-bit popcnt for __arch_hweight32() arch, hweight: Fix compilation errors x86: Add optimized popcnt variants bitops: Optimize hweight() by making use of compile-time evaluation
2010-04-29Merge branch 'x86/asm' into x86/atomicH. Peter Anvin
Merge reason: Conflict between LOCK_PREFIX_HERE and relative alternatives pointers Resolved Conflicts: arch/x86/include/asm/alternative.h arch/x86/kernel/alternative.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-29x86: Fix LOCK_PREFIX_HERE for uniprocessor buildH. Peter Anvin
Checkin b3ac891b67bd4b1fc728d1c784cad1212dea433d: x86: Add support for lock prefix in alternatives ... did not define LOCK_PREFIX_HERE in the case of a uniprocessor build. As a result, it would cause any of the usages of this macro to fail on a uniprocessor build. Fix this by defining LOCK_PREFIX_HERE as a null string. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Luca Barbieri <luca@luca-barbieri.com> LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
2010-04-28x86-64: Reduce SMP locks table sizeJan Beulich
Reduce the SMP locks table size by using relative pointers instead of absolute ones, thus cutting the table size by half. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4BCF30FE020000780003B3B6@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-06x86: Add optimized popcnt variantsBorislav Petkov
Add support for the hardware version of the Hamming weight function, popcnt, present in CPUs which advertize it under CPUID, Function 0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the default lib/hweight.c sw versions. A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost a 3x speedup on a F10h machine. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100318112015.GC11152@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-05Merge branch 'perf-probes-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Issue at least one memory barrier in stop_machine_text_poke() perf probe: Correct probe syntax on command line help perf probe: Add lazy line matching support perf probe: Show more lines after last line perf probe: Check function address range strictly in line finder perf probe: Use libdw callback routines perf probe: Use elfutils-libdw for analyzing debuginfo perf probe: Rename probe finder functions perf probe: Fix bugs in line range finder perf probe: Update perf probe document perf probe: Do not show --line option without dwarf support kprobes: Add documents of jump optimization kprobes/x86: Support kprobes jump optimization on x86 x86: Add text_poke_smp for SMP cross modifying code kprobes/x86: Cleanup save/restore registers kprobes/x86: Boost probes when reentering kprobes: Jump optimization sysctl interface kprobes: Introduce kprobes jump optimization kprobes: Introduce generic insn_slot framework kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
2010-02-28Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Mark atomic irq ops raw for 32bit legacy x86: Merge show_regs() x86: Macroise x86 cache descriptors x86-32: clean up rwsem inline asm statements x86: Merge asm/atomic_{32,64}.h x86: Sync asm/atomic_32.h and asm/atomic_64.h x86: Split atomic64_t functions into seperate headers x86-64: Modify memcpy()/memset() alternatives mechanism x86-64: Modify copy_user_generic() alternatives mechanism x86: Lift restriction on the location of FIX_BTMAP_* x86, core: Optimize hweight32()
2010-02-25x86: Add support for lock prefix in alternativesLuca Barbieri
The current lock prefix UP/SMP alternative code doesn't allow LOCK_PREFIX to be used in alternatives code. This patch solves the problem by adding a new LOCK_PREFIX_ALTERNATIVE_PATCH macro that only records the lock prefix location but does not emit the prefix. The user of this macro can then start any alternative sequence with "lock" and have it UP/SMP patched. To make this work, the UP/SMP alternative code is changed to do the lock/DS prefix switching only if the byte actually contains a lock or DS prefix. Thus, if an alternative without the "lock" is selected, it will now do nothing instead of clobbering the code. Changes in v2: - Naming change - Change label to not conflict with alternatives Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25x86: Add text_poke_smp for SMP cross modifying codeMasami Hiramatsu
Add generic text_poke_smp for SMP which uses stop_machine() to synchronize modifying code. This stop_machine() method is officially described at "7.1.3 Handling Self- and Cross-Modifying Code" on the intel's software developer's manual 3A. Since stop_machine() can't protect code against NMI/MCE, this function can not modify those handlers. And also, this function is basically for modifying multibyte-single-instruction. For modifying multibyte-multi-instructions, we need another special trap & detour code. This code originaly comes from immediate values with stop_machine() version. Thanks Jason and Mathieu! Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Anders Kaseorg <andersk@ksplice.com> Cc: Tim Abbott <tabbott@ksplice.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <20100225133438.6725.80273.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar <mingo@elte.hu>