aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/dumpstack.c
AgeCommit message (Collapse)Author
2019-07-31x86/dumpstack: Indicate PREEMPT_RT in dumpsThomas Gleixner
Stack dumps print whether the kernel has preemption enabled or not. Extend it so a PREEMPT_RT enabled kernel can be identified. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.699136351@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-06x86/process: Don't mix user/kernel regs in 64bit __show_regs()Jann Horn
When the kernel.print-fatal-signals sysctl has been enabled, a simple userspace crash will cause the kernel to write a crash dump that contains, among other things, the kernel gsbase into dmesg. As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE in this case. This also moves the bitness-specific logic from show_regs() into process_{32,64}.c. Fixes: 45807a1df9f5 ("vdso: print fatal signals") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
2018-08-31x86/dumpstack: Don't dump kernel memory based on usermode RIPJann Horn
show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: 7cccf0725cf7 ("x86/dumpstack: Add a show_ip() function") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
2018-08-30x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()Jann Horn
Reset the KASAN shadow state of the task stack before rewinding RSP. Without this, a kernel oops will leave parts of the stack poisoned, and code running under do_exit() can trip over such poisoned regions and cause nonsensical false-positive KASAN reports about stack-out-of-bounds bugs. This does not wipe the exception stacks; if an oops happens on an exception stack, it might result in random KASAN false-positives from other tasks afterwards. This is probably relatively uninteresting, since if the kernel oopses on an exception stack, there are most likely bigger things to worry about. It'd be more interesting if vmapped stacks and KASAN were compatible, since then handle_stack_overflow() would oops from exception stack context. Fixes: 2deb4be28077 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: kasan-dev@googlegroups.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
2018-07-19x86: Avoid pr_cont() in show_opcodes()Rasmus Villemoes
If concurrent printk() messages are emitted, then pr_cont() is making it extremly hard to decode which part of the output belongs to what. See the convoluted example at: https://syzkaller.appspot.com/text?tag=CrashReport&x=139d342c400000 Avoid this by using a proper prefix for each line and by using %ph format in show_opcodes() which emits the 'Code:' line in one go. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: joe@perches.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/1532009278-5953-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
2018-04-26x86/dumpstack: Explain the reasoning for the prologue and buffer sizeBorislav Petkov
The whole reasoning behind the amount of opcode bytes dumped and prologue length isn't very clear so write down some of the reasons for why it is done the way it is. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-10-bp@alien8.de
2018-04-26x86/dumpstack: Save first regs set for the executive summaryBorislav Petkov
Save the regs set when __die() is onvoked for the first time and print it in oops_end(). Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-9-bp@alien8.de
2018-04-26x86/dumpstack: Add a show_ip() functionBorislav Petkov
... which shows the Instruction Pointer along with the insn bytes around it. Use it whenever rIP is printed. Drop the rIP < PAGE_OFFSET check since probe_kernel_read() can handle any address properly. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-8-bp@alien8.de
2018-04-26x86/dumpstack: Add loglevel argument to show_opcodes()Borislav Petkov
Will be used in the next patch. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-6-bp@alien8.de
2018-04-26x86/dumpstack: Improve opcodes dumping in the code sectionBorislav Petkov
The code used to iterate byte-by-byte over the bytes around RIP and that is expensive: disabling pagefaults around it, copy_from_user, etc... Make it read the whole buffer of OPCODE_BUFSIZE size in one go. Use a statically allocated 64 bytes buffer so that concurrent show_opcodes() do not interleave in the output even though in the majority of the cases it's serialized via die_lock. Except the #PF path which doesn't... Also, do the PAGE_OFFSET check outside of the function because latter will be reused in other context. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-5-bp@alien8.de
2018-04-26x86/dumpstack: Carve out code-dumping into a functionBorislav Petkov
No functionality change, carve it out into a separate function for later changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-4-bp@alien8.de
2018-04-26x86/dumpstack: Unexport oops_begin()Borislav Petkov
The only user outside of arch/ is not a module since 86cd47334b00 ("ACPI, APEI, GHES, Prevent GHES to be built as module") No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-3-bp@alien8.de
2018-04-26x86/dumpstack: Remove code_bytesBorislav Petkov
This was added by 86c418374223 ("[PATCH] i386: add option to show more code in oops reports") long time ago but experience shows that 64 instruction bytes are plenty when deciphering an oops. So get rid of it. Removing it will simplify further enhancements to the opcodes dumping machinery coming in the following patches. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-2-bp@alien8.de
2018-03-08x86/dumpstack: Unify show_regs()Borislav Petkov
The 32-bit version uses KERN_EMERG and commit b0f4c4b32c8e ("bugs, x86: Fix printk levels for panic, softlockups and stack dumps") changed the 64-bit version to KERN_DEFAULT. The same justification in that commit that those messages do not belong in the terminal, holds true for 32-bit also, so make it so. Make code_bytes static, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180306094920.16917-4-bp@alien8.de
2018-02-02x86/dumpstack: Avoid uninitlized variableArnd Bergmann
In some configurations, 'partial' does not get initialized, as shown by this gcc-8 warning: arch/x86/kernel/dumpstack.c: In function 'show_trace_log_lvl': arch/x86/kernel/dumpstack.c:156:4: error: 'partial' may be used uninitialized in this function [-Werror=maybe-uninitialized] show_regs_if_on_stack(&stack_info, regs, partial); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This initializes it to false, to get the previous behavior in this case. Fixes: a9cdbe72c4e8 ("x86/dumpstack: Fix partial register dumps") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lkml.kernel.org/r/20180202145634.200291-1-arnd@arndb.de
2018-01-03x86/dumpstack: Print registers for first stack frameJosh Poimboeuf
In the stack dump code, if the frame after the starting pt_regs is also a regs frame, the registers don't get printed. Fix that. Reported-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Alexander Tsoy <alexander@tsoy.me> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@vger.kernel.org Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/396f84491d2f0ef64eda4217a2165f5712f6a115.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03x86/dumpstack: Fix partial register dumpsJosh Poimboeuf
The show_regs_safe() logic is wrong. When there's an iret stack frame, it prints the entire pt_regs -- most of which is random stack data -- instead of just the five registers at the end. show_regs_safe() is also poorly named: the on_stack() checks aren't for safety. Rename the function to show_regs_if_on_stack() and add a comment to explain why the checks are needed. These issues were introduced with the "partial register dump" feature of the following commit: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") That patch had gone through a few iterations of development, and the above issues were artifacts from a previous iteration of the patch where 'regs' pointed directly to the iret frame rather than to the (partially empty) pt_regs. Tested-by: Alexander Tsoy <alexander@tsoy.me> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@vger.kernel.org Fixes: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") Link: http://lkml.kernel.org/r/5b05b8b344f59db2d3d50dbdeba92d60f2304c54.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23x86/dumpstack: Indicate in Oops whether PTI is configured and enabledVlastimil Babka
CONFIG_PAGE_TABLE_ISOLATION is relatively new and intrusive feature that may still have some corner cases which could take some time to manifest and be fixed. It would be useful to have Oops messages indicate whether it was enabled for building the kernel, and whether it was disabled during boot. Example of fully enabled: Oops: 0001 [#1] SMP PTI Example of enabled during build, but disabled during boot: Oops: 0001 [#1] SMP NOPTI We can decide to remove this after the feature has been tested in the field long enough. [ tglx: Made it use boot_cpu_has() as requested by Borislav ] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Eduardo Valentin <eduval@amazon.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: bpetkov@suse.de Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: jkosina@suse.cz Cc: keescook@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22x86/cpu_entry_area: Move it out of the fixmapThomas Gleixner
Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big and 0-day already hit a case where the fixmap PTEs were cleared by cleanup_highmap(). Aside of that the fixmap API is a pain as it's all backwards. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stackDave Hansen
If the kernel oopses while on the trampoline stack, it will print "<SYSENTER>" even if SYSENTER is not involved. That is rather confusing. The "SYSENTER" stack is used for a lot more than SYSENTER now. Give it a better string to display in stack dumps, and rename the kernel code to match. Also move the 32-bit code over to the new naming even though it still uses the entry stack only for SYSENTER. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/entry: Clean up the SYSENTER_stack codeAndy Lutomirski
The existing code was a mess, mainly because C arrays are nasty. Turn SYSENTER_stack into a struct, add a helper to find it, and do all the obvious cleanups this enables. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bpetkov@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.653244723@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/entry/64: Remove the SYSENTER stack canaryAndy Lutomirski
Now that the SYSENTER stack has a guard page, there's no need for a canary to detect overflow after the fact. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.572577316@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/entry: Remap the TSS into the CPU entry areaAndy Lutomirski
This has a secondary purpose: it puts the entry stack into a region with a well-controlled layout. A subsequent patch will take advantage of this to streamline the SYSCALL entry code to be able to find it more easily. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bpetkov@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.962042855@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/dumpstack: Handle stack overflow on all stacksAndy Lutomirski
We currently special-case stack overflow on the task stack. We're going to start putting special stacks in the fixmap with a custom layout, so they'll have guard pages, too. Teach the unwinder to be able to unwind an overflow of any of the stacks. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.802057305@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/dumpstack: Add get_stack_info() support for the SYSENTER stackAndy Lutomirski
get_stack_info() doesn't currently know about the SYSENTER stack, so unwinding will fail if we entered the kernel on the SYSENTER stack and haven't fully switched off. Teach get_stack_info() about the SYSENTER stack. With future patches applied that run part of the entry code on the SYSENTER stack and introduce an intentional BUG(), I would get: PANIC: double fault, error_code: 0x0 ... RIP: 0010:do_error_trap+0x33/0x1c0 ... Call Trace: Code: ... With this patch, I get: PANIC: double fault, error_code: 0x0 ... Call Trace: <SYSENTER> ? async_page_fault+0x36/0x60 ? invalid_op+0x22/0x40 ? async_page_fault+0x36/0x60 ? sync_regs+0x3c/0x40 ? sync_regs+0x2e/0x40 ? error_entry+0x6c/0xd0 ? async_page_fault+0x36/0x60 </SYSENTER> Code: ... which is a lot more informative. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.392711508@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/unwinder: Handle stack overflows more gracefullyJosh Poimboeuf
There are at least two unwinder bugs hindering the debugging of stack-overflow crashes: - It doesn't deal gracefully with the case where the stack overflows and the stack pointer itself isn't on a valid stack but the to-be-dereferenced data *is*. - The ORC oops dump code doesn't know how to print partial pt_regs, for the case where if we get an interrupt/exception in *early* entry code before the full pt_regs have been saved. Fix both issues. http://lkml.kernel.org/r/20171126024031.uxi4numpbjm5rlbr@treble Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bpetkov@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.071425003@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-30x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment readsAndy Lutomirski
Now that pt_regs properly defines segment fields as 16-bit on 32-bit CPUs, there's no need to mask off the high word. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18x86/dumpstack: Fix occasionally missing registersJosh Poimboeuf
If two consecutive stack frames have pt_regs, the oops dump code fails to print the second frame's registers. Fix that. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/269c5c00c7d45c699f3dcea42a3a594c6cf7a9a3.1499786555.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18x86/unwind: Ensure stack pointer is alignedJosh Poimboeuf
With frame pointers disabled, on some older versions of GCC (like 4.8.3), it's possible for the stack pointer to get aligned at a half-word boundary: 00000000000004d0 <fib_table_lookup>: 4d0: 41 57 push %r15 4d2: 41 56 push %r14 4d4: 41 55 push %r13 4d6: 41 54 push %r12 4d8: 55 push %rbp 4d9: 53 push %rbx 4da: 48 83 ec 24 sub $0x24,%rsp In such a case, the unwinder ends up reading the entire stack at the wrong alignment. Then the last read goes past the end of the stack, hitting the stack guard page: BUG: stack guard page was hit at ffffc900217c4000 (stack is ffffc900217c0000..ffffc900217c3fff) kernel stack overflow (page fault): 0000 [#1] SMP ... Fix it by ensuring the stack pointer is properly aligned before unwinding. Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 7c7900f89770 ("x86/unwind: Add new unwind interface and implementations") Link: http://lkml.kernel.org/r/cff33847cc9b02fa548625aa23268ac574460d8d.1492436590.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-27x86/debug: Implement __WARN() using UD0Peter Zijlstra
By using "UD0" for WARN()s we remove the function call and its possible __FILE__ and __LINE__ immediate arguments from the instruction stream. Total image size will not change much, what we win in the instruction stream we'll lose because of the __bug_table entries. Still, saves on I$ footprint and the total image size does go down a bit. text data filename 10702123 4530992 defconfig-build/vmlinux.orig 10682460 4530992 defconfig-build/vmlinux.patched (UML didn't seem to use GENERIC_BUG at all, so remove it) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-12Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...
2016-11-21x86/dumpstack: Make stack name tags more comprehensibleJosh Poimboeuf
NMI stack dumps are bracketed by the following tags: <NMI> ... <EOE> The ending tag is kind of confusing if you don't already know what "EOE" means (end of exception). The same ending tag is also used to mark the end of all other exceptions' stacks. For example: <#DF> ... <EOE> And similarly, "EOI" is used as the ending tag for interrupts: <IRQ> ... <EOI> Change the tags to be more comprehensible by making them symmetrical and more XML-esque: <NMI> ... </NMI> <#DF> ... </#DF> <IRQ> ... </IRQ> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/180196e3754572540b595bc56b947d43658979a7.1479491159.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-18x86/dumpstack: Prevent KASAN false positive warningsJosh Poimboeuf
The oops stack dump code scans the entire stack, which can cause KASAN "stack-out-of-bounds" false positive warnings. Tell KASAN to ignore it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Link: http://lkml.kernel.org/r/5f6e80c4b0c7f7f0b6211900847a247cdaad753c.1479398226.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-17x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl()Josh Poimboeuf
When show_trace_log_lvl() is called from show_regs(), it completely fails to dump the stack. This bug was introduced when show_stack_log_lvl() was removed with the following commit: 0ee1dd9f5e7e ("x86/dumpstack: Remove raw stack dump") Previous callers of that function now call show_trace_log_lvl() directly. That resulted in a subtle change, in that the 'stack' argument can now be NULL in certain cases. A NULL 'stack' pointer means that the stack dump should start from the topmost stack frame unless 'regs' is valid, in which case it should start from 'regs->sp'. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 0ee1dd9f5e7e ("x86/dumpstack: Remove raw stack dump") Link: http://lkml.kernel.org/r/c551842302a9c222d96a14e42e4003f059509f69.1479362652.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25x86/dumpstack: Remove raw stack dumpJosh Poimboeuf
For mostly historical reasons, the x86 oops dump shows the raw stack values: ... [registers] Stack: ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0 ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321 0000000000000002 0000000000000000 0000000000000000 0000000000000000 Call Trace: ... This seems to be an artifact from long ago, and probably isn't needed anymore. It generally just adds noise to the dump, and it can be actively harmful because it leaks kernel addresses. Linus says: "The stack dump actually goes back to forever, and it used to be useful back in 1992 or so. But it used to be useful mainly because stacks were simpler and we didn't have very good call traces anyway. I definitely remember having used them - I just do not remember having used them in the last ten+ years. Of course, it's still true that if you can trigger an oops, you've likely already lost the security game, but since the stack dump is so useless, let's aim to just remove it and make games like the above harder." This also removes the related 'kstack=' cmdline option and the 'kstack_depth_to_print' sysctl. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25x86/dumpstack: Remove kernel text addresses from stack dumpJosh Poimboeuf
Printing kernel text addresses in stack dumps is of questionable value, especially now that address randomization is becoming common. It can be a security issue because it leaks kernel addresses. It also affects the usefulness of the stack dump. Linus says: "I actually spend time cleaning up commit messages in logs, because useless data that isn't actually information (random hex numbers) is actively detrimental. It makes commit logs less legible. It also makes it harder to parse dumps. It's not useful. That makes it actively bad. I probably look at more oops reports than most people. I have not found the hex numbers useful for the last five years, because they are just randomized crap. The stack content thing just makes code scroll off the screen etc, for example." The only real downside to removing these addresses is that they can be used to disambiguate duplicate symbol names. However such cases are rare, and the context of the stack dump should be enough to be able to figure it out. There's now a 'faddr2line' script which can be used to convert a function address to a file name and line: $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60 write_sysrq_trigger+0x51/0x60: write_sysrq_trigger at drivers/tty/sysrq.c:1098 Or gdb can be used: $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in" (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378). (But note that when there are duplicate symbol names, gdb will only show the first symbol it finds. faddr2line is recommended over gdb because it handles duplicates and it also does function size checking.) Here's an example of what a stack dump looks like after this change: BUG: unable to handle kernel NULL pointer dereference at (null) IP: sysrq_handle_crash+0x45/0x80 PGD 36bfa067 [ 29.650644] PUD 7aca3067 Oops: 0002 [#1] PREEMPT SMP Modules linked in: ... CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 task: ffff880078582a40 task.stack: ffffc90000ba8000 RIP: 0010:sysrq_handle_crash+0x45/0x80 RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296 RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292 RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000 FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0 Stack: ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002 0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20 ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40 Call Trace: __handle_sysrq+0x138/0x220 ? __handle_sysrq+0x5/0x220 write_sysrq_trigger+0x51/0x60 proc_reg_write+0x42/0x70 __vfs_write+0x37/0x140 ? preempt_count_sub+0xa1/0x100 ? __sb_start_write+0xf5/0x210 ? vfs_write+0x183/0x1a0 vfs_write+0xb8/0x1a0 SyS_write+0x58/0xc0 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x7ffb42f55940 RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940 RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001 RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700 R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10 R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010 Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7 RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8 CR2: 0000000000000000 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21x86/dumpstack: Print any pt_regs found on the stackJosh Poimboeuf
Now that we can find pt_regs registers on the stack, print them. Here's an example of what it looks like: Call Trace: <IRQ> [<ffffffff8144b793>] dump_stack+0x86/0xc3 [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0 [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60 [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50 [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0 RIP: 0010:[<ffffffff818aef43>] [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60 RSP: 0018:ffff880079c4f760 EFLAGS: 00000202 RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007 RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000 RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540 R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000 <EOI> [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250 [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250 [<ffffffff818a7b61>] __schedule+0x3e1/0xb20 ... [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 [<ffffffff818b1dd8>] async_page_fault+0x28/0x30 RIP: 0010:[<ffffffff8145b062>] [<ffffffff8145b062>] __clear_user+0x42/0x70 RSP: 0018:ffff880079c4fd38 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138 RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640 RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928 R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640 R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400 [<ffffffff8145b043>] ? __clear_user+0x23/0x70 [<ffffffff8145b0fb>] clear_user+0x2b/0x40 [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750 [<ffffffff8129a591>] search_binary_handler+0xa1/0x200 [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0 [<ffffffff8129b5f3>] ? do_execveat_common.isra.36+0x623/0x9f0 [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50 [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0 [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:[<00007fd2e2f2e537>] [<00007fd2e2f2e537>] 0x7fd2e2f2e537 RSP: 002b:00007ffc449c5fc8 EFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537 RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029 RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20 R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40 R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5cc2c512ec82cfba00dd22467644d4ed751a48c0.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21x86/dumpstack: Print stack identifier on its own lineJosh Poimboeuf
show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a newline so that any stack address printed after it will appear on the same line. That causes the first stack address to be vertically misaligned with the rest, making it visually cluttered and slightly confusing: Call Trace: <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3 [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160 [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0 ... <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60 [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 It will look worse once we start printing pt_regs registers found in the middle of the stack: <IRQ> RIP: 0010:[<ffffffff8189c6c3>] [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60 RSP: 0018:ffff88007876f720 EFLAGS: 00000206 RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007 ... Improve readability by adding a newline to the stack name: Call Trace: <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3 [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160 [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0 ... <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60 [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 Now that "continued" lines are no longer needed, we can also remove the hack of using the empty string (aka KERN_CONT) and replace it with KERN_DEFAULT. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20x86/dumpstack: Fix show_stack() task pointer regressionJosh Poimboeuf
With the following commit: e18bcccd1a4e ("x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder") The task pointer argument to show_stack_log_lvl() in show_stack() was inadvertently changed to 'current'. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: fweisbec@gmail.com Cc: keescook@chromium.org Cc: linux-tip-commits@vger.kernel.org Cc: luto@amacapital.net Cc: nilayvaish@gmail.com Cc: rostedt@goodmis.org Cc: tip-bot for Josh Poimboeuf <tipbot@zytor.com> Fixes: e18bcccd1a4e ("x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder") Link: http://lkml.kernel.org/r/20160920155340.yhewlx7vmgmov5fb@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20x86/dumpstack: Remove dump_trace() and related callbacksJosh Poimboeuf
All previous users of dump_trace() have been converted to use the new unwind interfaces, so we can remove it and the related print_context_stack() and print_context_stack_bp() callback functions. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5b97da3572b40b5a4d8e185cf2429308d0987a13.1474045023.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinderJosh Poimboeuf
Convert show_trace_log_lvl() to use the new unwinder. dump_trace() has been deprecated. show_trace_log_lvl() is special compared to other users of the unwinder. It's the only place where both reliable *and* unreliable addresses are needed. With frame pointers enabled, most callers of the unwinder don't want to know about unreliable addresses. But in this case, when we're dumping the stack to the console because something presumably went wrong, the unreliable addresses are useful: - They show stale data on the stack which can provide useful clues. - If something goes wrong with the unwinder, or if frame pointers are corrupt or missing, all the stack addresses still get shown. So in order to show all addresses on the stack, and at the same time figure out which addresses are reliable, we have to do the scanning and the unwinding in parallel. The scanning is done with the help of get_stack_info() to traverse the stacks. The unwinding is done separately by the new unwinder. In theory we could simplify show_trace_log_lvl() by instead pushing some of this logic into the unwind code. But then we would need some kind of "fake" frame logic in the unwinder which would add a lot of complexity and wouldn't be worth it in order to support only one user. Another benefit of this approach is that once we have a DWARF unwinder, we should be able to just plug it in with minimal impact to this code. Another change here is that callers of show_trace_log_lvl() don't need to provide the 'bp' argument. The unwinder already finds the relevant frame pointer by unwinding until it reaches the first frame after the provided stack pointer. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/703b5998604c712a1f801874b43f35d6dac52ede.1474045023.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16x86/dumpstack: Remove NULL task pointer conventionJosh Poimboeuf
show_stack_log_lvl() and friends allow a NULL pointer for the task_struct to indicate the current task. This creates confusion and can cause sneaky bugs. Instead require the caller to pass 'current' directly. This only changes the internal workings of the dumpstack code. The dump_trace() and show_stack() interfaces still allow a NULL task pointer. Those interfaces should also probably be fixed as well. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15x86/dumpstack: Add get_stack_info() interfaceJosh Poimboeuf
valid_stack_ptr() is buggy: it assumes that all stacks are of size THREAD_SIZE, which is not true for exception stacks. So the walk_stack() callbacks will need to know the location of the beginning of the stack as well as the end. Another issue is that in general the various features of a stack (type, size, next stack pointer, description string) are scattered around in various places throughout the stack dump code. Encapsulate all that information in a single place with a new stack_info struct and a get_stack_info() interface. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8164dd0db96b7e6a279fa17ae5e6dc375eecb4a9.1473905218.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08x86/dumpstack: Remove unnecessary stack pointer argumentsJosh Poimboeuf
When calling show_stack_log_lvl() or dump_trace() with a regs argument, providing a stack pointer or frame pointer is redundant. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>d Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1694e2e955e3b9a73a3c3d5ba2634344014dd550.1472057064.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08x86/dumpstack: Add get_stack_pointer() and get_frame_pointer()Josh Poimboeuf
The various functions involved in dumping the stack all do similar things with regard to getting the stack pointer and the frame pointer based on the regs and task arguments. Create helper functions to do that instead. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f448914885a35f333fe04da1b97a6c2cc1f80974.1472057064.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08x86/dumpstack: Make printk_stack_address() more generally usefulJosh Poimboeuf
Change printk_stack_address() to be useful when called by an unwinder outside the context of dump_trace(). Specifically: - printk_stack_address()'s 'data' argument is always used as the log level string. Make that explicit. - Call touch_nmi_watchdog(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9fbe0db05bacf66d337c162edbf61450d0cff1e2.1472057064.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24x86/dumpstack/ftrace: Don't print unreliable addresses in ↵Josh Poimboeuf
print_context_stack_bp() When function graph tracing is enabled, print_context_stack_bp() can report return_to_handler() as an unreliable address, which is confusing and misleading: return_to_handler() is really only useful as a hint for debugging, whereas print_context_stack_bp() users only care about the actual 'reliable' call path. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c51aef578d8027791b38d2ad9bac0c7f499fde91.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24x86/dumpstack/ftrace: Mark function graph handler function as unreliableJosh Poimboeuf
When function graph tracing is enabled for a function, its return address on the stack is replaced with the address of an ftrace handler (return_to_handler). Currently 'return_to_handler' can be reported as reliable. That's not ideal, and can actually be misleading. When saving or dumping the stack, you normally only care about what led up to that point (the call path), rather than what will happen in the future (the return path). That's especially true in the non-oops stack trace case, which isn't used for debugging. For example, in a perf profiling operation, reporting return_to_handler() in the trace would just be confusing. And in the oops case, where debugging is important, "unreliable" is also more appropriate there because it serves as a hint that graph tracing was involved, instead of trying to imply that return_to_handler() was the real caller. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f8af15749c7d632d3e7f815995831d5b7f82950d.1471607358.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>