summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2015-08-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX' (Intel PT) race related core fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler perf/x86/intel: Fix memory leak on hot-plug allocation fail perf: Fix PERF_EVENT_IOC_PERIOD migration race perf: Fix double-free of the AUX buffer perf: Fix fasync handling on inherited events perf tools: Fix test build error when bindir contains double slash perf stat: Fix transaction lenght metrics perf: Fix running time accounting
2015-08-13Revert x86 sigcontext cleanupsLinus Torvalds
This reverts commits 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs' from sigcontext") and c6f2062935c8 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"). They were cleanups, but they break dosemu by changing the signal return behavior (and removing 'fs' and 'gs' from the sigcontext struct - while not actually changing any behavior - causes build problems). Reported-and-tested-by: Stas Sergeev <stsp@list.ru> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-12perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handlerMatt Fleming
Tony reports that booting his 144-cpu machine with maxcpus=10 triggers the following WARN_ON(): [ 21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90() [ 21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1 [ 21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.1506021730 06/02/2015 [ 21.045747] 0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67 [ 21.045748] 0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a [ 21.045750] ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a [ 21.045750] Call Trace: [ 21.045757] [<ffffffff81669b67>] dump_stack+0x45/0x57 [ 21.045759] [<ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0 [ 21.045761] [<ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20 [ 21.045762] [<ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90 [ 21.045764] [<ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160 [ 21.045767] [<ffffffff8109a33d>] notifier_call_chain+0x4d/0x80 [ 21.045769] [<ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10 [ 21.045770] [<ffffffff8107b538>] _cpu_up+0xe8/0x190 [ 21.045771] [<ffffffff8107b65a>] cpu_up+0x7a/0xa0 [ 21.045774] [<ffffffff8165e920>] cpu_subsys_online+0x40/0x90 [ 21.045777] [<ffffffff81433b37>] device_online+0x67/0x90 [ 21.045778] [<ffffffff81433bea>] online_store+0x8a/0xa0 [ 21.045782] [<ffffffff81430e78>] dev_attr_store+0x18/0x30 [ 21.045785] [<ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50 [ 21.045786] [<ffffffff8126ad40>] kernfs_fop_write+0x120/0x170 [ 21.045789] [<ffffffff811f0b77>] __vfs_write+0x37/0x100 [ 21.045791] [<ffffffff811f38b8>] ? __sb_start_write+0x58/0x110 [ 21.045795] [<ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0 [ 21.045796] [<ffffffff811f1279>] vfs_write+0xa9/0x190 [ 21.045797] [<ffffffff811f2075>] SyS_write+0x55/0xc0 [ 21.045800] [<ffffffff81067300>] ? do_page_fault+0x30/0x80 [ 21.045804] [<ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 21.045805] ---[ end trace fe228b836d8af405 ]--- The root cause is that CPU_UP_PREPARE is completely the wrong notifier action from which to access cpu_data(), because smp_store_cpu_info() won't have been executed by the target CPU at that point, which in turn means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been filled out. Instead let's invoke our handler from CPU_STARTING and rename it appropriately. Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12perf/x86/intel: Fix memory leak on hot-plug allocation failPeter Zijlstra
We fail to free the shared_regs allocation if the constraint_list allocation fails. Cure this and be more consistent in NULL-ing the pointers after free. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/ldt: Make modify_ldt synchronousAndy Lutomirski
modify_ldt() has questionable locking and does not synchronize threads. Improve it: redesign the locking and synchronize all threads' LDTs using an IPI on all modifications. This will dramatically slow down modify_ldt in multithreaded programs, but there shouldn't be any multithreaded programs that care about modify_ldt's performance in the first place. This fixes some fallout from the CVE-2015-5157 fixes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: <stable@vger.kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-30x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()Jiang Liu
Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") introduced a regression which causes malfunction of interrupt lines. The reason is that the conversion of mp_check_pin_attr() missed to update the polarity selection of the interrupt pin with the caller provided setting and instead uses a stale attribute value. That in turn results in chosing the wrong interrupt flow handler. Use the caller supplied setting to configure the pin correctly which also choses the correct interrupt flow handler. This restores the original behaviour and on the affected machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ configuration are identical to v4.1. Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Reported-and-tested-by: Matt Fleming <matt@codeblueprint.co.uk> Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Chen Yu <yu.c.chen@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1438242695-23531-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the intel cqm perf facility to prevent IPIs from interrupt context" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cqm: Return cached counter value from IRQ context
2015-07-26perf/x86/intel/cqm: Return cached counter value from IRQ contextMatt Fleming
Peter reported the following potential crash which I was able to reproduce with his test program, [ 148.765788] ------------[ cut here ]------------ [ 148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260() [ 148.765797] Modules linked in: [ 148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4 [ 148.765803] ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007 [ 148.765805] 0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000 [ 148.765807] ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640 [ 148.765809] Call Trace: [ 148.765810] <NMI> [<ffffffff818bdfd5>] dump_stack+0x45/0x57 [ 148.765818] [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0 [ 148.765822] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765824] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765825] [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20 [ 148.765827] [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260 [ 148.765829] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765831] [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60 [ 148.765832] [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0 [ 148.765836] [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400 [ 148.765839] [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590 [ 148.765840] [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380 [ 148.765841] [<ffffffff811d3497>] perf_event_output+0x47/0x60 [ 148.765843] [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240 [ 148.765844] [<ffffffff811d4124>] perf_event_overflow+0x14/0x20 [ 148.765847] [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440 [ 148.765849] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765853] [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0 [ 148.765854] [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20 [ 148.765859] [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0 [ 148.765863] [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30 [ 148.765865] [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20 [ 148.765869] [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40 [ 148.765872] [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80 [ 148.765875] [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40 [ 148.765877] [<ffffffff81063ed9>] nmi_handle+0x79/0x100 [ 148.765879] [<ffffffff81064422>] default_do_nmi+0x42/0x100 [ 148.765880] [<ffffffff81064563>] do_nmi+0x83/0xb0 [ 148.765884] [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e [ 148.765886] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765888] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765890] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765891] <<EOE>> [<ffffffff8110ab66>] finish_task_switch+0x156/0x210 [ 148.765898] [<ffffffff818c1671>] __schedule+0x341/0x920 [ 148.765899] [<ffffffff818c1c87>] schedule+0x37/0x80 [ 148.765903] [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80 [ 148.765905] [<ffffffff818c1f4a>] schedule_user+0x1a/0x50 [ 148.765907] [<ffffffff818c666c>] retint_careful+0x14/0x32 [ 148.765908] ---[ end trace e33ff2be78e14901 ]--- The CQM task events are not safe to be called from within interrupt context because they require performing an IPI to read the counter value on all sockets. And performing IPIs from within IRQ context is a "no-no". Make do with the last read counter value currently event in event->count when we're invoked in this context. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Will Auld <will.auld@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-21x86/fpu: Disable dependent CPU features on "noxsave"Jan Beulich
Complete the set of dependent features that need disabling at once: XSAVEC, AVX-512 and all currently known to the kernel extensions to it, as well as MPX need to be disabled too. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/55ACC40D0200007800092E6C@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two families of fixes: - Fix an FPU context related boot crash on newer x86 hardware with larger context sizes than what most people test. To fix this without ugly kludges or extensive reverts we had to touch core task allocator, to allow x86 to determine the task size dynamically, at boot time. I've tested it on a number of x86 platforms, and I cross-built it to a handful of architectures: (warns) (warns) testing x86-64: -git: pass ( 0), -tip: pass ( 0) testing x86-32: -git: pass ( 0), -tip: pass ( 0) testing arm: -git: pass ( 1359), -tip: pass ( 1359) testing cris: -git: pass ( 1031), -tip: pass ( 1031) testing m32r: -git: pass ( 1135), -tip: pass ( 1135) testing m68k: -git: pass ( 1471), -tip: pass ( 1471) testing mips: -git: pass ( 1162), -tip: pass ( 1162) testing mn10300: -git: pass ( 1058), -tip: pass ( 1058) testing parisc: -git: pass ( 1846), -tip: pass ( 1846) testing sparc: -git: pass ( 1185), -tip: pass ( 1185) ... so I hope the cross-arch impact 'none', as intended. (by Dave Hansen) - Fix various NMI handling related bugs unearthed by the big asm code rewrite and generally make the NMI code more robust and more maintainable while at it. These changes are a bit late in the cycle, I hope they are still acceptable. (by Andy Lutomirski)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86 x86/fpu, sched: Dynamically allocate 'struct fpu' x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code x86/nmi/64: Make the "NMI executing" variable more consistent x86/nmi/64: Minor asm simplification x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection x86/nmi/64: Reorder nested NMI checks x86/nmi/64: Improve nested NMI comments x86/nmi/64: Switch stacks on userspace NMI entry x86/nmi/64: Remove asm code that saves CR2 x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
2015-07-18x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it ↵Ingo Molnar
on x86 Don't burden architectures without dynamic task_struct sizing with the overhead of dynamic sizing. Also optimize the x86 code a bit by caching task_struct_size. Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18x86/fpu, sched: Dynamically allocate 'struct fpu'Dave Hansen
The FPU rewrite removed the dynamic allocations of 'struct fpu'. But, this potentially wastes massive amounts of memory (2k per task on systems that do not have AVX-512 for instance). Instead of having a separate slab, this patch just appends the space that we need to the 'task_struct' which we dynamically allocate already. This saves from doing an extra slab allocation at fork(). The only real downside here is that we have to stick everything and the end of the task_struct. But, I think the BUILD_BUG_ON()s I stuck in there should keep that from being too fragile. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17x86/nmi/64: Improve nested NMI commentsAndy Lutomirski
I found the nested NMI documentation to be difficult to follow. Improve the comments. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-17x86/nmi: Enable nested do_nmi() handling for 64-bit kernelsAndy Lutomirski
32-bit kernels handle nested NMIs in C. Enable the exact same handling on 64-bit kernels as well. This isn't currently necessary, but it will become necessary once the asm code starts allowing limited nesting. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-15genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for nowThomas Gleixner
Boris reported that the sparse_irq protection around __cpu_up() in the generic code causes a regression on Xen. Xen allocates interrupts and some more in the xen_cpu_up() function, so it deadlocks on the sparse_irq_lock. There is no simple fix for this and we really should have the protection for all architectures, but for now the only solution is to move it to x86 where actual wreckage due to the lack of protection has been observed. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down' Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: xen-devel <xen-devel@lists.xenproject.org>
2015-07-07x86/irq: Retrieve irq data after locking irq_descThomas Gleixner
irq_data is protected by irq_desc->lock, so retrieving the irq chip from irq_data outside the lock is racy vs. an concurrent update. Move it into the lock held region. While at it add a comment why the vector walk does not require vector_lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.331320612@linutronix.de
2015-07-07x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()Thomas Gleixner
It's unsafe to examine fields in the irq descriptor w/o holding the descriptor lock. Add proper locking. While at it add a comment why the vector check can run lock less Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.236544164@linutronix.de
2015-07-07x86/irq: Plug irq vector hotplug raceThomas Gleixner
Jin debugged a nasty cpu hotplug race which results in leaking a irq vector on the newly hotplugged cpu. cpu N cpu M native_cpu_up device_shutdown do_boot_cpu free_msi_irqs start_secondary arch_teardown_msi_irqs smp_callin default_teardown_msi_irqs setup_vector_irq arch_teardown_msi_irq __setup_vector_irq native_teardown_msi_irq lock(vector_lock) destroy_irq install vectors unlock(vector_lock) lock(vector_lock) ---> __clear_irq_vector unlock(vector_lock) lock(vector_lock) set_cpu_online unlock(vector_lock) This leaves the irq vector(s) which are torn down on CPU M stale in the vector array of CPU N, because CPU M does not see CPU N online yet. There is a similar issue with concurrent newly setup interrupts. The alloc/free protection of irq descriptors does not prevent the above race, because it merily prevents interrupt descriptors from going away or changing concurrently. Prevent this by moving the call to setup_vector_irq() into the vector_lock held region which protects set_cpu_online(): cpu N cpu M native_cpu_up device_shutdown do_boot_cpu free_msi_irqs start_secondary arch_teardown_msi_irqs smp_callin default_teardown_msi_irqs lock(vector_lock) arch_teardown_msi_irq setup_vector_irq() __setup_vector_irq native_teardown_msi_irq install vectors destroy_irq set_cpu_online unlock(vector_lock) lock(vector_lock) __clear_irq_vector unlock(vector_lock) So cpu M either sees the cpu N online before clearing the vector or cpu N installs the vectors after cpu M has cleared it. Reported-by: xiao jin <jin.xiao@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
2015-07-06x86/earlyprintk: Allow early_printk() to use console style parameters like ↵Steven Rostedt
'115200n8' When I enable early_printk on a kernel, I cut and paste the console= input and add to earlyprintk parameter. But I notice recently that ktest has not been detecting triple faults. The way it detects it, is by seeing the kernel banner "Linux version .." with a different kernel version pop up. Then I noticed that early printk was no longer working on my console, which was why ktest was not seeing it. I bisected it down and it was added to 4.0 with this commit: ea9e9d802902 ("Specify PCI based UART for earlyprintk") because it converted the simple_strtoul() that converts the baud number into a kstrtoul(). The problem with this is, I had as my baud rate, 115200n8 (acceptable for console=ttyS0), but because of the "n8", the kstrtoul() doesn't parse the baud rate and returns an error, which sets the baud rate to the default 9600. This explains the garbage on my screen. Now, earlyprintk= kernel parameter does not say it accepts that format. Thus, one answer would simply be me changing my kernel parameters to remove the "n8" since it isn't parsed anyway. But I wonder if other people run into this, and it seems strange that the two consoles for serial accepts different input. I could also extend this to have earlyprintk do something with that "n8" or whatever it has and have it match the console parsing (which, BTW, still uses simple_strtoul(), as I guess it has to). This patch just makes my old kernel parameter parsing work like it use to. Although, simple_strtoull() is considered obsolete, it is the only standard string parsing function that parses a number that is attached to text. Ironically, commit ea9e9d802902 also added several calls to simple_strtoul()! Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stuart R. Anderson <stuart.r.anderson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150706101434.5f6a351b@gandalf.local.home [ Cleaned it up a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/espfix: Init espfix on the boot CPU sideZhu Guihua
As we alloc pages with GFP_KERNEL in init_espfix_ap() which is called before we enable local irqs, so the lockdep sub-system would (correctly) trigger a warning about the potentially blocking API. So we allocate them on the boot CPU side when the secondary CPU is brought up by the boot CPU, and hand them over to the secondary CPU. And we use alloc_pages_node() with the secondary CPU's node, to make sure the espfix stack is NUMA-local to the CPU that is going to use it. Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Cc: <bp@alien8.de> Cc: <luto@amacapital.net> Cc: <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c97add2670e9abebb90095369f0cfc172373ac94.1435824469.git.zhugh.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/espfix: Add 'cpu' parameter to init_espfix_ap()Zhu Guihua
Add a CPU index parameter to init_espfix_ap(), so that the parameter could be propagated to the function for espfix page allocation. Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Cc: <bp@alien8.de> Cc: <luto@amacapital.net> Cc: <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cde3fcf1b3211f3f03feb1a995bce3fee850f0fc.1435824469.git.zhugh.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/kasan: Fix KASAN shadow region page tablesAlexander Popov
Currently KASAN shadow region page tables created without respect of physical offset (phys_base). This causes kernel halt when phys_base is not zero. So let's initialize KASAN shadow region page tables in kasan_early_init() using __pa_nodebug() which considers phys_base. This patch also separates x86_64_start_kernel() from KASAN low level details by moving kasan_map_early_shadow(init_level4_pgt) into kasan_early_init(). Remove the comment before clear_bss() which stopped bringing much profit to the code readability. Otherwise describing all the new order dependencies would be too verbose. Signed-off-by: Alexander Popov <alpopov@ptsecurity.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: <stable@vger.kernel.org> # 4.0+ Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/init: Clear 'init_level4_pgt' earlierAndrey Ryabinin
Currently x86_64_start_kernel() has two KASAN related function calls. The first call maps shadow to early_level4_pgt, the second maps shadow to init_level4_pgt. If we move clear_page(init_level4_pgt) earlier, we could hide KASAN low level detail from generic x86_64 initialization code. The next patch will do it. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: <stable@vger.kernel.org> # 4.0+ Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-2-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()Adrian Hunter
If it takes longer than 12us to read the PIT counter lsb/msb, then the error margin will never fall below 500ppm within 50ms, and Fast TSC calibration will always fail. This patch detects when that will happen and fails fast. Note the failure message is not printed in that case because: 1. it will always happen on that class of hardware 2. the absence of the message is more informative than its presence Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/556EB717.9070607@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two FPU rewrite related fixes. This addresses all known x86 regressions at this stage. Also some other misc fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Fix boot crash in the early FPU code x86/asm/entry/64: Update path names x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled x86/boot/setup: Clean up the e820_reserve_setup_data() code x86/kaslr: Fix typo in the KASLR_FLAG documentation
2015-07-04Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes an x86 PMU scheduling fix, but most changes are late breaking tooling fixes and updates: User visible fixes: - Create config.detected into OUTPUT directory, fixing parallel builds sharing the same source directory (Aaro Kiskinen) - Allow to specify custom linker command, fixing some MIPS64 builds. (Aaro Kiskinen) - Fix to show proper convergence stats in 'perf bench numa' (Srikar Dronamraju) User visible changes: - Validate syscall list passed via -e argument to 'perf trace'. (Arnaldo Carvalho de Melo) - Introduce 'perf stat --per-thread' (Jiri Olsa) - Check access permission for --kallsyms and --vmlinux (Li Zhang) - Move toggling event logic from 'perf top' and into hists browser, allowing freeze/unfreeze with event lists with more than one entry (Namhyung Kim) - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and showing the Aggregated stats in 'perf report -D' (Adrian Hunter) Infrastructure fixes: - Add missing break for PERF_RECORD_ITRACE_START, which caused those events samples to be parsed as well as PERF_RECORD_LOST_SAMPLES. ITRACE_START only appears when Intel PT or BTS are present, so.. (Jiri Olsa) - Call the perf_session destructor when bailing out in the inject, kmem, report, kvm and mem tools (Taeung Song) Infrastructure changes: - Move stuff out of 'perf stat' and into the lib for further use (Jiri Olsa) - Reference count the cpu_map and thread_map classes (Jiri Olsa) - Set evsel->{cpus,threads} from the evlist, if not set, allowing the generalization of some 'perf stat' functions that previously were accessing private static evlist variable (Jiri Olsa) - Delete an unnecessary check before the calling free_event_desc() (Markus Elfring) - Allow auxtrace data alignment (Adrian Hunter) - Allow events with dot (Andi Kleen) - Fix failure to 'perf probe' events on arm (He Kuang) - Add testing for Makefile.perf (Jiri Olsa) - Add test for make install with prefix (Jiri Olsa) - Fix single target build dependency check (Jiri Olsa) - Access thread_map entries via accessors, prep patch to hold more info per entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa) - Use __weak definition from compiler.h (Sukadev Bhattiprolu) - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) perf tools: Allow to specify custom linker command perf tools: Create config.detected into OUTPUT directory perf mem: Fill in the missing session freeing after an error occurs perf kvm: Fill in the missing session freeing after an error occurs perf report: Fill in the missing session freeing after an error occurs perf kmem: Fill in the missing session freeing after an error occurs perf inject: Fill in the missing session freeing after an error occurs perf tools: Add missing break for PERF_RECORD_ITRACE_START perf/x86: Fix 'active_events' imbalance perf symbols: Check access permission when reading symbol files perf stat: Introduce --per-thread option perf stat: Introduce print_counters function perf stat: Using init_stats instead of memset perf stat: Rename print_interval to process_interval perf stat: Remove perf_evsel__read_cb function perf stat: Move perf_stat initialization counter process code perf stat: Move zero_per_pkg into counter process code perf stat: Separate counters reading and processing perf stat: Introduce read_counters function perf stat: Introduce perf_evsel__read function ...
2015-07-04x86/fpu: Fix boot crash in the early FPU codeIngo Molnar
Jan Kara and Thomas Gleixner reported boot crashes in the FPU code: general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff81048a6c>] [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40 2b:* 0f ae 85 00 fe ff ff fxsave -0x200(%rbp) and bisected it down to the following FPU commit: 91a8c2a5b43f ("x86/fpu: Clean up and fix MXCSR handling") The reason is that the on-stack FPU registers state variable, used by the FXSAVE instruction, did not have the required minimum alignment of 16 bytes, causing the general protection fault. This is most likely a GCC bug in older GCC versions, but the offending commit also added a bogus extra 32-byte alignment (which GCC ignored too). So fix this bug by making the variable static again, but also mark it __initdata this time, because fpu__init_system_mxcsr() is now an __init function. Reported-and-bisected-by: Jan Kara <jack@suse.cz> Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-02Merge tag 'module-misc-v4.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull init.h/module.h fragility fixes from Paul Gortmaker: "Fixup various init.h misuses that are fragile wrt code moving to module.h What started as a removal of no longer required include <linux/init.h> due to the earlier __cpuinit and __devinit removal led to the observation that some module specfic support was living in init.h itself, thus preventing the full removal from introducing compile regressions. This series includes a few final fixups needed prior to the relocation of the modular init code from <init.h> to <module.h>. These are things that weren't easily categorized into any of the other previous series categories already requested for pull. That said, each fixup branch (including this one) is independent and there are no ordering constraints. Only the final code relocation (which is NOT in this pull) requires that all my cleanup branches be merged first" * tag 'module-misc-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: tile: add init.h to usb.c to avoid compile failure arm: fix implicit #include <linux/init.h> in entry asm. x86: replace __init_or_module with __init in non-modular vsmp_64.c
2015-07-02Merge tag 'module_init-alternate_initcall-v4.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull module_init replacement part two from Paul Gortmaker: "Replace module_init with appropriate alternate initcall in non modules. This series converts non-modular code that is using the module_init() call to hook itself into the system to instead use one of our alternate priority initcalls. Unlike the previous series that used device_initcall and hence was a runtime no-op, these commits change to one of the alternate initcalls, because (a) we have them and (b) it seems like the right thing to do. For example, it would seem logical to use arch_initcall for arch specific setup code and fs_initcall for filesystem setup code. This does mean however, that changes in the init ordering will be taking place, and so there is a small risk that some kind of implicit init ordering issue may lie uncovered. But I think it is still better to give these ones sensible priorities than to just assign them all to device_initcall in order to exactly preserve the old ordering. Thad said, we have already made similar changes in core kernel code in commit c96d6660dc65 ("kernel: audit/fix non-modular users of module_init in core code") without any regressions reported, so this type of change isn't without precedent. It has also got the same local testing and linux-next coverage as all the other pull requests that I'm sending for this merge window have got. Once again, there is an unused module_exit function removal that shows up as an outlier upon casual inspection of the diffstat" * tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling mm/page_owner.c: use late_initcall to hook in enabling lib/list_sort: use late_initcall to hook in self tests arm: use subsys_initcall in non-modular pl320 IPC code powerpc: don't use module_init for non-modular core hugetlb code powerpc: use subsys_initcall for Freescale Local Bus x86: don't use module_init for non-modular core bootflag code netfilter: don't use module_init/exit in core IPV4 code fs/notify: don't use module_init for non-modular inotify_user code mm: replace module_init usages with subsys_initcall in nommu.c
2015-07-02Merge tag 'module_init-device_initcall-v4.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull module_init replacement part one from Paul Gortmaker: "Replace module_init with equivalent device_initcall in non modules. This series of commits converts non-modular code that is using the module_init() call to hook itself into the system to instead use device_initcall(). The conversion is a runtime no-op, since module_init actually becomes __initcall in the non-modular case, and that in turn gets mapped onto device_initcall. A couple files show a larger negative diffstat, representing ones that had a module_exit function that we remove here vs previously relying on the linker to dispose of it. We make this conversion now, so that we can relocate module_init from init.h into module.h in the future. The files changed here are just limited to those that would otherwise have to add module.h to obviously non-modular code, in order to avoid a compile fail, as testing has shown" * tag 'module_init-device_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: MIPS: don't use module_init in non-modular cobalt/mtd.c file drivers/leds: don't use module_init in non-modular leds-cobalt-raq.c cris: don't use module_init for non-modular core eeprom.c code tty/metag_da: Avoid module_init/module_exit in non-modular code drivers/clk: don't use module_init in clk-nomadik.c which is non-modular xtensa: don't use module_init for non-modular core network.c code sh: don't use module_init in non-modular psw.c code mn10300: don't use module_init in non-modular flash.c code parisc64: don't use module_init for non-modular core perf code parisc: don't use module_init for non-modular core pdc_cons code cris: don't use module_init for non-modular core intmem.c code ia64: don't use module_init in non-modular sim/simscsi.c code ia64: don't use module_init for non-modular core kernel/mca.c code arm: don't use module_init in non-modular mach-vexpress/spc.c code powerpc: don't use module_init in non-modular 83xx suspend code powerpc: use device_initcall for registering rtc devices x86: don't use module_init in non-modular devicetree.c code x86: don't use module_init in non-modular intel_mid_vrtc.c
2015-06-30x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bitJosh Triplett
For 32-bit userspace on a 64-bit kernel, this requires modifying stub32_clone to actually swap the appropriate arguments to match CONFIG_CLONE_BACKWARDS, rather than just leaving the C argument for tls broken. Patch co-authored by Josh Triplett and Thiago Macieira. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thiago Macieira <thiago.macieira@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30x86/kexec: prepend elfcorehdr instead of appending it to the crash-kernel ↵KarimAllah Ahmed
command-line. Any parameter passed after '--' in the kernel command-line will not be parsed by the kernel at all, instead it will be passed directly to init process. Currently the kernel appends elfcorehdr=<paddr> to the cmdline passed from kexec load, and if this command-line is used to pass parameters to init process this means that 'elfcorehdr' will not be parsed as a kernel parameter at all which will be a problem for vmcore subsystem since it will know nothing about the location of the ELF structure! Prepending 'elfcorehdr' instead of appending it fixes this problem since it ensures that it always comes before '--' and so it's always parsed as a kernel command-line parameter. Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was also used to embedd a command-line to the crash dump kernel and this command-line contains '--' since the current behavior of the kernel is to actually append the boot loader command-line to the embedded command-line. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Haren Myneni <hbabu@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30perf/x86: Fix 'active_events' imbalancePeter Zijlstra
Commit 1b7b938f1817 ("perf/x86/intel: Fix PMI handling for Intel PT") conditionally increments active_events in x86_add_exclusive() but unconditionally decrements in x86_del_exclusive(). These extra decrements can lead to the situation where active_events is zero and thus the PMI handler is 'disabled' while we have active events on the PMU generating PMIs. This leads to a truckload of: Uhhuh. NMI received for unknown reason 21 on CPU 28. Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue messages and generally messes up perf. Remove the condition on the increment, double increment balanced by a double decrement is perfectly fine. Restructure the code a little bit to make the unconditional inc a bit more natural. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alexander.shishkin@linux.intel.com Cc: brgerst@gmail.com Cc: dvlasenk@redhat.com Cc: luto@amacapital.net Cc: oleg@redhat.com Fixes: 1b7b938f1817 ("perf/x86/intel: Fix PMI handling for Intel PT") Link: http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-30Merge branch 'x86/boot' into x86/urgentIngo Molnar
Merge branch that got ready. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-30x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is ↵Ingo Molnar
enabled Mike Galbraith reported: " My i7-4790 box is having one hell of a time with this merge window, dead in the water. BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily. " It turns out that Linux does a double workaround here, as per: 066941bd4eeb ("x86: unmask CPUID levels on Intel CPUs") it undoes the BIOS workaround - but as a side effect the CPUID state is not completely constant during early init anymore, and the new FPU init code did not take this into account. So what happened is that the xstate init code did not have full CPUID available, which broke subsequent attempts to use xstate features. Fix this by ordering the early FPU init code to after we've stabilized the CPUID state. Reported-bisected-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150627082514.GA10894@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-29Merge tag 'libnvdimm-for-4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm Pull libnvdimm subsystem from Dan Williams: "The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules: NFIT: Instantiates an "nvdimm bus" with the core and registers memory devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware Interface table). After registering NVDIMMs the NFIT driver then registers "region" devices. A libnvdimm-region defines an access mode and the boundaries of persistent memory media. A region may span multiple NVDIMMs that are interleaved by the hardware memory controller. In turn, a libnvdimm-region can be carved into a "namespace" device and bound to the PMEM or BLK driver which will attach a Linux block device (disk) interface to the memory. PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media. See memcpy_to_pmem() and wmb_pmem(). BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX. BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss). The sinister aspect of sector tearing is that most applications do not know they have a atomic sector dependency. At least today's disk's rarely ever tear sectors and if they do one almost certainly gets a CRC error on access. NVDIMMs will always tear and always silently. Until an application is audited to be robust in the presence of sector-tearing the usage of BTT is recommended. Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig, Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox, Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael Wysocki, and Bob Moore" * tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits) arch, x86: pmem api for ensuring durability of persistent memory updates libnvdimm: Add sysfs numa_node to NVDIMM devices libnvdimm: Set numa_node to NVDIMM devices acpi: Add acpi_map_pxm_to_online_node() libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only pmem: flag pmem block devices as non-rotational libnvdimm: enable iostat pmem: make_request cleanups libnvdimm, pmem: fix up max_hw_sectors libnvdimm, blk: add support for blk integrity libnvdimm, btt: add support for blk integrity fs/block_dev.c: skip rw_page if bdev has integrity libnvdimm: Non-Volatile Devices tools/testing/nvdimm: libnvdimm unit test infrastructure libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory nd_btt: atomic sector updates libnvdimm: infrastructure for btt devices libnvdimm: write blk label set libnvdimm: write pmem label set libnvdimm: blk labels and namespace instantiation ...
2015-06-26Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This is the main drm pull request for v4.2. I've one other new driver from freescale on my radar, it's been posted and reviewed, I'd just like to get someone to give it a last look, so maybe I'll send it or maybe I'll leave it. There is no major nouveau changes in here, Ben was working on something big, and we agreed it was a bit late, there wasn't anything else he considered urgent to merge. There might be another msm pull for some bits that are waiting on arm-soc, I'll see how we time it. This touches some "of" stuff, acks are in place except for the fixes to the build in various configs,t hat I just applied. Summary: New drivers: - virtio-gpu: KMS only pieces of driver for virtio-gpu in qemu. This is just the first part of this driver, enough to run unaccelerated userspace on. As qemu merges more we'll start adding the 3D features for the virgl 3d work. - amdgpu: a new driver from AMD to driver their newer GPUs. (VI+) It contains a new cleaner userspace API, and is a clean break from radeon moving forward, that AMD are going to concentrate on. It also contains a set of register headers auto generated from AMD internal database. core: - atomic modesetting API completed, enabled by default now. - Add support for mode_id blob to atomic ioctl to complete interface. - bunch of Displayport MST fixes - lots of misc fixes. panel: - new simple panels - fix some long-standing build issues with bridge drivers radeon: - VCE1 support - add a GPU reset counter for userspace - lots of fixes. amdkfd: - H/W debugger support module - static user-mode queues - support killing all the waves when a process terminates - use standard DECLARE_BITMAP i915: - Add Broxton support - S3, rotation support for Skylake - RPS booting tuning - CPT modeset sequence fixes - ns2501 dither support - enable cmd parser on haswell - cdclk handling fixes - gen8 dynamic pte allocation - lots of atomic conversion work exynos: - Add atomic modesetting support - Add iommu support - Consolidate drm driver initialization - and MIC, DECON and MIPI-DSI support for exynos5433 omapdrm: - atomic modesetting support (fixes lots of things in rewrite) tegra: - DP aux transaction fixes - iommu support fix msm: - adreno a306 support - various dsi bits - various 64-bit fixes - NV12MT support rcar-du: - atomic and misc fixes sti: - fix HDMI timing complaince tilcdc: - use drm component API to access tda998x driver - fix module unloading qxl: - stability fixes" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits) drm/nouveau: Pause between setting gpu to D3hot and cutting the power drm/dp/mst: close deadlock in connector destruction. drm: Always enable atomic API drm/vgem: Set unique to "vgem" of: fix a build error to of_graph_get_endpoint_by_regs function drm/dp/mst: take lock around looking up the branch device on hpd irq drm/dp/mst: make sure mst_primary mstb is valid in work function of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi' drm/atomic: Don't set crtc_state->enable manually drm/exynos: dsi: do not set TE GPIO direction by input drm/exynos: dsi: add support for MIC driver as a bridge drm/exynos: dsi: add support for Exynos5433 drm/exynos: dsi: make use of array for clock access drm/exynos: dsi: make use of driver data for static values drm/exynos: dsi: add macros for register access drm/exynos: dsi: rename pll_clk to sclk_clk drm/exynos: mic: add MIC driver of: add helper for getting endpoint node of specific identifiers drm/exynos: add Exynos5433 decon driver ...
2015-06-26libnvdimm: Set numa_node to NVDIMM devicesToshi Kani
ACPI NFIT table has System Physical Address Range Structure entries that describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is set in the flags. Change acpi_nfit_register_region() to map a proximity ID to its node ID, and set it to a new numa_node field of nd_region_desc, which is then conveyed to the nd_region device. The device core arranges for btt and namespace devices to inherit their node from their parent region. Signed-off-by: Toshi Kani <toshi.kani@hp.com> [djbw: move set_dev_node() from region.c to bus.c] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24libnvdimm, pmem: add libnvdimm support to the pmem driverDan Williams
nd_pmem attaches to persistent memory regions and namespaces emitted by the libnvdimm subsystem, and, same as the original pmem driver, presents the system-physical-address range as a block device. The existing e820-type-12 to pmem setup is converted to an nvdimm_bus that emits an nd_namespace_io device. Note that the X in 'pmemX' is now derived from the parent region. This provides some stability to the pmem devices names from boot-to-boot. The minor numbers are also more predictable by passing 0 to alloc_disk(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24x86, mirror: x86 enabling - find mirrored memory rangesTony Luck
UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory address ranges. See UEFI 2.5 spec pages 157-158: http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf On EFI enabled systems scan the memory map and tell memblock about any mirrored ranges. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24mm/memblock: add extra "flags" to memblock to allow selection of memory ↵Tony Luck
based on attribute Some high end Intel Xeon systems report uncorrectable memory errors as a recoverable machine check. Linux has included code for some time to process these and just signal the affected processes (or even recover completely if the error was in a read only page that can be replaced by reading from disk). But we have no recovery path for errors encountered during kernel code execution. Except for some very specific cases were are unlikely to ever be able to recover. Enter memory mirroring. Actually 3rd generation of memory mirroing. Gen1: All memory is mirrored Pro: No s/w enabling - h/w just gets good data from other side of the mirror Con: Halves effective memory capacity available to OS/applications Gen2: Partial memory mirror - just mirror memory begind some memory controllers Pro: Keep more of the capacity Con: Nightmare to enable. Have to choose between allocating from mirrored memory for safety vs. NUMA local memory for performance Gen3: Address range partial memory mirror - some mirror on each memory controller Pro: Can tune the amount of mirror and keep NUMA performance Con: I have to write memory management code to implement The current plan is just to use mirrored memory for kernel allocations. This has been broken into two phases: 1) This patch series - find the mirrored memory, use it for boot time allocations 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused mirrored memory from mm/memblock.c and only give it out to select kernel allocations (this is still being scoped because page_alloc.c is scary). This patch (of 3): Add extra "flags" to memblock to allow selection of memory based on attribute. No functional changes Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull first batch of KVM updates from Paolo Bonzini: "The bulk of the changes here is for x86. And for once it's not for silicon that no one owns: these are really new features for everyone. Details: - ARM: several features are in progress but missed the 4.2 deadline. So here is just a smattering of bug fixes, plus enabling the VFIO integration. - s390: Some fixes/refactorings/optimizations, plus support for 2GB pages. - x86: * host and guest support for marking kvmclock as a stable scheduler clock. * support for write combining. * support for system management mode, needed for secure boot in guests. * a bunch of cleanups required for the above * support for virtualized performance counters on AMD * legacy PCI device assignment is deprecated and defaults to "n" in Kconfig; VFIO replaces it On top of this there are also bug fixes and eager FPU context loading for FPU-heavy guests. - Common code: Support for multiple address spaces; for now it is used only for x86 SMM but the s390 folks also have plans" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) KVM: s390: clear floating interrupt bitmap and parameters KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs KVM: x86/vPMU: Implement AMD vPMU code for KVM KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc KVM: x86/vPMU: reorder PMU functions KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU KVM: x86/vPMU: introduce pmu.h header KVM: x86/vPMU: rename a few PMU functions KVM: MTRR: do not map huge page for non-consistent range KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type KVM: MTRR: introduce mtrr_for_each_mem_type KVM: MTRR: introduce fixed_mtrr_addr_* functions KVM: MTRR: sort variable MTRRs KVM: MTRR: introduce var_mtrr_range KVM: MTRR: introduce fixed_mtrr_segment table KVM: MTRR: improve kvm_mtrr_get_guest_memory_type KVM: MTRR: do not split 64 bits MSR content KVM: MTRR: clean up mtrr default type ...
2015-06-23Merge tag 'pm+acpi-4.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "The rework of backlight interface selection API from Hans de Goede stands out from the number of commits and the number of affected places perspective. The cpufreq core fixes from Viresh Kumar are quite significant too as far as the number of commits goes and because they should reduce CPU online/offline overhead quite a bit in the majority of cases. From the new featues point of view, the ACPICA update (to upstream revision 20150515) adding support for new ACPI 6 material to ACPICA is the one that matters the most as some new significant features will be based on it going forward. Also included is an update of the ACPI device power management core to follow ACPI 6 (which in turn reflects the Windows' device PM implementation), a PM core extension to support wakeup interrupts in a more generic way and support for the ACPI _CCA device configuration object. The rest is mostly fixes and cleanups all over and some documentation updates, including new DT bindings for Operating Performance Points. There is one fix for a regression introduced in the 4.1 cycle, but it adds quite a number of lines of code, it wasn't really ready before Thursday and you were on vacation, so I refrained from pushing it on the last minute for 4.1. Specifics: - ACPICA update to upstream revision 20150515 including basic support for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM, FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore, Lv Zheng). - ACPI device power management core code update to follow ACPI 6 which reflects the ACPI device power management implementation in Windows (Rafael J Wysocki). - rework of the backlight interface selection logic to reduce the number of kernel command line options and improve the handling of DMI quirks that may be involved in that and to make the code generally more straightforward (Hans de Goede). - fixes for the ACPI Embedded Controller (EC) driver related to the handling of EC transactions (Lv Zheng). - fix for a regression related to the ACPI resources management and resulting from a recent change of ACPI initialization code ordering (Rafael J Wysocki). - fix for a system initialization regression related to ACPI introduced during the 3.14 cycle and caused by running the code that switches the platform over to the ACPI mode too early in the initialization sequence (Rafael J Wysocki). - support for the ACPI _CCA device configuration object related to DMA cache coherence (Suravee Suthikulpanit). - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov). - ACPI battery driver cleanups (Luis Henriques, Mathias Krause). - ACPI processor driver cleanups (Hanjun Guo). - cleanups and documentation update related to the ACPI device properties interface based on _DSD (Rafael J Wysocki). - ACPI device power management fixes (Rafael J Wysocki). - assorted cleanups related to ACPI (Dominik Brodowski, Fabian Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki). - fix for a long-standing issue causing General Protection Faults to be generated occasionally on return to user space after resume from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar). - fix to make the suspend core code return -EBUSY consistently in all cases when system suspend is aborted due to wakeup detection (Ruchi Kandoi). - support for automated device wakeup IRQ handling allowing drivers to make their PM support more starightforward (Tony Lindgren). - new tracepoints for suspend-to-idle tracing and rework of the prepare/complete callbacks tracing in the PM core (Todd E Brandt, Rafael J Wysocki). - wakeup sources framework enhancements (Jin Qian). - new macro for noirq system PM callbacks (Grygorii Strashko). - assorted cleanups related to system suspend (Rafael J Wysocki). - cpuidle core cleanups to make the code more efficient (Rafael J Wysocki). - powernv/pseries cpuidle driver update (Shilpasri G Bhat). - cpufreq core fixes related to CPU online/offline that should reduce the overhead of these operations quite a bit, unless the CPU in question is physically going away (Viresh Kumar, Saravana Kannan). - serialization of cpufreq governor callbacks to avoid race conditions in some cases (Viresh Kumar). - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit Bhargava, Joe Konno). - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep Holla, Felipe Balbi, Tang Yuantian). - assorted cleanups in cpufreq drivers and core (Shailendra Verma, Fabian Frederick, Wang Long). - new Device Tree bindings for representing Operating Performance Points (Viresh Kumar). - updates for the common clock operations support code in the PM core (Rajendra Nayak, Geert Uytterhoeven). - PM domains core code update (Geert Uytterhoeven). - Intel Knights Landing support for the RAPL (Running Average Power Limit) power capping driver (Dasaratharaman Chandramouli). - fixes related to the floor frequency setting on Atom SoCs in the RAPL power capping driver (Ajay Thomas). - runtime PM framework documentation update (Ben Dooks). - cpupower tool fix (Herton R Krzesinski)" * tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits) cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state x86: Load __USER_DS into DS/ES after resume PM / OPP: Add binding for 'opp-suspend' PM / OPP: Allow multiple OPP tables to be passed via DT PM / OPP: Add new bindings to address shortcomings of existing bindings ACPI: Constify ACPI device IDs in documentation ACPI / enumeration: Document the rules regarding the PRP0001 device ID ACPI / video: Make acpi_video_unregister_backlight() private acpi-video-detect: Remove old API toshiba-acpi: Port to new backlight interface selection API thinkpad-acpi: Port to new backlight interface selection API sony-laptop: Port to new backlight interface selection API samsung-laptop: Port to new backlight interface selection API msi-wmi: Port to new backlight interface selection API msi-laptop: Port to new backlight interface selection API intel-oaktrail: Port to new backlight interface selection API ideapad-laptop: Port to new backlight interface selection API fujitsu-laptop: Port to new backlight interface selection API eeepc-laptop: Port to new backlight interface selection API dell-wmi: Port to new backlight interface selection API ...
2015-06-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fixes from Jiri Kosina: - symbol lookup locking fix, from Miroslav Benes - error handling improvements in case of failure of the module coming notifier, from Minfei Huang - we were too pessimistic when kASLR has been enabled on x86 and were dropping address hints on the floor unnecessarily in such case. Fix from Jiri Kosina - a few other small fixes and cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add module locking around kallsyms calls livepatch: annotate klp_init() with __init livepatch: introduce patch/func-walking helpers livepatch: make kobject in klp_object statically allocated livepatch: Prevent patch inconsistencies if the coming module notifier fails livepatch: match return value to function signature x86: kaslr: fix build due to missing ALIGN definition livepatch: x86: make kASLR logic more accurate x86: introduce kaslr_offset()
2015-06-23Merge tag 'pci-v4.2-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v4.2 merge window: Enumeration - Move pci_ari_enabled() to global header (Alex Williamson) - Account for ARI in _PRT lookups (Alex Williamson) - Remove unused pci_scan_bus_parented() (Yijing Wang) Resource management - Use host bridge _CRS info on systems with >32 bit addressing (Bjorn Helgaas) - Use host bridge _CRS info on Foxconn K8M890-8237A (Bjorn Helgaas) - Fix pci_address_to_pio() conversion of CPU address to I/O port (Zhichang Yuan) - Add pci_bus_addr_t (Yinghai Lu) PCI device hotplug - Wait for pciehp command completion where necessary (Alex Williamson) - Drop pointless ACPI-based "slot detection" check (Rafael J. Wysocki) - Check ignore_hotplug for all downstream devices (Rafael J. Wysocki) - Propagate the "ignore hotplug" setting to parent (Rafael J. Wysocki) - Inline pciehp "handle event" functions into the ISR (Bjorn Helgaas) - Clean up pciehp debug logging (Bjorn Helgaas) Power management - Remove redundant PCIe port type checking (Yijing Wang) - Add dev->has_secondary_link to track downstream PCIe links (Yijing Wang) - Use dev->has_secondary_link to find downstream links for ASPM (Yijing Wang) - Drop __pci_disable_link_state() useless "force" parameter (Bjorn Helgaas) - Simplify Clock Power Management setting (Bjorn Helgaas) Virtualization - Add ACS quirks for Intel 9-series PCH root ports (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 9120 (Sakari Ailus) MSI - Disable MSI at enumeration even if kernel doesn't support MSI (Michael S. Tsirkin) - Remove unused pci_msi_off() (Bjorn Helgaas) - Rename msi_set_enable(), msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Export pci_msi_set_enable(), pci_msix_clear_and_set_ctrl() (Michael S. Tsirkin) - Drop pci_msi_off() calls during probe (Michael S. Tsirkin) APM X-Gene host bridge driver - Add APM X-Gene v1 PCIe MSI/MSIX termination driver (Duc Dang) - Add APM X-Gene PCIe MSI DTS nodes (Duc Dang) - Disable Configuration Request Retry Status for v1 silicon (Duc Dang) - Allow config access to Root Port even when link is down (Duc Dang) Broadcom iProc host bridge driver - Allow override of device tree IRQ mapping function (Hauke Mehrtens) - Add BCMA PCIe driver (Hauke Mehrtens) - Directly add PCI resources (Hauke Mehrtens) - Free resource list after registration (Hauke Mehrtens) Freescale i.MX6 host bridge driver - Add speed change timeout message (Troy Kisky) - Rename imx6_pcie_start_link() to imx6_pcie_establish_link() (Bjorn Helgaas) Freescale Layerscape host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) - Factor out ls_pcie_establish_link() (Bjorn Helgaas) Marvell MVEBU host bridge driver - Remove mvebu_pcie_scan_bus() (Yijing Wang) NVIDIA Tegra host bridge driver - Remove tegra_pcie_scan_bus() (Yijing Wang) Synopsys DesignWare host bridge driver - Consolidate outbound iATU programming functions (Jisheng Zhang) - Use iATU0 for cfg and IO, iATU1 for MEM (Jisheng Zhang) - Add support for x8 links (Zhou Wang) - Wait for link to come up with consistent style (Bjorn Helgaas) - Use pci_scan_root_bus() for simplicity (Yijing Wang) TI DRA7xx host bridge driver - Use dw_pcie_link_up() consistently (Bjorn Helgaas) Miscellaneous - Include <linux/pci.h>, not <asm/pci.h> (Bjorn Helgaas) - Remove unnecessary #includes of <asm/pci.h> (Bjorn Helgaas) - Remove unused pcibios_select_root() (again) (Bjorn Helgaas) - Remove unused pci_dma_burst_advice() (Bjorn Helgaas) - xen/pcifront: Don't use deprecated function pci_scan_bus_parented() (Arnd Bergmann)" * tag 'pci-v4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits) PCI: pciehp: Inline the "handle event" functions into the ISR PCI: pciehp: Rename queue_interrupt_event() to pciehp_queue_interrupt_event() PCI: pciehp: Make queue_interrupt_event() void PCI: xgene: Allow config access to Root Port even when link is down PCI: xgene: Disable Configuration Request Retry Status for v1 silicon PCI: pciehp: Clean up debug logging x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing PCI: imx6: Add #define PCIE_RC_LCSR PCI: imx6: Use "u32", not "uint32_t" PCI: Remove unused pci_scan_bus_parented() xen/pcifront: Don't use deprecated function pci_scan_bus_parented() PCI: imx6: Add speed change timeout message PCI/ASPM: Simplify Clock Power Management setting PCI: designware: Wait for link to come up with consistent style PCI: layerscape: Factor out ls_pcie_establish_link() PCI: layerscape: Use dw_pcie_link_up() consistently PCI: dra7xx: Use dw_pcie_link_up() consistently x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A PCI: pciehp: Wait for hotplug command completion where necessary PCI: Remove unused pci_dma_burst_advice() ...
2015-06-22Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather largish update for everything time and timer related: - Cache footprint optimizations for both hrtimers and timer wheel - Lower the NOHZ impact on systems which have NOHZ or timer migration disabled at runtime. - Optimize run time overhead of hrtimer interrupt by making the clock offset updates smarter - hrtimer cleanups and removal of restrictions to tackle some problems in sched/perf - Some more leap second tweaks - Another round of changes addressing the 2038 problem - First step to change the internals of clock event devices by introducing the necessary infrastructure - Allow constant folding for usecs/msecs_to_jiffies() - The usual pile of clockevent/clocksource driver updates The hrtimer changes contain updates to sched, perf and x86 as they depend on them plus changes all over the tree to cleanup API changes and redundant code, which got copied all over the place. The y2038 changes touch s390 to remove the last non 2038 safe code related to boot/persistant clock" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) clocksource: Increase dependencies of timer-stm32 to limit build wreckage timer: Minimize nohz off overhead timer: Reduce timer migration overhead if disabled timer: Stats: Simplify the flags handling timer: Replace timer base by a cpu index timer: Use hlist for the timer wheel hash buckets timer: Remove FIFO "guarantee" timers: Sanitize catchup_timer_jiffies() usage hrtimer: Allow hrtimer::function() to free the timer seqcount: Introduce raw_write_seqcount_barrier() seqcount: Rename write_seqcount_barrier() hrtimer: Fix hrtimer_is_queued() hole hrtimer: Remove HRTIMER_STATE_MIGRATE selftest: Timers: Avoid signal deadlock in leap-a-day timekeeping: Copy the shadow-timekeeper over the real timekeeper last clockevents: Check state instead of mode in suspend/resume path selftests: timers: Add leap-second timer edge testing to leap-a-day.c ntp: Do leapsecond adjustment in adjtimex read path time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge ntp: Introduce and use SECS_PER_DAY macro instead of 86400 ...
2015-06-22Merge branch 'x86-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "There were so many changes in the x86/asm, x86/apic and x86/mm topics in this cycle that the topical separation of -tip broke down somewhat - so the result is a more traditional architecture pull request, collected into the 'x86/core' topic. The topics were still maintained separately as far as possible, so bisectability and conceptual separation should still be pretty good - but there were a handful of merge points to avoid excessive dependencies (and conflicts) that would have been poorly tested in the end. The next cycle will hopefully be much more quiet (or at least will have fewer dependencies). The main changes in this cycle were: * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas Gleixner) - This is the second and most intrusive part of changes to the x86 interrupt handling - full conversion to hierarchical interrupt domains: [IOAPIC domain] ----- | [MSI domain] --------[Remapping domain] ----- [ Vector domain ] | (optional) | [HPET MSI domain] ----- | | [DMAR domain] ----------------------------- | [Legacy domain] ----------------------------- This now reflects the actual hardware and allowed us to distangle the domain specific code from the underlying parent domain, which can be optional in the case of interrupt remapping. It's a clear separation of functionality and removes quite some duct tape constructs which plugged the remap code between ioapic/msi/hpet and the vector management. - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt injection into guests (Feng Wu) * x86/asm changes: - Tons of cleanups and small speedups, micro-optimizations. This is in preparation to move a good chunk of the low level entry code from assembly to C code (Denys Vlasenko, Andy Lutomirski, Brian Gerst) - Moved all system entry related code to a new home under arch/x86/entry/ (Ingo Molnar) - Removal of the fragile and ugly CFI dwarf debuginfo annotations. Conversion to C will reintroduce many of them - but meanwhile they are only getting in the way, and the upstream kernel does not rely on them (Ingo Molnar) - NOP handling refinements. (Borislav Petkov) * x86/mm changes: - Big PAT and MTRR rework: making the code more robust and preparing to phase out exposing direct MTRR interfaces to drivers - in favor of using PAT driven interfaces (Toshi Kani, Luis R Rodriguez, Borislav Petkov) - New ioremap_wt()/set_memory_wt() interfaces to support Write-Through cached memory mappings. This is especially important for good performance on NVDIMM hardware (Toshi Kani) * x86/ras changes: - Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. - Add support for Intel "Local MCE"s: upcoming CPUs will support CPU-local MCE interrupts, as opposed to the traditional system- wide broadcasted MCE interrupts (Ashok Raj) - Misc cleanups (Borislav Petkov) * x86/platform changes: - Intel Atom SoC updates ... and lots of other cleanups, fixlets and other changes - see the shortlog and the Git log for details" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits) x86/hpet: Use proper hpet device number for MSI allocation x86/hpet: Check for irq==0 when allocating hpet MSI interrupts x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail genirq: Prevent crash in irq_move_irq() genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain iommu, x86: Properly handle posted interrupts for IOMMU hotplug iommu, x86: Provide irq_remapping_cap() interface iommu, x86: Setup Posted-Interrupts capability for Intel iommu iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Avoid migrating VT-d posted interrupts iommu, x86: Save the mode (posted or remapped) of an IRTE iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip iommu: dmar: Provide helper to copy shared irte fields iommu: dmar: Extend struct irte for VT-d Posted-Interrupts iommu: Add new member capability to struct irq_remap_ops x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry() ...
2015-06-22Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 warning fixlet from Ingo Molnar: "A build fix for certain (rare) variants of binutils that did not make it into v4.1" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix overflow warning with 32-bit binutils
2015-06-22Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86 microcode updates from Ingo Molnar: "x86 microcode loader updates from Borislav Petkov: - early parsing of the built-in microcode - cleanups - misc smaller fixes" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Correct CPU family related variable types x86/microcode: Disable builtin microcode loading on 32-bit for now x86/microcode/intel: Rename get_matching_sig() x86/microcode/intel: Simplify get_matching_sig() x86/microcode/intel: Simplify update_match_cpu() x86/microcode/intel: Rename get_matching_microcode x86/cpu/microcode: Zap changelog x86/microcode: Parse built-in microcode early x86/microcode/intel: Remove unused @rev arg of get_matching_sig() x86/microcode/intel: Get rid of revision_is_newer()
2015-06-22Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kdump updates from Ingo Molnar: "Three kdump robustness related improvements (Joerg Roedel)" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/crash: Allocate enough low memory when crashkernel=high x86/swiotlb: Try coherent allocations with __GFP_NOWARN swiotlb: Warn on allocation failure in swiotlb_alloc_coherent()