summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-03-22x86/perf: Fix CR4.PCE propagation to use active_mm instead of mmAndy Lutomirski
commit 5dc855d44c2ad960a86f593c60461f1ae1566b6d upstream. If one thread mmaps a perf event while another thread in the same mm is in some context where active_mm != mm (which can happen in the scheduler, for example), refresh_pce() would write the wrong value to CR4.PCE. This broke some PAPI tests. Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped") Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22x86/intel_rdt: Put group node in rdtgroup_kn_unlockJiri Olsa
commit 49ec8f5b6ae3ab60385492cad900ffc8a523c895 upstream. The rdtgroup_kn_unlock waits for the last user to release and put its node. But it's calling kernfs_put on the node which calls the rdtgroup_kn_unlock, which might not be the group's directory node, but another group's file node. This race could be easily reproduced by running 2 instances of following script: mount -t resctrl resctrl /sys/fs/resctrl/ pushd /sys/fs/resctrl/ mkdir krava echo "krava" > krava/schemata rmdir krava popd umount /sys/fs/resctrl It triggers the slub debug error message with following command line config: slub_debug=,kernfs_node_cache. Call kernfs_put on the group's node to fix it. Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=yAndrey Ryabinin
commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream. The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y options selected. With branch profiling enabled we end up calling ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is built with KASAN instrumentation, so calling it before kasan has been initialized leads to crash. Use DISABLE_BRANCH_PROFILING define to make sure that we don't call ftrace_likely_update() from early code before kasan_early_init(). Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: lkp@01.org Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22x86/tsc: Fix ART for TSC_KNOWN_FREQPeter Zijlstra
commit 44fee88cea43d3c2cac962e0439cb10a3cabff6d upstream. Subhransu reported that convert_art_to_tsc() isn't working for him. The ART to TSC relation is only set up for systems which use the refined TSC calibration. Systems with known TSC frequency (available via CPUID 15) are not using the refined calibration and therefor the ART to TSC relation is never established. Add the setup to the known frequency init path which skips ART calibration. The init code needs to be duplicated as for systems which use refined calibration the ART setup must be delayed until calibration has been done. The problem has been there since the ART support was introdduced, but only detected now because Subhransu tested the first time on hardware which has TSC frequency enumerated via CPUID 15. Note for stable: The conditional has changed from TSC_RELIABLE to TSC_KNOWN_FREQUENCY. [ tglx: Rewrote changelog and identified the proper 'Fixes' commit ] Fixes: f9677e0f8308 ("x86/tsc: Always Running Timer (ART) correlated clocksource") Reported-by: "Prusty, Subhransu S" <subhransu.s.prusty@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: christopher.s.hall@intel.com Cc: kevin.b.stanton@intel.com Cc: john.stultz@linaro.org Cc: akataria@vmware.com Link: http://lkml.kernel.org/r/20170313145712.GI3312@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22x86/unwind: Fix last frame check for aligned function stacksJosh Poimboeuf
commit 87a6b2975f0d340c75b7488d22d61d2f98fb8abf upstream. Pavel Machek reported the following warning on x86-32: WARNING: kernel stack frame pointer at f50cdf98 in swapper/2:0 has bad value (null) The warning is caused by the unwinder not realizing that it reached the end of the stack, due to an unusual prologue which gcc sometimes generates for aligned stacks. The prologue is based on a gcc feature called the Dynamic Realign Argument Pointer (DRAP). It's almost always enabled for aligned stacks when -maccumulate-outgoing-args isn't set. This issue is similar to the one fixed by the following commit: 8023e0e2a48d ("x86/unwind: Adjust last frame check for aligned function stacks") ... but that fix was specific to x86-64. Make the fix more generic to cover x86-32 as well, and also ensure that the return address referred to by the frame pointer is a copy of the original return address. Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers") Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: http://lkml.kernel.org/r/50d4924db716c264b14f1633037385ec80bf89d2.1489465609.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15x86/tlb: Fix tlb flushing when lguest clears PGEDaniel Borkmann
commit 2c4ea6e28dbf15ab93632c5c189f3948366b8885 upstream. Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15x86, mm: fix gup_pte_range() vs DAX mappingsDan Williams
commit ef947b2529f918d9606533eb9c32b187ed6a5ede upstream. gup_pte_range() fails to check pte_allows_gup() before translating a DAX pte entry, pte_devmap(), to a page. This allows writes to read-only mappings, and bypasses the DAX cacheline dirty tracking due to missed 'mkwrite' faults. The gup_huge_pmd() path and the gup_huge_pud() path correctly check pte_allows_gup() before checking for _devmap() entries. Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Link: http://lkml.kernel.org/r/148804251312.36605.12665024794196605053.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15KVM: VMX: use correct vmcs_read/write for guest segment selector/baseChao Peng
commit 96794e4ed4d758272c486e1529e431efb7045265 upstream. Guest segment selector is 16 bit field and guest segment base is natural width field. Fix two incorrect invocations accordingly. Without this patch, build fails when aggressive inlining is used with ICC. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12x86/pkeys: Check against max pkey to avoid overflowsDave Hansen
commit 58ab9a088ddac4efe823471275859d64f735577e upstream. Kirill reported a warning from UBSAN about undefined behavior when using protection keys. He is running on hardware that actually has support for it, which is not widely available. The warning triggers because of very large shifts of integers when doing a pkey_free() of a large, invalid value. This happens because we never check that the pkey "fits" into the mm_pkey_allocation_map(). I do not believe there is any danger here of anything bad happening other than some aliasing issues where somebody could do: pkey_free(35); and the kernel would effectively execute: pkey_free(8); While this might be confusing to an app that was doing something stupid, it has to do something stupid and the effects are limited to the app shooting itself in the foot. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Cc: kirill.shutemov@linux.intel.com Link: http://lkml.kernel.org/r/20170223222603.A022ED65@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-26x86/platform/goldfish: Prevent unconditional loadingThomas Gleixner
commit 47512cfd0d7a8bd6ab71d01cd89fca19eb2093eb upstream. The goldfish platform code registers the platform device unconditionally which causes havoc in several ways if the goldfish_pdev_bus driver is enabled: - Access to the hardcoded physical memory region, which is either not available or contains stuff which is completely unrelated. - Prevents that the interrupt of the serial port can be requested - In case of a spurious interrupt it goes into a infinite loop in the interrupt handler of the pdev_bus driver (which needs to be fixed seperately). Add a 'goldfish' command line option to make the registration opt-in when the platform is compiled in. I'm seriously grumpy about this engineering trainwreck, which has seven SOBs from Intel developers for 50 lines of code. And none of them figured out that this is broken. Impressive fail! Fixes: ddd70cf93d78 ("goldfish: platform device for x86") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-13x86/vm86: Fix unused variable warning if THP is disabledKirill A. Shutemov
GCC complains about unused variable 'vma' in mark_screen_rdonly() if THP is disabled: arch/x86/kernel/vm86_32.c: In function ‘mark_screen_rdonly’: arch/x86/kernel/vm86_32.c:180:26: warning: unused variable ‘vma’ [-Wunused-variable] struct vm_area_struct *vma = find_vma(mm, 0xA0000); That's silly. pmd_trans_huge() resolves to 0 when THP is disabled, so the whole block should be eliminated. Moving the variable declaration outside the if() block shuts GCC up. Reported-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Carlos O'Donell <carlos@redhat.com> Link: http://lkml.kernel.org/r/20170213125228.63645-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Last minute x86 fixes: - Fix a softlockup detector warning and long delays if using ptdump with KASAN enabled. - Two more TSC-adjust fixes for interesting firmware interactions. - Two commits to fix an AMD CPU topology enumeration bug that caused a measurable gaming performance regression" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/ptdump: Fix soft lockup in page table walker x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliable x86/tsc: Avoid the large time jump when sanitizing TSC ADJUST x86/CPU/AMD: Fix Zen SMT topology x86/CPU/AMD: Bring back Compute Unit ID
2017-02-10x86/mm/ptdump: Fix soft lockup in page table walkerAndrey Ryabinin
CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow. In that case ptdump_walk_pgd_level_core() takes a lot of time to walk across all page tables and doing this without a rescheduling causes soft lockups: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1] ... Call Trace: ptdump_walk_pgd_level_core+0x40c/0x550 ptdump_walk_pgd_level_checkwx+0x17/0x20 mark_rodata_ro+0x13b/0x150 kernel_init+0x2f/0x120 ret_from_fork+0x2c/0x40 I guess that this issue might arise even without KASAN on huge machines with several terabytes of RAM. Stick cond_resched() in pgd loop to fix this. Reported-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10x86/tsc: Make the TSC ADJUST sanitizing work for tsc_reliableThomas Gleixner
When the TSC is marked reliable then the synchronization check is skipped, but that also skips the TSC ADJUST sanitizing code. So on a machine with a wreckaged BIOS the TSC deviation between CPUs might go unnoticed. Let the TSC adjust sanitizing code run unconditionally and just skip the expensive synchronization checks when TSC is marked reliable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Olof Johansson <olof@lixom.net> Link: http://lkml.kernel.org/r/20170209151231.491189912@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10x86/tsc: Avoid the large time jump when sanitizing TSC ADJUSTThomas Gleixner
Olof reported that on a machine which has a BIOS wreckaged TSC the timestamps in dmesg are making a large jump because the TSC value is jumping forward after resetting the TSC ADJUST register to a sane value. This can be avoided by calling the TSC ADJUST saniziting function before initializing the per cpu sched clock machinery. That takes the offset into account and avoid the time jump. What cannot be avoided is that the 'Firmware Bug' warnings on the secondary CPUs are printed with the large time offsets because it would be too much effort and ugly hackery to print those warnings into a buffer and emit them after the adjustemt on the starting CPUs. It's a firmware bug and should be fixed in firmware. The weird timestamps are collateral damage and just illustrate the sillyness of the BIOS folks: [ 0.397445] smp: Bringing up secondary CPUs ... [ 0.402100] x86: Booting SMP configuration: [ 0.406343] .... node #0, CPUs: #1 [1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101 [1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101 [ 0.508119] #2 [1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677 [1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677 [ 0.607643] #3 [1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530 [1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530 [ 0.707108] smp: Brought up 1 node, 4 CPUs [ 0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS) Reported-by: Olof Johansson <olof@lixom.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170209151231.411460506@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-08Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback"Linus Torvalds
This reverts commit 020eb3daaba2857b32c4cf4c82f503d6a00a67de. Gabriel C reports that it causes his machine to not boot, and we haven't tracked down the reason for it yet. Since the bug it fixes has been around for a longish time, we're better off reverting the fix for now. Gabriel says: "It hangs early and freezes with a lot RCU warnings. I bisected it down to : > Ruslan Ruslichenko (1): > x86/ioapic: Restore IO-APIC irq_chip retrigger callback Reverting this one fixes the problem for me.. The box is a PRIMERGY TX200 S5 , 2 socket , 2 x E5520 CPU(s) installed" and Ruslan and Thomas are currently stumped. Reported-and-bisected-by: Gabriel C <nix.or.die@gmail.com> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # for the backport of the original commit Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-06Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - use-after-free in algif_aead - modular aesni regression when pcbc is modular but absent - bug causing IO page faults in ccp - double list add in ccp - NULL pointer dereference in qat (two patches) - panic in chcr - NULL pointer dereference in chcr - out-of-bound access in chcr * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: chcr - Fix key length for RFC4106 crypto: algif_aead - Fix kernel panic on list_del crypto: aesni - Fix failure when pcbc module is absent crypto: ccp - Fix double add when creating new DMA command crypto: ccp - Fix DMA operations when IOMMU is enabled crypto: chcr - Check device is allocated before use crypto: chcr - Fix panic on dma_unmap_sg crypto: qat - zero esram only for DH85x devices crypto: qat - fix bar discovery for c62x
2017-02-05x86/CPU/AMD: Fix Zen SMT topologyYazen Ghannam
After: a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology") our SMT scheduling topology for Fam17h systems is broken, because the ThreadId is included in the ApicId when SMT is enabled. So, without further decoding cpu_core_id is unique for each thread rather than the same for threads on the same core. This didn't affect systems with SMT disabled. Make cpu_core_id be what it is defined to be. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.9 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-05x86/CPU/AMD: Bring back Compute Unit IDBorislav Petkov
Commit: a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology") restored the initial approach we had with the Fam15h topology of enumerating CU (Compute Unit) threads as cores. And this is still correct - they're beefier than HT threads but still have some shared functionality. Our current approach has a problem with the Mad Max Steam game, for example. Yves Dionne reported a certain "choppiness" while playing on v4.9.5. That problem stems most likely from the fact that the CU threads share resources within one CU and when we schedule to a thread of a different compute unit, this incurs latency due to migrating the working set to a different CU through the caches. When the thread siblings mask mirrors that aspect of the CUs and threads, the scheduler pays attention to it and tries to schedule within one CU first. Which takes care of the latency, of course. Reported-by: Yves Dionne <yves.dionne@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.9 Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - Prevent double activation of interrupt lines, which causes problems on certain interrupt controllers - Handle the fallout of the above because x86 (ab)uses the activation function to reconfigure interrupts under the hood. * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Make irq activate operations symmetric irqdomain: Avoid activating interrupts more than once
2017-02-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fix from Radim Krčmář: "Fix a regression that prevented migration between hosts with different XSAVE features even if the missing features were not used by the guest (for stable)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: do not save guest-unsupported XSAVE state
2017-02-03KVM: x86: do not save guest-unsupported XSAVE stateRadim Krčmář
Saving unsupported state prevents migration when the new host does not support a XSAVE feature of the original host, even if the feature is not exposed to the guest. We've masked host features with guest-visible features before, with 4344ee981e21 ("KVM: x86: only copy XSAVE state for the supported features") and dropped it when implementing XSAVES. Do it again. Fixes: df1daba7d1cb ("KVM: x86: support XSAVES usage in the host") Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-02-03crypto: aesni - Fix failure when pcbc module is absentHerbert Xu
When aesni is built as a module together with pcbc, the pcbc module must be present for aesni to load. However, the pcbc module may not be present for reasons such as its absence on initramfs. This patch allows the aesni to function even if the pcbc module is enabled but not present. Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two microcode loader fixes - two FPU xstate handling fixes - an MCE timer handling related crash fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make timer handling more robust x86/microcode: Do not access the initrd after it has been freed x86/fpu/xstate: Fix xcomp_bv in XSAVES header x86/fpu: Set the xcomp_bv when we fake up a XSAVES area x86/microcode/intel: Drop stashed AP patch pointer optimization
2017-02-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Five kernel fixes: - an mmap tracing ABI fix for certain mappings - a use-after-free fix, found via KASAN - three CPU hotplug related x86 PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Make package handling more robust perf/x86/intel/uncore: Clean up hotplug conversion fallout perf/x86/intel/rapl: Make package handling more robust perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory perf/core: Fix use-after-free bug
2017-02-01perf/x86/intel/uncore: Make package handling more robustThomas Gleixner
The package management code in uncore relies on package mapping being available before a CPU is started. This changed with: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust") because the ACPI/BIOS information turned out to be unreliable, but that left uncore in broken state. This was not noticed because on a regular boot all CPUs are online before uncore is initialized. Move the allocation to the CPU online callback and simplify the hotplug handling. At this point the package mapping is established and correct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust") Link: http://lkml.kernel.org/r/20170131230141.377156255@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01perf/x86/intel/uncore: Clean up hotplug conversion falloutThomas Gleixner
The recent conversion to the hotplug state machine kept two mechanisms from the original code: 1) The first_init logic which adds the number of online CPUs in a package to the refcount. That's wrong because the callbacks are executed for all online CPUs. Remove it so the refcounting is correct. 2) The on_each_cpu() call to undo box->init() in the error handling path. That's bogus because when the prepare callback fails no box has been initialized yet. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Fixes: 1a246b9f58c6 ("perf/x86/intel/uncore: Convert to hotplug state machine") Link: http://lkml.kernel.org/r/20170131230141.298032324@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01perf/x86/intel/rapl: Make package handling more robustThomas Gleixner
The package management code in RAPL relies on package mapping being available before a CPU is started. This changed with: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust") because the ACPI/BIOS information turned out to be unreliable, but that left RAPL in broken state. This was not noticed because on a regular boot all CPUs are online before RAPL is initialized. A possible fix would be to reintroduce the mess which allocates a package data structure in CPU prepare and when it turns out to already exist in starting throw it away later in the CPU online callback. But that's a horrible hack and not required at all because RAPL becomes functional for perf only in the CPU online callback. That's correct because user space is not yet informed about the CPU being onlined, so nothing caan rely on RAPL being available on that particular CPU. Move the allocation to the CPU online callback and simplify the hotplug handling. At this point the package mapping is established and correct. This also adds a missing check for available package data in the event_init() function. Reported-by: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust") Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-31x86/mce: Make timer handling more robustThomas Gleixner
Erik reported that on a preproduction hardware a CMCI storm triggers the BUG_ON in add_timer_on(). The reason is that the per CPU MCE timer is started by the CMCI logic before the MCE CPU hotplug callback starts the timer with add_timer_on(). So the timer is already queued which triggers the BUG. Using add_timer_on() is pretty pointless in this code because the timer is strictlty per CPU, initialized as pinned and all operations which arm the timer happen on the CPU to which the timer belongs. Simplify the whole machinery by using mod_timer() instead of add_timer_on() which avoids the problem because mod_timer() can handle already queued timers. Use __start_timer() everywhere so the earliest armed expiry time is preserved. Reported-by: Erik Veijola <erik.veijola@intel.com> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701310936080.3457@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-31x86/irq: Make irq activate operations symmetricThomas Gleixner
The recent commit which prevents double activation of interrupts unearthed interesting code in x86. The code (ab)uses irq_domain_activate_irq() to reconfigure an already activated interrupt. That trips over the prevention code now. Fix it by deactivating the interrupt before activating the new configuration. Fixes: 08d85f3ea99f1 "irqdomain: Avoid activating interrupts more than once" Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311901580.3457@nanos
2017-01-30x86/microcode: Do not access the initrd after it has been freedBorislav Petkov
When we look for microcode blobs, we first try builtin and if that doesn't succeed, we fallback to the initrd supplied to the kernel. However, at some point doing boot, that initrd gets jettisoned and we shouldn't access it anymore. But we do, as the below KASAN report shows. That's because find_microcode_in_initrd() doesn't check whether the initrd is still valid or not. So do that. ================================================================== BUG: KASAN: use-after-free in find_cpio_data Read of size 1 by task swapper/1/0 page:ffffea0000db9d40 count:0 mapcount:0 mapping: (null) index:0x1 flags: 0x100000000000000() raw: 0100000000000000 0000000000000000 0000000000000001 00000000ffffffff raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.10.0-rc5-debug-00075-g2dbde22 #3 Hardware name: Dell Inc. XPS 13 9360/0839Y6, BIOS 1.2.3 12/01/2016 Call Trace: dump_stack ? _atomic_dec_and_lock ? __dump_page kasan_report_error ? pointer ? find_cpio_data __asan_report_load1_noabort ? find_cpio_data find_cpio_data ? vsprintf ? dump_stack ? get_ucode_user ? print_usage_bug find_microcode_in_initrd __load_ucode_intel ? collect_cpu_info_early ? debug_check_no_locks_freed load_ucode_intel_ap ? collect_cpu_info ? trace_hardirqs_on ? flat_send_IPI_mask_allbutself load_ucode_ap ? get_builtin_firmware ? flush_tlb_func ? do_raw_spin_trylock ? cpumask_weight cpu_init ? trace_hardirqs_off ? play_dead_common ? native_play_dead ? hlt_play_dead ? syscall_init ? arch_cpu_idle_dead ? do_idle start_secondary start_cpu Memory state around the buggy address: ffff880036e74f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880036e74f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff880036e75000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff880036e75080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880036e75100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Tested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170126165833.evjemhbqzaepirxo@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/efi: Always map the first physical page into the EFI pagetablesJiri Kosina
Commit: 129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode") stopped creating 1:1 mappings for all RAM, when running in native 64-bit mode. It turns out though that there are 64-bit EFI implementations in the wild (this particular problem has been reported on a Lenovo Yoga 710-11IKB), which still make use of the first physical page for their own private use, even though they explicitly mark it EFI_CONVENTIONAL_MEMORY in the memory map. In case there is no mapping for this particular frame in the EFI pagetables, as soon as firmware tries to make use of it, a triple fault occurs and the system reboots (in case of the Yoga 710-11IKB this is very early during bootup). Fix that by always mapping the first page of physical memory into the EFI pagetables. We're free to hand this page to the BIOS, as trim_bios_range() will reserve the first page and isolate it away from memory allocators anyway. Note that just reverting 129766708 alone is not enough on v4.9-rc1+ to fix the regression on affected hardware, as this commit: ab72a27da ("x86/efi: Consolidate region mapping logic") later made the first physical frame not to be mapped anyway. Reported-by: Hanka Pavlikova <hanka@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Laura Abbott <labbott@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vojtech Pavlik <vojtech@ucw.cz> Cc: Waiman Long <waiman.long@hpe.com> Cc: linux-efi@vger.kernel.org Cc: stable@kernel.org # v4.8+ Fixes: 129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode") Link: http://lkml.kernel.org/r/20170127222552.22336-1-matt@codeblueprint.co.uk [ Tidied up the changelog and the comment. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24x86/fpu/xstate: Fix xcomp_bv in XSAVES headerYu-cheng Yu
The compacted-format XSAVES area is determined at boot time and never changed after. The field xsave.header.xcomp_bv indicates which components are in the fixed XSAVES format. In fpstate_init() we did not set xcomp_bv to reflect the XSAVES format since at the time there is no valid data. However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(), as in commit: b22cbe404a9c x86/fpu: Fix invalid FPU ptrace state after execve() and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode app, a #GP occurs. This can be easily triggered by doing valgrind on a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and others: https://bugzilla.kernel.org/show_bug.cgi?id=190061 Fix it by setting xcomp_bv correctly. This patch also moves the xcomp_bv initialization to the proper place, which was in copyin_to_xsaves() as of: 4c833368f0bf x86/fpu: Set the xcomp_bv when we fake up a XSAVES area which fixed the bug too, but it's more efficient and cleaner to initialize things once per boot, not for every signal handling operation. Reported-by: Kevin Hao <haokexin@gmail.com> Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: haokexin@gmail.com Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng.yu@intel.com [ Combined it with 4c833368f0bf. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-23x86/fpu: Set the xcomp_bv when we fake up a XSAVES areaKevin Hao
I got the following calltrace on a Apollo Lake SoC with 32-bit kernel: WARNING: CPU: 2 PID: 261 at arch/x86/include/asm/fpu/internal.h:363 fpu__restore+0x1f5/0x260 [...] Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS APLIRVPA.X64.0138.B35.1608091058 08/09/2016 Call Trace: dump_stack() __warn() ? fpu__restore() warn_slowpath_null() fpu__restore() __fpu__restore_sig() fpu__restore_sig() restore_sigcontext.isra.9() sys_sigreturn() do_int80_syscall_32() entry_INT80_32() The reason is that a #GP occurs when executing XRSTORS. The root cause is that we forget to set the xcomp_bv when we fake up the XSAVES area in the copyin_to_xsaves() function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1485075023-30161-1-git-send-email-haokexin@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23x86/microcode/intel: Drop stashed AP patch pointer optimizationBorislav Petkov
This was meant to save us the scanning of the microcode containter in the initrd since the first AP had already done that but it can also hurt us: Imagine a single hyperthreaded CPU (Intel(R) Atom(TM) CPU N270, for example) which updates the microcode on the BSP but since the microcode engine is shared between the two threads, the update on CPU1 doesn't happen because it has already happened on CPU0 and we don't find a newer microcode revision on CPU1. Which doesn't set the intel_ucode_patch pointer and at initrd jettisoning time we don't save the microcode patch for later application. Now, when we suspend to RAM, the loaded microcode gets cleared so we need to reload but there's no patch saved in the cache. Removing the optimization fixes this issue and all is fine and dandy. Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading") Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170120202955.4091-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-22Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "Restore the retrigger callbacks in the IO APIC irq chips. That addresses a long standing regression which got introduced with the rewrite of the x86 irq subsystem two years ago and went unnoticed so far" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Restore IO-APIC irq_chip retrigger callback
2017-01-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails (for stable) s390: - Fix a kernel memory exposure (for stable) x86: - Fix exception injection when hypercall instruction cannot be patched" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: do not expose random data via facility bitmap KVM: x86: fix fixing of hypercalls KVM: arm/arm64: vgic: Fix deadlock on error handling KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems KVM: arm/arm64: Fix occasional warning from the timer work function
2017-01-19Merge tag 'pci-v4.10-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - recognize that a PCI-to-PCIe bridge originates a PCIe hierarchy, so we enumerate that hierarchy correctly - X-Gene: fix a change merged for v4.10 that broke MSI - Keystone: avoid reading undefined registers, which can cause asynchronous external aborts - Supermicro X8DTH-i/6/iF/6F: ignore broken _CRS that caused us to change (and break) existing I/O port assignments * tag 'pci-v4.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/MSI: pci-xgene-msi: Fix CPU hotplug registration handling PCI: Enumerate switches below PCI-to-PCIe bridges x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F PCI: designware: Check for iATU unroll only on platforms that use ATU
2017-01-18Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug update from Thomas Gleixner: "This contains a trivial typo fix and an extension to the core code for dynamically allocating states in the prepare stage. The extension is necessary right now because we need a proper way to unbreak LTTNG, which iscurrently non functional due to the removal of the notifiers. Surely it's out of tree, but it's widely used by distros. The simple solution would have been to reserve a state for LTTNG, but I'm not fond about unused crap in the kernel and the dynamic range, which we admittedly should have done right away, allows us to remove quite some of the hardcoded states, i.e. those which have no ordering requirements. So doing the right thing now is better than having an smaller intermediate solution which needs to be reworked anyway" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Provide dynamic range for prepare stage perf/x86/amd/ibs: Fix typo after cleanup state names in cpu/hotplug
2017-01-18x86/ioapic: Restore IO-APIC irq_chip retrigger callbackRuslan Ruslichenko
commit d32932d02e18 removed the irq_retrigger callback from the IO-APIC chip and did not add it to the new IO-APIC-IR irq chip. Unfortunately the software resend fallback is not enabled on X86, so edge interrupts which are received during the lazy disabled state of the interrupt line are not retriggered and therefor lost. Restore the callbacks. [ tglx: Massaged changelog ] Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Signed-off-by: Ruslan Ruslichenko <rruslich@cisco.com> Cc: xe-linux-external@cisco.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-17KVM: x86: fix fixing of hypercallsDmitry Vyukov
emulator_fix_hypercall() replaces hypercall with vmcall instruction, but it does not handle GP exception properly when writes the new instruction. It can return X86EMUL_PROPAGATE_FAULT without setting exception information. This leads to incorrect emulation and triggers WARN_ON(ctxt->exception.vector > 0x1f) in x86_emulate_insn() as discovered by syzkaller fuzzer: WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558 Call Trace: warn_slowpath_null+0x2c/0x40 kernel/panic.c:582 x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572 x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618 emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline] handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762 vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625 vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline] vcpu_run arch/x86/kvm/x86.c:6947 [inline] Set exception information when write in emulator_fix_hypercall() fails. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: kvm@vger.kernel.org Cc: syzkaller@googlegroups.com Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-17perf/x86/intel: Handle exclusive threadid correctly on CPU hotplugZhou Chengming
The CPU hotplug function intel_pmu_cpu_starting() sets cpu_hw_events.excl_thread_id unconditionally to 1 when the shared exclusive counters data structure is already availabe for the sibling thread. This works during the boot process because the first sibling gets threadid 0 assigned and the second sibling which shares the data structure gets 1. But when the first thread of the core is offlined and onlined again it shares the data structure with the second thread and gets exclusive thread id 1 assigned as well. Prevent this by checking the threadid of the already online thread. [ tglx: Rewrote changelog ] Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Cc: NuoHan Qiao <qiaonuohan@huawei.com> Cc: ak@linux.intel.com Cc: peterz@infradead.org Cc: kan.liang@intel.com Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: qiaonuohan@huawei.com Cc: davidcc@google.com Cc: guohanjun@huawei.com Link: http://lkml.kernel.org/r/1484536871-3131-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- arch/x86/events/intel/core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
2017-01-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - unwinder fixes - AMD CPU topology enumeration fixes - microcode loader fixes - x86 embedded platform fixes - fix for a bootup crash that may trigger when clearcpuid= is used with invalid values" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mpx: Use compatible types in comparison to fix sparse error x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() x86/entry: Fix the end of the stack for newly forked tasks x86/unwind: Include __schedule() in stack traces x86/unwind: Disable KASAN checks for non-current tasks x86/unwind: Silence warnings for non-current tasks x86/microcode/intel: Use correct buffer size for saving microcode data x86/microcode/intel: Fix allocation size of struct ucode_patch x86/microcode/intel: Add a helper which gives the microcode revision x86/microcode: Use native CPUID to tickle out microcode revision x86/CPU: Add native CPUID variants returning a single datum x86/boot: Add missing declaration of string functions x86/CPU/AMD: Fix Bulldozer topology x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev' x86/cpu: Fix typo in the comment for Anniedale x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
2017-01-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU driver fixes, plus miscellanous tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Reject non sampling events with precise_ip perf/x86/intel: Account interrupts for PEBS errors perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race perf/core: Fix sys_perf_event_open() vs. hotplug perf/x86/intel: Use ULL constant to prevent undefined shift behaviour perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code perf/x86: Set pmu->module in Intel PMU modules perf probe: Fix to probe on gcc generated symbols for offline kernel perf probe: Fix --funcs to show correct symbols for offline module perf symbols: Robustify reading of build-id from sysfs perf tools: Install tools/lib/traceevent plugins with install-bin tools lib traceevent: Fix prev/next_prio for deadline tasks perf record: Fix --switch-output documentation and comment perf record: Make __record_options static tools lib subcmd: Add OPT_STRING_OPTARG_SET option perf probe: Fix to get correct modname from elf header samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include samples/bpf sock_example: Avoid getting ethhdr from two includes perf sched timehist: Show total scheduling time
2017-01-15Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "A number of regression fixes: - Fix a boot hang on machines that have somewhat unusual memory map entries of phys_addr=0x0 num_pages=0, which broke due to a recent commit. This commit got cherry-picked from the v4.11 queue because the bug is affecting real machines. - Fix a boot hang also reported by KASAN, caused by incorrect init ordering introduced by a recent optimization. - Fix a recent robustification fix to allocate_new_fdt_and_exit_boot() that introduced an invalid assumption. Neither bugs were seen in the wild AFAIK" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Prune invalid memory map entries and fix boot regression x86/efi: Don't allocate memmap through memblock after mm_init() efi/libstub/arm*: Pass latest memory map to the kernel
2017-01-14efi/x86: Prune invalid memory map entries and fix boot regressionPeter Jones
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW (2.28), include memory map entries with phys_addr=0x0 and num_pages=0. These machines fail to boot after the following commit, commit 8e80632fb23f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()") Fix this by removing such bogus entries from the memory map. Furthermore, currently the log output for this case (with efi=debug) looks like: [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB) This is clearly wrong, and also not as informative as it could be. This patch changes it so that if we find obviously invalid memory map entries, we print an error and skip those entries. It also detects the display of the address range calculation overflow, so the new output is: [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries: [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid) It also detects memory map sizes that would overflow the physical address, for example phys_addr=0xfffffffffffff000 and num_pages=0x0200000000000001, and prints: [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries: [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid) It then removes these entries from the memory map. Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> [Matt: Include bugzilla info in commit log] Cc: <stable@vger.kernel.org> # v4.9+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14perf/x86: Reject non sampling events with precise_ipJiri Olsa
As Peter suggested [1] rejecting non sampling PEBS events, because they dont make any sense and could cause bugs in the NMI handler [2]. [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14perf/x86/intel: Account interrupts for PEBS errorsJiri Olsa
It's possible to set up PEBS events to get only errors and not any data, like on SNB-X (model 45) and IVB-EP (model 62) via 2 perf commands running simultaneously: taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10 This leads to a soft lock up, because the error path of the intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt for error PEBS interrupts, so in case you're getting ONLY errors you don't have a way to stop the event when it's over the max_samples_per_tick limit: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816] ... RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140 ... Call Trace: ? trace_hardirqs_on_caller+0xf5/0x1b0 ? perf_cgroup_attach+0x70/0x70 perf_install_in_context+0x199/0x1b0 ? ctx_resched+0x90/0x90 SYSC_perf_event_open+0x641/0xf90 SyS_perf_event_open+0x9/0x10 do_syscall_64+0x6c/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 Add perf_event_account_interrupt() which does the interrupt and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s error path. We keep the pending_kill and pending_wakeup logic only in the __perf_event_overflow() path, because they make sense only if there's any data to deliver. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14x86/mpx: Use compatible types in comparison to fix sparse errorTobias Klauser
info->si_addr is of type void __user *, so it should be compared against something from the same address space. This fixes the following sparse error: arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()Len Brown
The Intel Denverton microserver uses a 25 MHz TSC crystal, so we can derive its exact [*] TSC frequency using CPUID and some arithmetic, eg.: TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000) [*] 'exact' is only as good as the crystal, which should be +/- 20ppm Signed-off-by: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>