aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/smpboot.c
AgeCommit message (Collapse)Author
2019-09-17Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Thomas Gleixner: - Cleanup the apic IPI implementation by removing duplicated code and consolidating the functions into the APIC core. - Implement a safe variant of the IPI broadcast mode. Contrary to earlier attempts this uses the core tracking of which CPUs have been brought online at least once so that a broadcast does not end up in some dead end in BIOS/SMM code when the CPU is still waiting for init. Once all CPUs have been brought up once, IPI broadcasting is enabled. Before that regular one by one IPIs are issued. - Drop the paravirt CR8 related functions as they have no user anymore - Initialize the APIC TPR to block interrupt 16-31 as they are reserved for CPU exceptions and should never be raised by any well behaving device. - Emit a warning when vector space exhaustion breaks the admin set affinity of an interrupt. - Make sure to use the NMI fallback when shutdown via reboot vector IPI fails. The original code had conditions which prevent the code path to be reached. - Annotate various APIC config variables as RO after init. [ The ipi broadcase change came in earlier through the cpu hotplug branch, but I left the explanation in the commit message since it was shared between the two different branches - Linus ] * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86/apic/vector: Warn when vector space exhaustion breaks affinity x86/apic: Annotate global config variables as "read-only after init" x86/apic/x2apic: Implement IPI shorthands support x86/apic/flat64: Remove the IPI shorthand decision logic x86/apic: Share common IPI helpers x86/apic: Remove the shorthand decision logic x86/smp: Enhance native_send_call_func_ipi() x86/smp: Move smp_function_call implementations into IPI code x86/apic: Provide and use helper for send_IPI_allbutself() x86/apic: Add static key to Control IPI shorthands x86/apic: Move no_ipi_broadcast() out of 32bit x86/apic: Add NMI_VECTOR wait to IPI shorthand x86/apic: Remove dest argument from __default_send_IPI_shortcut() x86/hotplug: Silence APIC and NMI when CPU is dead x86/cpu: Move arch_smt_update() to a neutral place x86/apic/uv: Make x2apic_extra_bits static x86/apic: Consolidate the apic local headers x86/apic: Move apic_flat_64 header into apic directory x86/apic: Move ipi header into apic directory x86/apic: Cleanup the include maze ...
2019-07-25x86/hotplug: Silence APIC and NMI when CPU is deadThomas Gleixner
In order to support IPI/NMI broadcasting via the shorthand mechanism side effects of shorthands need to be mitigated: Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs Neither of those can be handled on unplugged CPUs for obvious reasons. It would be trivial to just fully disable the APIC via the enable bit in MSR_APICBASE. But that's not possible because clearing that bit on systems based on the 3 wire APIC bus would require a hardware reset to bring it back as the APIC would lose track of bus arbitration. On systems with FSB delivery APICBASE could be disabled, but it has to be guaranteed that no interrupt is sent to the APIC while in that state and it's not clear from the SDM whether it still responds to INIT/SIPI messages. Therefore stay on the safe side and switch the APIC into soft disabled mode so it won't deliver any regular vector to the CPU. NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check for the CPU being offline on early nmi entry and if so bail. Note, this cannot use the stop/restart_nmi() magic which is used in the alternatives code. A dead CPU cannot invoke nmi_enter() or anything else due to RCU and other reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de
2019-07-22x86/realmode: Remove trampoline_statusPingfan Liu
There is no reader of trampoline_status, it's only written. It turns out that after commit ce4b1b16502b ("x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"), trampoline_status is not needed any more. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1563266424-3472-1-git-send-email-kernelfans@gmail.com
2019-07-17Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get ↵Zhenzhong Duan
initialized" This reverts commit ca5d376e17072c1b60c3fee66f3be58ef018952d. Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching early") adds jump_label_init() call in setup_arch() to make static keys initialized early, so we could use the original simpler code again. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-07-10x86/asm: Move native_write_cr0/4() out of lineThomas Gleixner
The pinning of sensitive CR0 and CR4 bits caused a boot crash when loading the kvm_intel module on a kernel compiled with CONFIG_PARAVIRT=n. The reason is that the static key which controls the pinning is marked RO after init. The kvm_intel module contains a CR4 write which requires to update the static key entry list. That obviously does not work when the key is in a RO section. With CONFIG_PARAVIRT enabled this does not happen because the CR4 write uses the paravirt indirection and the actual write function is built in. As the key is intended to be immutable after init, move native_write_cr0/4() out of line. While at it consolidate the update of the cr4 shadow variable and store the value right away when the pinning is initialized on a booting CPU. No point in reading it back 20 instructions later. This allows to confine the static key and the pinning variable to cpu/common and allows to mark them static. Fixes: 8dbec27a242c ("x86/asm: Pin sensitive CR0 bits") Fixes: 873d50d58f67 ("x86/asm: Pin sensitive CR4 bits") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Xi Ruoyao <xry111@mengyan1223.wang> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907102140340.1758@nanos.tec.linutronix.de
2019-07-08Merge branch 'x86-topology-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology updates from Ingo Molnar: "Implement multi-die topology support on Intel CPUs and expose the die topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui. These changes should have no effect on the kernel's existing understanding of topologies, i.e. there should be no behavioral impact on cache, NUMA, scheduler, perf and other topologies and overall system performance" * 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages perf/x86/intel/cstate: Support multi-die/package perf/x86/intel/rapl: Support multi-die/package perf/x86/intel/uncore: Support multi-die/package topology: Create core_cpus and die_cpus sysfs attributes topology: Create package_cpus sysfs attribute hwmon/coretemp: Support multi-die/package powercap/intel_rapl: Update RAPL domain name and debug messages thermal/x86_pkg_temp_thermal: Support multi-die/package powercap/intel_rapl: Support multi-die/package powercap/intel_rapl: Simplify rapl_find_package() x86/topology: Define topology_logical_die_id() x86/topology: Define topology_die_id() cpu/topology: Export die_id x86/topology: Create topology_max_die_per_package() x86/topology: Add CPUID.1F multi-die/package support
2019-06-22x86/asm: Pin sensitive CR4 bitsKees Cook
Several recent exploits have used direct calls to the native_write_cr4() function to disable SMEP and SMAP before then continuing their exploits using userspace memory access. Direct calls of this form can be mitigate by pinning bits of CR4 so that they cannot be changed through a common function. This is not intended to be a general ROP protection (which would require CFI to defend against properly), but rather a way to avoid trivial direct function calling (or CFI bypasses via a matching function prototype) as seen in: https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html (https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308) The goals of this change: - Pin specific bits (SMEP, SMAP, and UMIP) when writing CR4. - Avoid setting the bits too early (they must become pinned only after CPU feature detection and selection has finished). - Pinning mask needs to be read-only during normal runtime. - Pinning needs to be checked after write to validate the cr4 state Using __ro_after_init on the mask is done so it can't be first disabled with a malicious write. Since these bits are global state (once established by the boot CPU and kernel boot parameters), they are safe to write to secondary CPUs before those CPUs have finished feature detection. As such, the bits are set at the first cr4 write, so that cr4 write bugs can be detected (instead of silently papered over). This uses a few bytes less storage of a location we don't have: read-only per-CPU data. A check is performed after the register write because an attack could just skip directly to the register write. Such a direct jump is possible because of how this function may be built by the compiler (especially due to the removal of frame pointers) where it doesn't add a stack frame (function exit may only be a retq without pops) which is sufficient for trivial exploitation like in the timer overwrites mentioned above). The asm argument constraints gain the "+" modifier to convince the compiler that it shouldn't make ordering assumptions about the arguments or memory, and treat them as changed. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: kernel-hardening@lists.openwall.com Link: https://lkml.kernel.org/r/20190618045503.39105-3-keescook@chromium.org
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 82Thomas Gleixner
Based on 1 normalized pattern(s): this code is released under the gnu general public license version 2 or later extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520075211.232210963@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-23topology: Create core_cpus and die_cpus sysfs attributesLen Brown
Create CPU topology sysfs attributes: "core_cpus" and "core_cpus_list" These attributes represent all of the logical CPUs that share the same core. These attriutes is synonymous with the existing "thread_siblings" and "thread_siblings_list" attribute, which will be deprecated. Create CPU topology sysfs attributes: "die_cpus" and "die_cpus_list". These attributes represent all of the logical CPUs that share the same die. Suggested-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/071c23a298cd27ede6ed0b6460cae190d193364f.1557769318.git.len.brown@intel.com
2019-05-23x86/topology: Define topology_logical_die_id()Len Brown
Define topology_logical_die_id() ala existing topology_logical_package_id() Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/2f3526e25ae14fbeff26fb26e877d159df8946d9.1557769318.git.len.brown@intel.com
2019-05-23x86/topology: Add CPUID.1F multi-die/package supportLen Brown
Some new systems have multiple software-visible die within each package. Update Linux parsing of the Intel CPUID "Extended Topology Leaf" to handle either CPUID.B, or the new CPUID.1F. Add cpuinfo_x86.die_id and cpuinfo_x86.max_dies to store the result. die_id will be non-zero only for multi-die/package systems. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-doc@vger.kernel.org Link: https://lkml.kernel.org/r/7b23d2d26d717b8e14ba137c94b70943f1ae4b5c.1557769318.git.len.brown@intel.com
2019-05-06Merge branch 'x86-topology-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology updates from Ingo Molnar: "Two main changes: preparatory changes for Intel multi-die topology support, plus a syslog message tweak" * 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology: Make DEBUG_HOTPLUG_CPU0 pr_info() more descriptive x86/smpboot: Rename match_die() to match_pkg() topology: Simplify cputopology.txt formatting and wording x86/topology: Fix documentation typo
2019-04-19x86/smpboot: Rename match_die() to match_pkg()Len Brown
Syntax only, no functional or semantic change. This routine matches packages, not die, so name it thus. Signed-off-by: Len Brown <len.brown@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/7ca18c4ae7816a1f9eda37414725df676e63589d.1551160674.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-17x86/irq/32: Handle irq stack allocation failure properThomas Gleixner
irq_ctx_init() crashes hard on page allocation failures. While that's ok during early boot, it's just wrong in the CPU hotplug bringup code. Check the page allocation failure and return -ENOMEM and handle it at the call sites. On early boot the only way out is to BUG(), but on CPU hotplug there is no reason to crash, so just abort the operation. Rename the function to something more sensible while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Pu Wen <puwen@hygon.cn> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Yi Wang <wang.yi59@zte.com.cn> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/20190414160146.089060584@linutronix.de
2019-03-07Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups and simplifications, none of them really stands out, they are all over the place" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Remove unused __addr_ok() macro x86/smpboot: Remove unused phys_id variable x86/mm/dump_pagetables: Remove the unused prev_pud variable x86/fpu: Move init_xstate_size() to __init section x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section x86/mtrr: Remove unused variable x86/boot/compressed/64: Explain paging_prepare()'s return value x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition x86/asm/suspend: Drop ENTRY from local data x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery x86/boot: Save several bytes in decompressor x86/trap: Remove useless declaration x86/mm/tlb: Remove unused cpu variable x86/events: Mark expected switch-case fall-throughs x86/asm-prototypes: Remove duplicate include <asm/page.h> x86/kernel: Mark expected switch-case fall-throughs x86/insn-eval: Mark expected switch-case fall-through x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls x86/e820: Replace kmalloc() + memcpy() with kmemdup()
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-18x86/smpboot: Remove unused phys_id variableShaokun Zhang
The 'phys_id' local variable became unused after commit ce4b1b16502b ("x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"). Remove it. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alison Schofield <alison.schofield@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/1550495101-41755-1-git-send-email-zhangshaokun@hisilicon.com
2018-12-18x86/topology: Use total_cpus for max logical packages calculationHui Wang
nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the logical package management because it results in a smaller number of packages while in kdump kernel. Check below case: There is a two sockets system, each socket has 8 cores, which has 16 logical cpus while HT was turn on. 0 1 2 3 4 5 6 7 | 16 17 18 19 20 21 22 23 cores on socket 0 threads on socket 0 8 9 10 11 12 13 14 15 | 24 25 26 27 28 29 30 31 cores on socket 1 threads on socket 1 While starting the kdump kernel with command line option nr_cpus=16 panic was triggered on one of the cpus 24-31 eg. 26, then online cpu will be 1-15, 26(cpu 0 was disabled in kdump), ncpus will be 16 and __max_logical_packages will be 1, but actually two packages were booted on. This issue can reproduced by set kdump option nr_cpus=<real physical core numbers>, and then trigger panic on last socket's thread, for example: taskset -c 26 echo c > /proc/sysrq-trigger Use total_cpus which will not be limited by nr_cpus command line to calculate the value of __max_logical_packages. Signed-off-by: Hui Wang <john.wanghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <guijianfeng@huawei.com> Cc: <wencongyang2@huawei.com> Cc: <douliyang1@huawei.com> Cc: <qiaonuohan@huawei.com> Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com
2018-10-31mm: remove include/linux/bootmem.hMike Rapoport
Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-27x86/smpboot: Do not use BSP INIT delay and MWAIT to idle on DhyanaPu Wen
The Hygon Dhyana CPU uses no delay in smp_quirk_init_udelay(), and does HLT on idle just like AMD does. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: bp@alien8.de Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/87000fa82e273f5967c908448414228faf61e077.1537533369.git.puwen@hygon.cn
2018-08-05Merge 4.18-rc7 into master to pick up the KVM dependcyThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05x86: Don't include linux/irq.h from asm/hardirq.hNicolai Stange
The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/topology.h linux/smp.h asm/smp.h or linux/gfp.h linux/mmzone.h asm/mmzone.h asm/mmzone_64.h asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/irqdesc.h linux/kobject.h linux/sysfs.h linux/kernfs.h linux/idr.h linux/gfp.h and others. This causes compilation errors because of the header guards becoming effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both, asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/harirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that there are some other archs, like e.g. arm64, which don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86' asm/hardirq.h. Fix resulting compilation errors by adding appropriate #includes to *.c files as needed. Note that some of these *.c files could be cleaned up a bit wrt. to their set of #includes, but that should better be done from separate patches, if at all. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-07-03x86/mm/32: Initialize the CR4 shadow before __flush_tlb_all()Zhenzhong Duan
On 32-bit kernels, __flush_tlb_all() may have read the CR4 shadow before the initialization of CR4 shadow in cpu_init(). Fix it by adding an explicit cr4_init_shadow() call into start_secondary() which is the first function called on non-boot SMP CPUs - ahead of the __flush_tlb_all() call. ( This is somewhat of a layering violation, but start_secondary() does CR4 bootstrap in the PCID case anyway. ) Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/b07b6ae9-4b57-4b40-b9bc-50c2c67f1d91@default Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/topology: Provide topology_smt_supported()Thomas Gleixner
Provide information whether SMT is supoorted by the CPUs. Preparatory patch for SMT control mechanism. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21x86/smp: Provide topology_is_primary_thread()Thomas Gleixner
If the CPU is supporting SMT then the primary thread can be found by checking the lower APIC ID bits for zero. smp_num_siblings is used to build the mask for the APIC ID bits which need to be taken into account. This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different than the initial APIC ID in CPUID. But according to AMD the lower bits have to be consistent. Intel gave a tentative confirmation as well. Preparatory patch to support disabling SMT at boot/runtime. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-04Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - Centaur CPU updates (David Wang) - AMD and other CPU topology enumeration improvements and fixes (Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit) - Continued 5-level paging work (Kirill A. Shutemov) * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Mark __pgtable_l5_enabled __initdata x86/mm: Mark p4d_offset() __always_inline x86/mm: Introduce the 'no5lvl' kernel parameter x86/mm: Stop pretending pgtable_l5_enabled is a variable x86/mm: Unify pgtable_l5_enabled usage in early boot code x86/boot/compressed/64: Fix trampoline page table address calculation x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores() x86/Centaur: Report correct CPU/cache topology x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo() x86/CPU: Make intel_num_cpu_cores() generic x86/CPU: Move cpu local function declarations to local header x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available x86/CPU: Modify detect_extended_topology() to return result x86/CPU/AMD: Calculate last level cache ID from number of sharing threads x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present x86/Centaur: Initialize supported CPU features properly
2018-05-19Merge branches 'x86/urgent' and 'core/urgent' into x86/boot, to pick up ↵Ingo Molnar
fixes and avoid conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-17x86/speculation: Handle HT correctly on AMDThomas Gleixner
The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both cores into account. Otherwise the following situation can happen: CPU0 CPU1 disable SSB disable SSB enable SSB <- Enables it for the Core, i.e. for CPU0 as well So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled again. On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per thread SPEC_CTRL MSR. It works like this: CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL i.e. if one of the threads enables a mitigation then this affects both and the mitigation is only disabled in the core when both threads disabled it. Add the necessary synchronization logic for AMD family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-06x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be presentBorislav Petkov
Move smp_num_siblings and cpu_llc_id to cpu/common.c so that they're always present as symbols and not only in the CONFIG_SMP case. Then, other code using them doesn't need ugly ifdeffery anymore. Get rid of some ifdeffery. Signed-off-by: Borislav Petkov <bpetkov@suse.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1524864877-111962-2-git-send-email-suravee.suthikulpanit@amd.com
2018-04-26x86/smpboot: Don't use mwait_play_dead() on AMD systemsYazen Ghannam
Recent AMD systems support using MWAIT for C1 state. However, MWAIT will not allow deeper cstates than C1 on current systems. play_dead() expects to use the deepest state available. The deepest state available on AMD systems is reached through SystemIO or HALT. If MWAIT is available, it is preferred over the other methods, so the CPU never reaches the deepest possible state. Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE to enter the deepest state advertised by firmware. If CPUIDLE is not available then fallback to HALT. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20180403140228.58540-1-Yazen.Ghannam@amd.com
2018-04-17x86,sched: Allow topologies where NUMA nodes share an LLCAlison Schofield
Intel's Skylake Server CPUs have a different LLC topology than previous generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided into two "slices", each containing half the cores, half the LLC, and one memory controller and each slice is enumerated to Linux as a NUMA node. This is similar to how the cores and LLC were arranged for the Cluster-On-Die (CoD) feature. CoD allowed the same cache line to be present in each half of the LLC. But, with SNC, each line is only ever present in *one* slice. This means that the portion of the LLC *available* to a CPU depends on the data being accessed: Remote socket: entire package LLC is shared Local socket->local slice: data goes into local slice LLC Local socket->remote slice: data goes into remote-slice LLC. Slightly higher latency than local slice LLC. The biggest implication from this is that a process accessing all NUMA-local memory only sees half the LLC capacity. The CPU describes its cache hierarchy with the CPUID instruction. One of the CPUID leaves enumerates the "logical processors sharing this cache". This information is used for scheduling decisions so that tasks move more freely between CPUs sharing the cache. But, the CPUID for the SNC configuration discussed above enumerates the LLC as being shared by the entire package. This is not 100% precise because the entire cache is not usable by all accesses. But, it *is* the way the hardware enumerates itself, and this is not likely to change. The userspace visible impact of all the above is that the sysfs info reports the entire LLC as being available to the entire package. As noted above, this is not true for local socket accesses. This patch does not correct the sysfs info. It is the same, pre and post patch. The current code emits the following warning: sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. The warning is coming from the topology_sane() check in smpboot.c because the topology is not matching the expectations of the model for obvious reasons. To fix this, add a vendor and model specific check to never call topology_sane() for these systems. Also, just like "Cluster-on-Die" disable the "coregroup" sched_domain_topology_level and use NUMA information from the SRAT alone. This is OK at least on the hardware we are immediately concerned about because the LLC sharing happens at both the slice and at the package level, which are also NUMA boundaries. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: brice.goglin@gmail.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
2018-02-23x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across ↵Samuel Neves
CPU hotplug operations Without this fix, /proc/cpuinfo will display an incorrect amount of CPU cores, after bringing them offline and online again, as exemplified below: $ cat /proc/cpuinfo | grep cores cpu cores : 4 cpu cores : 8 cpu cores : 8 cpu cores : 20 cpu cores : 4 cpu cores : 3 cpu cores : 2 cpu cores : 2 This patch fixes this by always zeroing the booted_cores variable upon turning off a logical CPU. Tested-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jgross@suse.com Cc: luto@kernel.org Cc: prarit@redhat.com Cc: vkuznets@redhat.com Link: http://lkml.kernel.org/r/20180221205036.5244-1-sneves@dei.uc.pt Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/xen: Calculate __max_logical_packages on PV domainsPrarit Bhargava
The kernel panics on PV domains because native_smp_cpus_done() is only called for HVM domains. Calculate __max_logical_packages for PV domains. Fixes: b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages estimate") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Tested-and-reported-by: Simon Gaiser <simon@invisiblethingslab.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: xen-devel@lists.xenproject.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-02-13x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a ↵Masayoshi Mizuma
physical CPU When a physical CPU is hot-removed, the following warning messages are shown while the uncore device is removed in uncore_pci_remove(): WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0xf1/0x110 ... CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1 Workqueue: kacpi_hotplug acpi_hotplug_work_fn ... Call Trace: pci_device_remove+0x36/0xb0 device_release_driver_internal+0x145/0x210 pci_stop_bus_device+0x76/0xa0 pci_stop_root_bus+0x44/0x60 acpi_pci_root_remove+0x1f/0x80 acpi_bus_trim+0x54/0x90 acpi_bus_trim+0x2e/0x90 acpi_device_hotplug+0x2bc/0x4b0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x141/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 When uncore_pci_remove() runs, it tries to get the package ID to clear the value of uncore_extra_pci_dev[].dev[] by using topology_phys_to_logical_pkg(). The warning messesages are shown because topology_phys_to_logical_pkg() returns -1. arch/x86/events/intel/uncore.c: static void uncore_pci_remove(struct pci_dev *pdev) { ... phys_id = uncore_pcibus_to_physid(pdev->bus); ... pkg = topology_phys_to_logical_pkg(phys_id); // returns -1 for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) { if (uncore_extra_pci_dev[pkg].dev[i] == pdev) { uncore_extra_pci_dev[pkg].dev[i] = NULL; break; } } WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!! topology_phys_to_logical_pkg() tries to find cpuinfo_x86->phys_proc_id that matches the phys_pkg argument. arch/x86/kernel/smpboot.c: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { int cpu; for_each_possible_cpu(cpu) { struct cpuinfo_x86 *c = &cpu_data(cpu); if (c->initialized && c->phys_proc_id == phys_pkg) return c->logical_proc_id; } return -1; } However, the phys_proc_id was already set to 0 by remove_siblinginfo() when the CPU was offlined. So, topology_phys_to_logical_pkg() cannot find the correct logical_proc_id and always returns -1. As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning messages are shown. What is worse is that the bogus 'pkg' index results in two bugs: - We dereference uncore_extra_pci_dev[] with a negative index - We fail to clean up a stale pointer in uncore_extra_pci_dev[][] To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo(). This should not cause any problems, because ->phys_proc_id is not used after it is hot-removed and it is re-set while hot-adding. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yasu.isimatu@gmail.com Cc: <stable@vger.kernel.org> Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array") Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-30Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove unused IOMMU_STRESS Kconfig x86/extable: Mark exception handler functions visible x86/timer: Don't inline __const_udelay x86/headers: Remove duplicate #includes
2018-01-14x86/platform: Control warm reset setup via legacy feature flagJan Kiszka
Allow to turn off the setup of BIOS-managed warm reset via a new flag in x86_legacy_features. Besides the UV1, the upcoming jailhose guest support needs this switched off. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: jailhouse-dev@googlegroups.com Link: https://lkml.kernel.org/r/44376558129d70a2c1527959811371ef4b82e829.1511770314.git.jan.kiszka@siemens.com
2017-12-31Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 page table isolation fixes from Thomas Gleixner: "Four patches addressing the PTI fallout as discussed and debugged yesterday: - Remove stale and pointless TLB flush invocations from the hotplug code - Remove stale preempt_disable/enable from __native_flush_tlb() - Plug the memory leak in the write_ldt() error path" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Make LDT pgtable free conditional x86/ldt: Plug memory leak in error path x86/mm: Remove preempt_disable/enable() from __native_flush_tlb() x86/smpboot: Remove stale TLB flush invocations
2017-12-31x86/smpboot: Remove stale TLB flush invocationsThomas Gleixner
smpboot_setup_warm_reset_vector() and smpboot_restore_warm_reset_vector() invoke local_flush_tlb() for no obvious reason. Digging in history revealed that the original code in the 2.1 era added those because the code manipulated a swapper_pg_dir pagetable entry. The pagetable manipulation was removed long ago in the 2.3 timeframe, but the TLB flush invocations stayed around forever. Remove them along with the pointless pr_debug()s which come from the same 2.1 change. Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171230211829.586548655@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI preparatory patches from Thomas Gleixner: "Todays Advent calendar window contains twentyfour easy to digest patches. The original plan was to have twenty three matching the date, but a late fixup made that moot. - Move the cpu_entry_area mapping out of the fixmap into a separate address space. That's necessary because the fixmap becomes too big with NRCPUS=8192 and this caused already subtle and hard to diagnose failures. The top most patch is fresh from today and cures a brain slip of that tall grumpy german greybeard, who ignored the intricacies of 32bit wraparounds. - Limit the number of CPUs on 32bit to 64. That's insane big already, but at least it's small enough to prevent address space issues with the cpu_entry_area map, which have been observed and debugged with the fixmap code - A few TLB flush fixes in various places plus documentation which of the TLB functions should be used for what. - Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for more than sysenter now and keeping the name makes backtraces confusing. - Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(), which is only invoked on fork(). - Make vysycall more robust. - A few fixes and cleanups of the debug_pagetables code. Check PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the C89 initialization of the address hint array which already was out of sync with the index enums. - Move the ESPFIX init to a different place to prepare for PTI. - Several code moves with no functional change to make PTI integration simpler and header files less convoluted. - Documentation fixes and clarifications" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit init: Invoke init_espfix_bsp() from mm_init() x86/cpu_entry_area: Move it out of the fixmap x86/cpu_entry_area: Move it to a separate unit x86/mm: Create asm/invpcid.h x86/mm: Put MMU to hardware ASID translation in one place x86/mm: Remove hard-coded ASID limit checks x86/mm: Move the CR3 construction functions to tlbflush.h x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what x86/mm: Remove superfluous barriers x86/mm: Use __flush_tlb_one() for kernel memory x86/microcode: Dont abuse the TLB-flush interface x86/uv: Use the right TLB-flush API x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation x86/mm/64: Improve the memory map documentation x86/ldt: Prevent LDT inheritance on exec x86/ldt: Rework locking arch, mm: Allow arch_dup_mmap() to fail x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode ...
2017-12-22init: Invoke init_espfix_bsp() from mm_init()Thomas Gleixner
init_espfix_bsp() needs to be invoked before the page table isolation initialization. Move it into mm_init() which is the place where pti_init() will be added. While at it get rid of the #ifdeffery and provide proper stub functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17Merge commit 'upstream-x86-entry' into WIP.x86/mmIngo Molnar
Pull in a minimal set of v4.15 entry code changes, for a base for the MM isolation patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12x86/headers: Remove duplicate #includesPravin Shedge
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: boris.ostrovsky@oracle.com Cc: geert@linux-m68k.org Cc: jgross@suse.com Cc: linux-efi@vger.kernel.org Cc: luto@kernel.org Cc: matt@codeblueprint.co.uk Cc: thomas.lendacky@amd.com Cc: tim.c.chen@linux.intel.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1513024951-9221-1-git-send-email-pravin.shedge4linux@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-07x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculationPrarit Bhargava
Documentation/x86/topology.txt defines smp_num_siblings as "The number of threads in a core". Since commit bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology") smp_num_siblings is the maximum number of threads in a core. If Simultaneous MultiThreading (SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as expected. Use topology_max_smt_threads(), which contains the active numer of threads, in the __max_logical_packages calculation. On a single socket, single core, single thread system __max_smt_threads has not been updated when the __max_logical_packages calculation happens, so its zero which makes the package estimate fail. Initialize it to one, which is the minimum number of threads on a core. [ tglx: Folded the __max_smt_threads fix in ] Fixes: b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages estimate") Reported-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Prarit Bhargava <prarit@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jakub Kicinski <kubakici@wp.pl> Cc: netdev@vger.kernel.org Cc: "netdev@vger.kernel.org" Cc: Clark Williams <williams@redhat.com> Link: https://lkml.kernel.org/r/20171204164521.17870-1-prarit@redhat.com
2017-11-28x86/idt: Load idt early in start_secondaryChunyu Hu
On a secondary, idt is first loaded in cpu_init() with load_current_idt(), i.e. no exceptions can be handled before that point. The conversion of WARN() to use UD requires the IDT being loaded earlier as any warning between start_secondary() and load_curren_idt() in cpu_init() will result in an unhandled @UD exception and therefore fail the bringup of the CPU. Install the IDT handlers right in start_secondary() before calling cpu_init(). [ tglx: Massaged changelog ] Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: peterz@infradead.org Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1511792499-4073-1-git-send-email-chuhu@redhat.com
2017-11-17x86/smpboot: Fix __max_logical_packages estimatePrarit Bhargava
A system booted with a small number of cores enabled per package panics because the estimate of __max_logical_packages is too low. This occurs when the total number of active cores across all packages is less than the maximum core count for a single package. e.g.: On a 4 package system with 20 cores/package where only 4 cores are enabled on each package, the value of __max_logical_packages is calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4. Calculate __max_logical_packages after the cpu enumeration has completed. Use the boot cpu's data to extrapolate the number of packages. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
2017-11-17x86/topology: Avoid wasting 128k for package id arrayAndi Kleen
Analyzing large early boot allocations unveiled the logical package id storage as a prominent memory waste. Since commit 1f12e32f4cd5 ("x86/topology: Create logical package id") every 64-bit system allocates a 128k array to convert logical package ids. This happens because the array is sized for MAX_LOCAL_APIC which is always 32k on 64bit systems, and it needs 4 bytes for each entry. This is fairly wasteful, especially for the common case of having only one socket, which uses exactly 4 byte out of 128K. There is no user of the package id map which is performance critical, so the lookup is not required to be O(1). Store the logical processor id in cpu_data and use a loop based lookup. To keep the mapping stable accross cpu hotplug operations, add a flag to cpu_data which is set when the CPU is brought up the first time. When the flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on subsequent bringups. [ tglx: Rename the flag to 'initialized', use proper pointers instead of repeated cpu_data(x) evaluation and massage changelog. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: He Chen <he.chen@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Piotr Luc <piotr.luc@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
2017-11-13Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Thomas Gleixner: "This update provides a major overhaul of the APIC initialization and vector allocation code: - Unification of the APIC and interrupt mode setup which was scattered all over the place and was hard to follow. This also distangles the timer setup from the APIC initialization which brings a clear separation of functionality. Great detective work from Dou Lyiang! - Refactoring of the x86 vector allocation mechanism. The existing code was based on nested loops and rather convoluted APIC callbacks which had a horrible worst case behaviour and tried to serve all different use cases in one go. This led to quite odd hacks when supporting the new managed interupt facility for multiqueue devices and made it more or less impossible to deal with the vector space exhaustion which was a major roadblock for server hibernation. Aside of that the code dealing with cpu hotplug and the system vectors was disconnected from the actual vector management and allocation code, which made it hard to follow and maintain. Utilizing the new bitmap matrix allocator core mechanism, the new allocator and management code consolidates the handling of system vectors, legacy vectors, cpu hotplug mechanisms and the actual allocation which needs to be aware of system and legacy vectors and hotplug constraints into a single consistent entity. This has one visible change: The support for multi CPU targets of interrupts, which is only available on a certain subset of CPUs/APIC variants has been removed in favour of single interrupt targets. A proper analysis of the multi CPU target feature revealed that there is no real advantage as the vast majority of interrupts end up on the CPU with the lowest APIC id in the set of target CPUs anyway. That change was agreed on by the relevant folks and allowed to simplify the implementation significantly and to replace rather fragile constructs like the vector cleanup IPI with straight forward and solid code. Furthermore this allowed to cleanly separate the allocation details for legacy, normal and managed interrupts: * Legacy interrupts are not longer wasting 16 vectors unconditionally * Managed interrupts have now a guaranteed vector reservation, but the actual vector assignment happens when the interrupt is requested. It's guaranteed not to fail. * Normal interrupts no longer allocate vectors unconditionally when the interrupt is set up (IO/APIC init or MSI(X) enable). The mechanism has been switched to a best effort reservation mode. The actual allocation happens when the interrupt is requested. Contrary to managed interrupts the request can fail due to vector space exhaustion, but drivers must handle a fail of request_irq() anyway. When the interrupt is freed, the vector is handed back as well. This solves a long standing problem with large unconditional vector allocations for a certain class of enterprise devices which prevented server hibernation due to vector space exhaustion when the unused allocated vectors had to be migrated to CPU0 while unplugging all non boot CPUs. The code has been equipped with trace points and detailed debugfs information to aid analysis of the vector space" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE PCI/MSI: Set MSI_FLAG_MUST_REACTIVATE in core code genirq: Add config option for reservation mode x86/vector: Use correct per cpu variable in free_moved_vector() x86/apic/vector: Ignore set_affinity call for inactive interrupts x86/apic: Fix spelling mistake: "symmectic" -> "symmetric" x86/apic: Use dead_cpu instead of current CPU when cleaning up ACPI/init: Invoke early ACPI initialization earlier x86/vector: Respect affinity mask in irq descriptor x86/irq: Simplify hotplug vector accounting x86/vector: Switch IOAPIC to global reservation mode x86/vector/msi: Switch to global reservation mode x86/vector: Handle managed interrupts proper x86/io_apic: Reevaluate vector configuration on activate() iommu/amd: Reevaluate vector configuration on activate() iommu/vt-d: Reevaluate vector configuration on activate() x86/apic/msi: Force reactivation of interrupts at startup time x86/vector: Untangle internal state from irq_cfg x86/vector: Compile SMP only code conditionally x86/apic: Remove unused callbacks ...
2017-11-13Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Three smaller changes: - clang fix - boot message beautification - unnecessary header inclusion removal" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Disable Clang warnings about GNU extensions x86/boot: Remove unnecessary #include <generated/utsrelease.h> x86/boot: Spell out "boot CPU" for BP
2017-11-13Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "Note that in this cycle most of the x86 topics interacted at a level that caused them to be merged into tip:x86/asm - but this should be a temporary phenomenon, hopefully we'll back to the usual patterns in the next merge window. The main changes in this cycle were: Hardware enablement: - Add support for the Intel UMIP (User Mode Instruction Prevention) CPU feature. This is a security feature that disables certain instructions such as SGDT, SLDT, SIDT, SMSW and STR. (Ricardo Neri) [ Note that this is disabled by default for now, there are some smaller enhancements in the pipeline that I'll follow up with in the next 1-2 days, which allows this to be enabled by default.] - Add support for the AMD SEV (Secure Encrypted Virtualization) CPU feature, on top of SME (Secure Memory Encryption) support that was added in v4.14. (Tom Lendacky, Brijesh Singh) - Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela) Other changes: - A big series of entry code simplifications and enhancements (Andy Lutomirski) - Make the ORC unwinder default on x86 and various objtool enhancements. (Josh Poimboeuf) - 5-level paging enhancements (Kirill A. Shutemov) - Micro-optimize the entry code a bit (Borislav Petkov) - Improve the handling of interdependent CPU features in the early FPU init code (Andi Kleen) - Build system enhancements (Changbin Du, Masahiro Yamada) - ... plus misc enhancements, fixes and cleanups" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits) x86/build: Make the boot image generation less verbose selftests/x86: Add tests for the STR and SLDT instructions selftests/x86: Add tests for User-Mode Instruction Prevention x86/traps: Fix up general protection faults caused by UMIP x86/umip: Enable User-Mode Instruction Prevention at runtime x86/umip: Force a page fault when unable to copy emulated result to user x86/umip: Add emulation code for UMIP instructions x86/cpufeature: Add User-Mode Instruction Prevention definitions x86/insn-eval: Add support to resolve 16-bit address encodings x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode x86/insn-eval: Add wrapper function for 32 and 64-bit addresses x86/insn-eval: Add support to resolve 32-bit address encodings x86/insn-eval: Compute linear address in several utility functions resource: Fix resource_size.cocci warnings X86/KVM: Clear encryption attribute when SEV is active X86/KVM: Decrypt shared per-cpu variables when SEV is active percpu: Introduce DEFINE_PER_CPU_DECRYPTED x86: Add support for changing memory encryption attribute in early boot x86/io: Unroll string I/O when SEV is active x86/boot: Add early boot support when running with SEV active ...
2017-11-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main changes in this cycle are: - Another attempt at enabling cross-release lockdep dependency tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time with better performance and fewer false positives. (Byungchul Park) - Introduce lockdep_assert_irqs_enabled()/disabled() and convert open-coded equivalents to lockdep variants. (Frederic Weisbecker) - Add down_read_killable() and use it in the VFS's iterate_dir() method. (Kirill Tkhai) - Convert remaining uses of ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle driven. (Mark Rutland, Paul E. McKenney) - Get rid of lockless_dereference(), by strengthening Alpha atomics, strengthening READ_ONCE() with smp_read_barrier_depends() and thus being able to convert users of lockless_dereference() to READ_ONCE(). (Will Deacon) - Various micro-optimizations: - better PV qspinlocks (Waiman Long), - better x86 barriers (Michael S. Tsirkin) - better x86 refcounts (Kees Cook) - ... plus other fixes and enhancements. (Borislav Petkov, Juergen Gross, Miguel Bernal Marin)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE rcu: Use lockdep to assert IRQs are disabled/enabled netpoll: Use lockdep to assert IRQs are disabled/enabled timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled irq_work: Use lockdep to assert IRQs are disabled/enabled irq/timings: Use lockdep to assert IRQs are disabled/enabled perf/core: Use lockdep to assert IRQs are disabled/enabled x86: Use lockdep to assert IRQs are disabled/enabled smp/core: Use lockdep to assert IRQs are disabled/enabled timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled timers/nohz: Use lockdep to assert IRQs are disabled/enabled workqueue: Use lockdep to assert IRQs are disabled/enabled irq/softirqs: Use lockdep to assert IRQs are disabled/enabled locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled() locking/pvqspinlock: Implement hybrid PV queued/unfair locks locking/rwlocks: Fix comments x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion() workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes ...