summaryrefslogtreecommitdiffstats
path: root/arch/x86/xen
AgeCommit message (Collapse)Author
2021-01-12xen/pvh: correctly setup the PV EFI interface for dom0Roger Pau Monne
commit 72813bfbf0276a97c82af038efb5f02dcdd9e310 upstream. This involves initializing the boot params EFI related fields and the efi global variable. Without this fix a PVH dom0 doesn't detect when booted from EFI, and thus doesn't support accessing any of the EFI related data. Reported-by: PGNet Dev <pgnet.dev@gmail.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jinoh Kang <jinoh.kang.kr@gmail.com> Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-02x86/xen: don't unbind uninitialized lock_kicker_irqBrian Masney
[ Upstream commit 65cae18882f943215d0505ddc7e70495877308e6 ] When booting a hyperthreaded system with the kernel parameter 'mitigations=auto,nosmt', the following warning occurs: WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60 ... Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 ... Call Trace: xen_uninit_lock_cpu+0x28/0x62 xen_hvm_cpu_die+0x21/0x30 takedown_cpu+0x9c/0xe0 ? trace_suspend_resume+0x60/0x60 cpuhp_invoke_callback+0x9a/0x530 _cpu_up+0x11a/0x130 cpu_up+0x7e/0xc0 bringup_nonboot_cpus+0x48/0x50 smp_init+0x26/0x79 kernel_init_freeable+0xea/0x229 ? rest_init+0xaa/0xaa kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The secondary CPUs are not activated with the nosmt mitigations and only the primary thread on each CPU core is used. In this situation, xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is not called, so the lock_kicker_irq is not initialized for the secondary CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the irq is not set to avoid the warning from above for each secondary CPU. Signed-off-by: Brian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05x86/xen: disable Firmware First mode for correctable memory errorsJuergen Gross
commit d759af38572f97321112a0852353613d18126038 upstream. When running as Xen dom0 the kernel isn't responsible for selecting the error handling mode, this should be handled by the hypervisor. So disable setting FF mode when running as Xen pv guest. Not doing so might result in boot splats like: [ 7.509696] HEST: Enabling Firmware First mode for corrected errors. [ 7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2. [ 7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3. [ 7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4. [ 7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5. [ 7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6. [ 7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7. [ 7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8. Reason is that the HEST ACPI table contains the real number of MCA banks, while the hypervisor is emulating only 2 banks for guests. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200925140751.31381-1-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20x86: Fix early boot crash on gcc-10, third tryBorislav Petkov
commit a9a3ed1eff3601b63aea4fb462d8b3b92c7c1e7e upstream. ... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11x86/xen: Distribute switch variables for initializationKees Cook
[ Upstream commit 9038ec99ceb94fb8d93ade5e236b2928f0792c7c ] Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. arch/x86/xen/enlighten_pv.c: In function ‘xen_write_msr_safe’: arch/x86/xen/enlighten_pv.c:904:12: warning: statement will never be executed [-Wswitch-unreachable] 904 | unsigned which; | ^~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200220062318.69299-1-keescook@chromium.org Reviewed-by: Juergen Gross <jgross@suse.com> [boris: made @which an 'unsigned int'] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05kprobes/x86/xen: blacklist non-attachable xen interrupt functionsAndrea Righi
[ Upstream commit bf9445a33ae6ac2f0822d2f1ce1365408387d568 ] Blacklist symbols in Xen probe-prohibited areas, so that user can see these prohibited symbols in debugfs. See also: a50480cb6d61. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06x86/xen: Return from panic notifierBoris Ostrovsky
[ Upstream commit c6875f3aacf2a5a913205accddabf0bfb75cac76 ] Currently execution of panic() continues until Xen's panic notifier (xen_panic_event()) is called at which point we make a hypercall that never returns. This means that any notifier that is supposed to be called later as well as significant part of panic() code (such as pstore writes from kmsg_dump()) is never executed. There is no reason for xen_panic_event() to be this last point in execution since panic()'s emergency_restart() will call into xen_emergency_restart() from where we can perform our hypercall. Nevertheless, we will provide xen_legacy_crash boot option that will preserve original behavior during crash. This option could be used, for example, if running kernel dumper (which happens after panic notifiers) is undesirable. Reported-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-29xen/efi: Set nonblocking callbacksRoss Lagerwall
[ Upstream commit df359f0d09dc029829b66322707a2f558cb720f7 ] Other parts of the kernel expect these nonblocking EFI callbacks to exist and crash when running under Xen. Since the implementations of xen_efi_set_variable() and xen_efi_query_variable_info() do not take any locks, use them for the nonblocking callbacks too. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-06xen/pv: Fix a boot up hang revealed by int3 self testZhenzhong Duan
[ Upstream commit b23e5844dfe78a80ba672793187d3f52e4b528d7 ] Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call() selftest") is used to ensure there is a gap setup in int3 exception stack which could be used for inserting call return address. This gap is missed in XEN PV int3 exception entry path, then below panic triggered: [ 0.772876] general protection fault: 0000 [#1] SMP NOPTI [ 0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11 [ 0.772893] RIP: e030:int3_magic+0x0/0x7 [ 0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246 [ 0.773334] Call Trace: [ 0.773334] alternative_instructions+0x3d/0x12e [ 0.773334] check_bugs+0x7c9/0x887 [ 0.773334] ? __get_locked_pte+0x178/0x1f0 [ 0.773334] start_kernel+0x4ff/0x535 [ 0.773334] ? set_init_arg+0x55/0x55 [ 0.773334] xen_start_kernel+0x571/0x57a For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with %rcx/%r11 on the stack. To convert back to "normal" looking exceptions, the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'. E.g. Extracting 'xen_pv_trap xenint3' we have: xen_xenint3: pop %rcx; pop %r11; jmp xenint3 As xenint3 and int3 entry code are same except xenint3 doesn't generate a gap, we can fix it by using int3 and drop useless xenint3. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-22xen/pvh: set xen_domain_type to HVM in xen_pvh_initRoger Pau Monne
commit c9f804d64bb93c8dbf957df1d7e9de11380e522d upstream. Or else xen_domain() returns false despite xen_pvh being set. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23xen: fix dom0 boot on huge systemsJuergen Gross
commit 01bd2ac2f55a1916d81dace12fa8d7ae1c79b5ea upstream. Commit f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit PV guests") introduced a regression for booting dom0 on huge systems with lots of RAM (in the TB range). Reason is that on those hosts the p2m list needs to be moved early in the boot process and this requires temporary page tables to be created. Said commit modified xen_set_pte_init() to use a hypercall for writing a PTE, but this requires the page table being in the direct mapped area, which is not the case for the temporary page tables used in xen_relocate_p2m(). As the page tables are completely written before being linked to the actual address space instead of set_pte() a plain write to memory can be used in xen_relocate_p2m(). Fixes: f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit PV guests") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27always clear the X2APIC_ENABLE bit for PV guestTalons Lee
[ Upstream commit 5268c8f39e0efef81af2aaed160272d9eb507beb ] Commit e657fcc clears cpu capability bit instead of using fake cpuid value, the EXTD should always be off for PV guest without depending on cpuid value. So remove the cpuid check in xen_read_msr_safe() to always clear the X2APIC_ENABLE bit. Signed-off-by: Talons Lee <xin.li@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-22xen: Fix x86 sched_clock() interface for xenJuergen Gross
commit 867cefb4cb1012f42cada1c7d1f35ac8dd276071 upstream. Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface") broke Xen guest time handling across migration: [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 187.251137] OOM killer disabled. [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 187.252299] suspending xenstore... [ 187.266987] xen:grant_table: Grant tables using version 1 layout [18446743811.706476] OOM killer enabled. [18446743811.706478] Restarting tasks ... done. [18446743811.720505] Setting capacity to 16777216 Fix that by setting xen_sched_clock_offset at resume time to ensure a monotonic clock value. [boris: replaced pr_info() with pr_info_once() in xen_callback_vector() to avoid printing with incorrect timestamp during resume (as we haven't re-adjusted the clock yet)] Fixes: f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface") Cc: <stable@vger.kernel.org> # 4.11 Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13x86/mm: Fix guard hole handlingKirill A. Shutemov
[ Upstream commit 16877a5570e0c5f4270d5b17f9bab427bcae9514 ] There is a guard hole at the beginning of the kernel address space, also used by hypervisors. It occupies 16 PGD entries. This reserved range is not defined explicitely, it is calculated relative to other entities: direct mapping and user space ranges. The calculation got broken by recent changes of the kernel memory layout: LDT remap range is now mapped before direct mapping and makes the calculation invalid. The breakage leads to crash on Xen dom0 boot[1]. Define the reserved range explicitely. It's part of kernel ABI (hypervisors expect it to be stable) and must not depend on changes in the rest of kernel memory layout. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: bhe@redhat.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-17Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"Igor Druzhinin
[ Upstream commit 123664101aa2156d05251704fc63f9bcbf77741a ] This reverts commit b3cf8528bb21febb650a7ecbf080d0647be40b9f. That commit unintentionally broke Xen balloon memory hotplug with "hotplug_unpopulated" set to 1. As long as "System RAM" resource got assigned under a new "Unusable memory" resource in IO/Mem tree any attempt to online this memory would fail due to general kernel restrictions on having "System RAM" resources as 1st level only. The original issue that commit has tried to workaround fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") also got amended by the following 03a551734 ("x86/PCI: Move and shrink AMD 64-bit window to avoid conflict") which made the original fix to Xen ballooning unnecessary. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27x86/mm: Move LDT remap out of KASLR region on 5-level pagingKirill A. Shutemov
commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream On 5-level paging the LDT remap area is placed in the middle of the KASLR randomization region and it can overlap with the direct mapping, the vmalloc or the vmap area. The LDT mapping is per mm, so it cannot be moved into the P4D page table next to the CPU_ENTRY_AREA without complicating PGD table allocation for 5-level paging. The 4 PGD slot gap just before the direct mapping is reserved for hypervisors, so it cannot be used. Move the direct mapping one slot deeper and use the resulting gap for the LDT remap area. The resulting layout is the same for 4 and 5 level paging. [ tglx: Massaged changelog ] Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: willy@infradead.org Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181026122856.66224-2-kirill.shutemov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-13xen: fix xen_qlock_wait()Juergen Gross
commit d3132b3860f6cf35ff7609a76bbcdbb814bd027c upstream. Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable") introduced a regression for Xen guests running fully virtualized (HVM or PVH mode). The Xen hypervisor wouldn't return from the poll hypercall with interrupts disabled in case of an interrupt (for PV guests it does). So instead of disabling interrupts in xen_qlock_wait() use a nesting counter to avoid calling xen_clear_irq_pending() in case xen_qlock_wait() is nested. Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable") Cc: stable@vger.kernel.org Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13xen/pvh: don't try to unplug emulated devicesJuergen Gross
commit e6111161c0a02d58919d776eec94b313bb57911f upstream. A Xen PVH guest has no associated qemu device model, so trying to unplug any emulated devices is making no sense at all. Bail out early from xen_unplug_emulated_devices() when running as PVH guest. This will avoid issuing the boot message: [ 0.000000] Xen Platform PCI: unrecognised magic value Cc: <stable@vger.kernel.org> # 4.11 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13xen/pvh: increase early stack sizeRoger Pau Monne
commit 7deecbda3026f5e2a8cc095d7ef7261a920efcf2 upstream. While booting on an AMD EPYC box the stack canary would detect stack overflows when using the current PVH early stack size (256). Switch to using the value defined by BOOT_STACK_SIZE, which prevents the stack overflow. Cc: <stable@vger.kernel.org> # 4.11 Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13xen: make xen_qlock_wait() nestableJuergen Gross
commit a856531951dc8094359dfdac21d59cee5969c18e upstream. xen_qlock_wait() isn't safe for nested calls due to interrupts. A call of xen_qlock_kick() might be ignored in case a deeper nesting level was active right before the call of xen_poll_irq(): CPU 1: CPU 2: spin_lock(lock1) spin_lock(lock1) -> xen_qlock_wait() -> xen_clear_irq_pending() Interrupt happens spin_unlock(lock1) -> xen_qlock_kick(CPU 2) spin_lock_irqsave(lock2) spin_lock_irqsave(lock2) -> xen_qlock_wait() -> xen_clear_irq_pending() clears kick for lock1 -> xen_poll_irq() spin_unlock_irq_restore(lock2) -> xen_qlock_kick(CPU 2) wakes up spin_unlock_irq_restore(lock2) IRET resumes in xen_qlock_wait() -> xen_poll_irq() never wakes up The solution is to disable interrupts in xen_qlock_wait() and not to poll for the irq in case xen_qlock_wait() is called in nmi context. Cc: stable@vger.kernel.org Cc: Waiman.Long@hp.com Cc: peterz@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13xen: fix race in xen_qlock_wait()Juergen Gross
commit 2ac2a7d4d9ff4e01e36f9c3d116582f6f655ab47 upstream. In the following situation a vcpu waiting for a lock might not be woken up from xen_poll_irq(): CPU 1: CPU 2: CPU 3: takes a spinlock tries to get lock -> xen_qlock_wait() frees the lock -> xen_qlock_kick(cpu2) -> xen_clear_irq_pending() takes lock again tries to get lock -> *lock = _Q_SLOW_VAL -> *lock == _Q_SLOW_VAL ? -> xen_poll_irq() frees the lock -> xen_qlock_kick(cpu3) And cpu 2 will sleep forever. This can be avoided easily by modifying xen_qlock_wait() to call xen_poll_irq() only if the related irq was not pending and to call xen_clear_irq_pending() only if it was pending. Cc: stable@vger.kernel.org Cc: Waiman.Long@hp.com Cc: peterz@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13x86/xen: Fix boot loader version reported for PVH guestsJuergen Gross
commit 357d291ce035d1b757568058f3c9898c60d125b1 upstream. The boot loader version reported via sysfs is wrong in case of the kernel being booted via the Xen PVH boot entry. it should be 2.12 (0x020c), but it is reported to be 2.18 (0x0212). As the current way to set the version is error prone use the more readable variant (2 << 8) | 12. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> # 4.12 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: bp@alien8.de Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181010061456.22238-2-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-23Merge tag 'for-linus-4.19d-rc5-tag' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Juergen writes: "xen: Two small fixes for xen drivers." * tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: issue warning message when out of grant maptrack entries xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
2018-09-20x86/mm: Expand static page table for fixmap spaceFeng Tang
We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup. Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
2018-09-19xen/x86/vpmu: Zero struct pt_regs before calling into sample handling codeBoris Ostrovsky
Otherwise we may leak kernel stack for events that sample user registers. Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
2018-08-31Merge tag 'for-linus-4.19b-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - minor cleanup avoiding a warning when building with new gcc - a patch to add a new sysfs node for Xen frontend/backend drivers to make it easier to obtain the state of a pv device - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable PTEs * tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove redundant variable save_pud xen: export device state to sysfs x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear x86/xen: don't write ptes directly in 32-bit PV guests
2018-08-28x86/xen: remove redundant variable save_pudColin Ian King
Variable save_pud is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable 'save_pud' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-27x86/xen: don't write ptes directly in 32-bit PV guestsJuergen Gross
In some cases 32-bit PAE PV guests still write PTEs directly instead of using hypercalls. This is especially bad when clearing a PTE as this is done via 32-bit writes which will produce intermediate L1TF attackable PTEs. Change the code to use hypercalls instead. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-23Merge branch 'tlb-fixes'Linus Torvalds
Merge fixes for missing TLB shootdowns. This fixes a couple of cases that involved us possibly freeing page table structures before the required TLB shootdown had been done. There are a few cleanup patches to make the code easier to follow, and to avoid some of the more problematic cases entirely when not necessary. To make this easier for backports, it undoes the recent lazy TLB patches, because the cleanups and fixes are more important, and Rik is ok with re-doing them later when things have calmed down. The missing TLB flush was only delayed, and the wrong ordering only happened under memory pressure (and in theory under a couple of other fairly theoretical situations), so this may have been all very unlikely to have hit people in practice. But getting the TLB shootdown wrong is _so_ hard to debug and see that I consider this a crticial fix. Many thanks to Jann Horn for having debugged this. * tlb-fixes: x86/mm: Only use tlb_remove_table() for paravirt mm: mmu_notifier fix for tlb_end_vma mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE mm/tlb: Remove tlb_remove_table() non-concurrent condition mm: move tlb_table_flush to tlb_flush_mmu_free x86/mm/tlb: Revert the recent lazy TLB patches
2018-08-23x86/mm: Only use tlb_remove_table() for paravirtPeter Zijlstra
If we don't use paravirt; don't play unnecessary and complicated games to free page-tables. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-20x86/xen: enable early use of set_fixmap in 32-bit Xen PV guestJuergen Gross
Commit 7b25b9cb0dad83 ("x86/xen/time: Initialize pv xen time in init_hypervisor_platform()") moved the mapping of the shared info area before pagetable_init(). This breaks booting as 32-bit PV guest as the use of set_fixmap isn't possible at this time on 32-bit. This can be worked around by populating the needed PMD on 32-bit kernel earlier. In order not to reimplement populate_extra_pte() using extend_brk() for allocating new page tables extend alloc_low_pages() to do that in case the early page table pool is not yet available. Fixes: 7b25b9cb0dad83 ("x86/xen/time: Initialize pv xen time in init_hypervisor_platform()") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-20x86/xen: remove unused function xen_auto_xlated_memory_setup()Juergen Gross
xen_auto_xlated_memory_setup() is a leftover from PVH V1. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-20x86/Xen: further refine add_preferred_console() invocationsJan Beulich
As the sequence of invocations matters, add "tty" only after "hvc" when a VGA console is available (which is often the case for Dom0, but hardly ever for DomU). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-20x86/Xen: mark xen_setup_gdt() __initJan Beulich
Its only caller is __init, so to avoid section mismatch warnings when a compiler decides to not inline the function marke this function so as well. Take the opportunity and also make the function actually use its argument: The sole caller passes in zero anyway. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-14Merge tag 'for-linus-4.19-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - add dma-buf functionality to Xen grant table handling - fix for booting the kernel as Xen PVH dom0 - fix for booting the kernel as a Xen PV guest with CONFIG_DEBUG_VIRTUAL enabled - other minor performance and style fixes * tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: fix balloon initialization for PVH Dom0 xen: don't use privcmd_call() from xen_mc_flush() xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits xen/biomerge: Use true and false for boolean values xen/gntdev: don't dereference a null gntdev_dmabuf on allocation failure xen/spinlock: Don't use pvqspinlock if only 1 vCPU xen/gntdev: Implement dma-buf import functionality xen/gntdev: Implement dma-buf export functionality xen/gntdev: Add initial support for dma-buf UAPI xen/gntdev: Make private routines/structures accessible xen/gntdev: Allow mappings for DMA buffers xen/grant-table: Allow allocating buffers suitable for DMA xen/balloon: Share common memory reservation routines xen/grant-table: Make set/clear page private code shared
2018-08-14Merge branch 'l1tf-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Merge L1 Terminal Fault fixes from Thomas Gleixner: "L1TF, aka L1 Terminal Fault, is yet another speculative hardware engineering trainwreck. It's a hardware vulnerability which allows unprivileged speculative access to data which is available in the Level 1 Data Cache when the page table entry controlling the virtual address, which is used for the access, has the Present bit cleared or other reserved bits set. If an instruction accesses a virtual address for which the relevant page table entry (PTE) has the Present bit cleared or other reserved bits set, then speculative execution ignores the invalid PTE and loads the referenced data if it is present in the Level 1 Data Cache, as if the page referenced by the address bits in the PTE was still present and accessible. While this is a purely speculative mechanism and the instruction will raise a page fault when it is retired eventually, the pure act of loading the data and making it available to other speculative instructions opens up the opportunity for side channel attacks to unprivileged malicious code, similar to the Meltdown attack. While Meltdown breaks the user space to kernel space protection, L1TF allows to attack any physical memory address in the system and the attack works across all protection domains. It allows an attack of SGX and also works from inside virtual machines because the speculation bypasses the extended page table (EPT) protection mechanism. The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646 The mitigations provided by this pull request include: - Host side protection by inverting the upper address bits of a non present page table entry so the entry points to uncacheable memory. - Hypervisor protection by flushing L1 Data Cache on VMENTER. - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT by offlining the sibling CPU threads. The knobs are available on the kernel command line and at runtime via sysfs - Control knobs for the hypervisor mitigation, related to L1D flush and SMT control. The knobs are available on the kernel command line and at runtime via sysfs - Extensive documentation about L1TF including various degrees of mitigations. Thanks to all people who have contributed to this in various ways - patches, review, testing, backporting - and the fruitful, sometimes heated, but at the end constructive discussions. There is work in progress to provide other forms of mitigations, which might be less horrible performance wise for a particular kind of workloads, but this is not yet ready for consumption due to their complexity and limitations" * 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) x86/microcode: Allow late microcode loading with SMT disabled tools headers: Synchronise x86 cpufeatures.h for L1TF additions x86/mm/kmmio: Make the tracer robust against L1TF x86/mm/pat: Make set_memory_np() L1TF safe x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert x86/speculation/l1tf: Invert all not present mappings cpu/hotplug: Fix SMT supported evaluation KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry x86/speculation: Simplify sysfs report of VMX L1TF vulnerability Documentation/l1tf: Remove Yonah processors from not vulnerable list x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr() x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d x86: Don't include linux/irq.h from asm/hardirq.h x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond' x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush() cpu/hotplug: detect SMT disabled by BIOS ...
2018-08-07xen: don't use privcmd_call() from xen_mc_flush()Juergen Gross
Using privcmd_call() for a singleton multicall seems to be wrong, as privcmd_call() is using stac()/clac() to enable hypervisor access to Linux user space. Even if currently not a problem (pv domains can't use SMAP while HVM and PVH domains can't use multicalls) things might change when PVH dom0 support is added to the kernel. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-06xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bitsM. Vefa Bicakci
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") has moved the query and calculation of the x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct from the get_cpu_cap function to a new function named get_cpu_address_sizes. One of the call sites related to Xen PV VMs was unfortunately missed in the aforementioned commit. This prevents successful boot-up of kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL is enabled, due to the following code path: enlighten_pv.c::xen_start_kernel mmu_pv.c::xen_reserve_special_pages page.h::__pa physaddr.c::__phys_addr physaddr.h::phys_addr_valid phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical addresses. boot_cpu_data.x86_phys_bits is no longer populated before the call to xen_reserve_special_pages due to the aforementioned commit though, so the validation performed by phys_addr_valid fails, which causes __phys_addr to trigger a BUG, preventing boot-up. Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: xen-devel@lists.xenproject.org Cc: x86@kernel.org Cc: stable@vger.kernel.org # for v4.17 and up Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-05Merge 4.18-rc7 into master to pick up the KVM dependcyThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05x86: Don't include linux/irq.h from asm/hardirq.hNicolai Stange
The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/topology.h linux/smp.h asm/smp.h or linux/gfp.h linux/mmzone.h asm/mmzone.h asm/mmzone_64.h asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/irqdesc.h linux/kobject.h linux/sysfs.h linux/kernfs.h linux/idr.h linux/gfp.h and others. This causes compilation errors because of the header guards becoming effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both, asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/harirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that there are some other archs, like e.g. arm64, which don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86' asm/hardirq.h. Fix resulting compilation errors by adding appropriate #includes to *.c files as needed. Note that some of these *.c files could be cleaned up a bit wrt. to their set of #includes, but that should better be done from separate patches, if at all. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-07-26xen/spinlock: Don't use pvqspinlock if only 1 vCPUWaiman Long
On a VM with only 1 vCPU, the locking fast paths will always be successful. In this case, there is no need to use the the PV qspinlock code which has higher overhead on the unlock side than the native qspinlock code. The xen_pvspin veriable is also turned off in this 1 vCPU case to eliminate unneeded pvqspinlock initialization in xen_init_lock_cpu() which is run after xen_init_spinlocks(). Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-07-20x86/xen/time: Output xen sched_clock time from 0Pavel Tatashin
It is expected for sched_clock() to output data from 0, when system boots. Add an offset xen_sched_clock_offset (similarly how it is done in other hypervisors i.e. kvm_sched_clock_offset) to count sched_clock() from 0, when time is first initialized. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-14-pasha.tatashin@oracle.com
2018-07-20x86/xen/time: Initialize pv xen time in init_hypervisor_platform()Pavel Tatashin
In every hypervisor except for xen pv time ops are initialized in init_hypervisor_platform(). Xen PV domains initialize time ops in x86_init.paging.pagetable_init(), by calling xen_setup_shared_info() which is a poor design, as time is needed prior to memory allocator. xen_setup_shared_info() is called from two places: during boot, and after suspend. Split the content of xen_setup_shared_info() into three places: 1. add the clock relavent data into new xen pv init_platform vector, and set clock ops in there. 2. move xen_setup_vcpu_info_placement() to new xen_pv_guest_late_init() call. 3. Re-initializing parts of shared info copy to xen_pv_post_suspend() to be symmetric to xen_pv_pre_suspend Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-13-pasha.tatashin@oracle.com
2018-07-14Merge tag 'for-linus-4.18-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two related fixes for a boot failure of Xen PV guests" * tag 'for-linus-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: setup pv irq ops vector earlier xen: remove global bit from __default_kernel_pte_mask for pv guests
2018-07-13xen: setup pv irq ops vector earlierJuergen Gross
Setting pv_irq_ops for Xen PV domains should be done as early as possible in order to support e.g. very early printk() usage. The same applies to xen_vcpu_info_reset(0), as it is needed for the pv irq ops. Move the call of xen_setup_machphys_mapping() after initializing the pv functions as it contains a WARN_ON(), too. Remove the no longer necessary conditional in xen_init_irq_ops() from PVH V1 times to make clear this is a PV only function. Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-07-12xen: remove global bit from __default_kernel_pte_mask for pv guestsJuergen Gross
When removing the global bit from __supported_pte_mask do the same for __default_kernel_pte_mask in order to avoid the WARN_ONCE() in check_pgprot() when setting a kernel pte before having called init_mem_mapping(). Cc: <stable@vger.kernel.org> # 4.17 Reported-by: Michael Young <m.a.young@durham.ac.uk> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-06-24Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Make Xen PV guest deal with speculative store bypass correctly - Address more fallout from the 5-Level pagetable handling. Undo an __initdata annotation to avoid section mismatch and malfunction when post init code would touch the freed variable. - Handle exception fixup in math_error() before calling notify_die(). The reverse call order incorrectly triggers notify_die() listeners for soemthing which is handled correctly at the site which issues the floating point instruction. - Fix an off by one in the LLC topology calculation on AMD - Handle non standard memory block sizes gracefully un UV platforms - Plug a memory leak in the microcode loader - Sanitize the purgatory build magic - Add the x86 specific device tree bindings directory to the x86 MAINTAINER file patterns" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix 'no5lvl' handling Revert "x86/mm: Mark __pgtable_l5_enabled __initdata" x86/CPU/AMD: Fix LLC ID bit-shift calculation MAINTAINERS: Add file patterns for x86 device tree bindings x86/microcode/intel: Fix memleak in save_microcode_patch() x86/platform/UV: Add kernel parameter to set memory block size x86/platform/UV: Use new set memory block size function x86/platform/UV: Add adjustable set memory block size function x86/build: Remove unnecessary preparation for purgatory Revert "kexec/purgatory: Add clean-up for purgatory directory" x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths x86: Call fixup_exception() before notify_die() in math_error()
2018-06-23Merge tag 'for-linus-4.18-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains the following fixes/cleanups: - the removal of a BUG_ON() which wasn't necessary and which could trigger now due to a recent change - a correction of a long standing bug happening very rarely in Xen dom0 when a hypercall buffer from user land was not accessible by the hypervisor for very short periods of time due to e.g. page migration or compaction - usage of EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() in a Xen-related driver (no breakage possible as using those symbols without others already exported via EXPORT-SYMBOL_GPL() wouldn't make any sense) - a simplification for Xen PVH or Xen ARM guests - some additional error handling for callers of xenbus_printf()" * tag 'for-linus-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Remove unnecessary BUG_ON from __unbind_from_irq() xen: add new hypercall buffer mapping device xen/scsiback: add error handling for xenbus_printf scsi: xen-scsifront: add error handling for xenbus_printf xen/grant-table: Export gnttab_{alloc|free}_pages as GPL xen: add error handling for xenbus_printf xen: share start flags between PV and PVH
2018-06-22Merge branch 'linus' into x86/urgentThomas Gleixner
Required to queue a dependent fix.
2018-06-21x86/xen: Add call of speculative_store_bypass_ht_init() to PV pathsJuergen Gross
Commit: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") ... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence. speculative_store_bypass_ht_init() needs to be called on each CPU for PV guests, too. Reported-by: Brian Woods <brian.woods@amd.com> Tested-by: Brian Woods <brian.woods@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD") Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>