aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/include/asm
AgeCommit message (Collapse)Author
2016-08-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: - ARM bugfix and MSI injection support - x86 nested virt tweak and OOPS fix - Simplify pvclock code (vdso bits acked by Andy Lutomirski). * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: nvmx: mark ept single context invalidation as supported nvmx: remove comment about missing nested vpid support KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off KVM: documentation: fix KVM_CAP_X2APIC_API information x86: vdso: use __pvclock_read_cycles pvclock: introduce seqcount-like API arm64: KVM: Set cpsr before spsr on fault injection KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype KVM: arm/arm64: Enable MSI routing KVM: arm/arm64: Enable irqchip routing KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header KVM: irqchip: Convey devid to kvm_set_msi KVM: Add devid in kvm_kernel_irq_routing_entry KVM: api: Pass the devid in the msi routing entry
2016-08-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes and a cleanup-fix, to the syscall entry code and to ptrace" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace x86/ptrace: Stop setting TS_COMPAT in ptrace code x86/vdso: Error out if the vDSO isn't a valid DSO
2016-08-05Merge tag 'rtc-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "RTC for 4.8 Cleanups: - huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc - move mn10300 to rtc-cmos Subsystem: - fix wakealarms after hibernate - multiples fixes for rctest - simplify implementations of .read_alarm New drivers: - Maxim MAX6916 Drivers: - ds1307: fix weekday - m41t80: add wakeup support - pcf85063: add support for PCF85063A variant - rv8803: extend i2c fix and other fixes - s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP TS-41x - s3c: clock fixes" * tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits) rtc: rv8803: Clear V1F when setting the time rtc: rv8803: Stop the clock while setting the time rtc: rv8803: Always apply the I²C workaround rtc: rv8803: Fix read day of week rtc: rv8803: Remove the check for valid time rtc: rv8803: Kconfig: Indicate rx8900 support rtc: asm9260: remove .owner field for driver rtc: at91sam9: Fix missing spin_lock_init() rtc: m41t80: add suspend handlers for alarm IRQ rtc: m41t80: make it a real error message rtc: pcf85063: Add support for the PCF85063A device rtc: pcf85063: fix year range rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy rtc: explicitly set tm_sec = 0 for drivers with minute accurancy rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq() rtc: s3c: Remove unnecessary call to disable already disabled clock rtc: abx80x: use devm_add_action_or_reset() rtc: m41t80: use devm_add_action_or_reset() rtc: fix a typo and reduce three empty lines to one rtc: s35390a: improve two comments in .set_alarm ...
2016-08-04dma-mapping: use unsigned long for dma_attrsKrzysztof Kozlowski
The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04tree-wide: replace config_enabled() with IS_ENABLED()Masahiro Yamada
The use of config_enabled() against config options is ambiguous. In practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the author might have used it for the meaning of IS_ENABLED(). Using IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention clearer. This commit replaces config_enabled() with IS_ENABLED() where possible. This commit is only touching bool config options. I noticed two cases where config_enabled() is used against a tristate option: - config_enabled(CONFIG_HWMON) [ drivers/net/wireless/ath/ath10k/thermal.c ] - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE) [ drivers/gpu/drm/gma500/opregion.c ] I did not touch them because they should be converted to IS_BUILTIN() in order to keep the logic, but I was not sure it was the authors' intention. Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Joshua Kinard <kumba@gentoo.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will Drewry <wad@chromium.org> Cc: Nikolay Martynov <mar.kolya@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: Rafal Milecki <zajec5@gmail.com> Cc: James Cowgill <James.Cowgill@imgtec.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alex Smith <alex.smith@imgtec.com> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Tony Wu <tung7970@gmail.com> Cc: Huaitong Han <huaitong.han@intel.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Gelmini <andrea.gelmini@gelma.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rabin Vincent <rabin@rab.in> Cc: "Maciej W. Rozycki" <macro@imgtec.com> Cc: David Daney <david.daney@cavium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04pvclock: introduce seqcount-like APIPaolo Bonzini
The version field in struct pvclock_vcpu_time_info basically implements a seqcount. Wrap it with the usual read_begin and read_retry functions, and use these APIs instead of peppering the code with smp_rmb()s. While at it, change it to the more pedantically correct virt_rmb(). With this change, __pvclock_read_cycles can be simplified noticeably. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - the rest of ocfs2 - various hotfixes, mainly MM - quite a bit of misc stuff - drivers, fork, exec, signals, etc. - printk updates - firmware - checkpatch - nilfs2 - more kexec stuff than usual - rapidio updates - w1 things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) ipc: delete "nr_ipc_ns" kcov: allow more fine-grained coverage instrumentation init/Kconfig: add clarification for out-of-tree modules config: add android config fragments init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig relay: add global mode support for buffer-only channels init: allow blacklisting of module_init functions w1:omap_hdq: fix regression w1: add helper macro module_w1_family w1: remove need for ida and use PLATFORM_DEVID_AUTO rapidio/switches: add driver for IDT gen3 switches powerpc/fsl_rio: apply changes for RIO spec rev 3 rapidio: modify for rev.3 specification changes rapidio: change inbound window size type to u64 rapidio/idt_gen2: fix locking warning rapidio: fix error handling in mbox request/release functions rapidio/tsi721_dma: advance queue processing from transfer submit call rapidio/tsi721: add messaging mbox selector parameter rapidio/tsi721: add PCIe MRRS override parameter rapidio/tsi721_dma: add channel mask and queue size parameters ...
2016-08-02signal: consolidate {TS,TLF}_RESTORE_SIGMASK codeAndy Lutomirski
In general, there's no need for the "restore sigmask" flag to live in ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only), tile, and x86 use essentially identical alternative implementations, placing the flag in ti->status. Replace those optimized implementations with an equally good common implementation that stores it in a bitfield in struct task_struct and drop the custom implementations. Additional architectures can opt in by removing their TIF_RESTORE_SIGMASK defines. Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-08-01Merge branch 'x86-headers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 header cleanups from Ingo Molnar: "This tree is a cleanup of the x86 tree reducing spurious uses of module.h - which should improve build performance a bit" * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads x86/apic: Remove duplicated include from probe_64.c x86/ce4100: Remove duplicated include from ce4100.c x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t x86/platform: Delete extraneous MODULE_* tags fromm ts5500 x86: Audit and remove any remaining unnecessary uses of module.h x86/kvm: Audit and remove any unnecessary uses of module.h x86/xen: Audit and remove any unnecessary uses of module.h x86/platform: Audit and remove any unnecessary uses of module.h x86/lib: Audit and remove any unnecessary uses of module.h x86/kernel: Audit and remove any unnecessary uses of module.h x86/mm: Audit and remove any unnecessary uses of module.h x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-07-30Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Thomas Gleixner: - more work to make the microcode loader robust - a fix for the micro code load precedence - fixes for initrd loading with randomized memory - less printk noise on SMP machines * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y x86/microcode: Remove unused symbol exports x86/microcode/intel: Do not issue microcode updates messages on each CPU Documentation/microcode: Document some aspects for more clarity x86/microcode/AMD: Make amd_ucode_patch[] static x86/microcode/intel: Unexport save_mc_for_early() x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch() x86/microcode: Propagate save_microcode_in_initrd() retval x86/microcode: Get rid of find_cpio_data()'s dummy offset arg lib/cpio: Make find_cpio_data()'s offset arg optional x86/microcode: Fix suspend to RAM with builtin microcode x86/microcode: Fix loading precedence
2016-07-30Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature updates from Thomas Gleixner: - a workaround for the MONITOR instruction erratum of Goldmont CPUs - small fixes and cleanups here and there * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G" x86/amd_nb: Clean up init path x86/cpufeature: Add helper macro for mask check macros x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated x86/cpufeature: Update cpufeaure macros
2016-07-29Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...
2016-07-28Merge tag 'libnvdimm-for-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Replace pcommit with ADR / directed-flushing. The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. - On-demand ARS (address range scrub). Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. - Support for the Microsoft DSM (device specific method) command format. - Support for EDK2/OVMF virtual disk device memory ranges. - Various fixes and cleanups across the subsystem. * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits) libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" nfit: do an ARS scrub on hitting a latent media error nfit: move to nfit/ sub-directory nfit, libnvdimm: allow an ARS scrub to be triggered on demand libnvdimm: register nvdimm_bus devices with an nd_bus driver pmem: clarify a debug print in pmem_clear_poison x86/insn: remove pcommit Revert "KVM: x86: add pcommit support" nfit, tools/testing/nvdimm/: unify shutdown paths libnvdimm: move ->module to struct nvdimm_bus_descriptor nfit: cleanup acpi_nfit_init calling convention nfit: fix _FIT evaluation memory leak + use after free tools/testing/nvdimm: add manufacturing_{date|location} dimm properties tools/testing/nvdimm: add virtual ramdisk range acpi, nfit: treat virtual ramdisk SPA as pmem region pmem: kill __pmem address space pmem: kill wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes fs/dax: remove wmb_pmem() libnvdimm, pmem: flush posted-write queues on shutdown ...
2016-07-27Merge tag 'for-linus-4.8-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "Features and fixes for 4.8-rc0: - ACPI support for guests on ARM platforms. - Generic steal time support for arm and x86. - Support cases where kernel cpu is not Xen VCPU number (e.g., if in-guest kexec is used). - Use the system workqueue instead of a custom workqueue in various places" * tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits) xen: add static initialization of steal_clock op to xen_time_ops xen/pvhvm: run xen_vcpu_setup() for the boot CPU xen/evtchn: use xen_vcpu_id mapping xen/events: fifo: use xen_vcpu_id mapping xen/events: use xen_vcpu_id mapping in events_base x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op xen: introduce xen_vcpu_id mapping x86/acpi: store ACPI ids from MADT for future usage x86/xen: update cpuid.h from Xen-4.7 xen/evtchn: add IOCTL_EVTCHN_RESTRICT xen-blkback: really don't leak mode property xen-blkback: constify instance of "struct attribute_group" xen-blkfront: prefer xenbus_scanf() over xenbus_gather() xen-blkback: prefer xenbus_scanf() over xenbus_gather() xen: support runqueue steal time on xen arm/xen: add support for vm_assist hypercall xen: update xen headers xen-pciback: drop superfluous variables xen-pciback: short-circuit read path used for merging write values ...
2016-07-27x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bitBorislav Petkov
... in order to avoid #ifdeffery in code computing the ASLR randomization offset. Remove that #ifdeffery in the microcode loader. Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160727120939.GA18911@nazgul.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27Merge branch 'linus' into x86/microcode, to pick up merge window changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27x86/ptrace: Stop setting TS_COMPAT in ptrace codeAndy Lutomirski
Setting TS_COMPAT in ptrace is wrong: if we happen to do it during syscall entry, then we'll confuse seccomp and audit. (The former isn't a security problem: seccomp is currently entirely insecure if a malicious ptracer is attached.) As a minimal fix, this patch adds a new flag TS_I386_REGS_POKED that handles the ptrace special case. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5383ebed38b39fa37462139e337aff7f2314d1ca.1469599803.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...
2016-07-26Merge tag 'acpi-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "The new feaures here are the support for ACPI overlays (allowing ACPI tables to be loaded at any time from EFI variables or via configfs) and the LPI (Low-Power Idle) support. Also notable is the ACPI-based NUMA support for ARM64. Apart from that we have two new drivers, for the DPTF (Dynamic Power and Thermal Framework) power participant device and for the Intel Broxton WhiskeyCove PMIC, some more PMIC-related changes, support for the Boot Error Record Table (BERT) in APEI and support for platform-initiated graceful shutdown. Plus two new pieces of documentation and usual assorted fixes and cleanups in quite a few places. Specifics: - Support for ACPI SSDT overlays allowing Secondary System Description Tables (SSDTs) to be loaded at any time from EFI variables or via configfs (Octavian Purdila, Mika Westerberg). - Support for the ACPI LPI (Low-Power Idle) feature introduced in ACPI 6.0 and allowing processor idle states to be represented in ACPI tables in a hierarchical way (with the help of Processor Container objects) and support for ACPI idle states management on ARM64, based on LPI (Sudeep Holla). - General improvements of ACPI support for NUMA and ARM64 support for ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter). - General improvements of the ACPI table upgrade mechanism and ARM64 support for that feature (Aleksey Makarov, Jon Masters). - Support for the Boot Error Record Table (BERT) in APEI and improvements of kernel messages printed by the error injection code (Huang Ying, Borislav Petkov). - New driver for the Intel Broxton WhiskeyCove PMIC operation region and support for the REGS operation region on Broxton, PMIC code cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker). - New driver for the power participant device which is part of the Dynamic Power and Thermal Framework (DPTF) and DPTF-related code reorganization (Srinivas Pandruvada). - Support for the platform-initiated graceful shutdown feature introduced in ACPI 6.1 (Prashanth Prakash). - ACPI button driver update related to lid input events generated automatically on initialization and system resume that have been problematic for some time (Lv Zheng). - ACPI EC driver cleanups (Lv Zheng). - Documentation of the ACPICA release automation process and the in-kernel ACPI AML debugger (Lv Zheng). - New blacklist entry and two fixes for the ACPI backlight driver (Alex Hung, Arvind Yadav, Ralf Gerbig). - Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker). - ACPI CPPC code changes to make it more robust against possible defects in ACPI tables and new symbol definitions for PCC (Hoan Tran). - System reboot code modification to execute the ACPI _PTS (Prepare To Sleep) method in addition to _TTS (Ocean He). - ACPICA-related change to carry out lock ordering checks in ACPICA if ACPICA debug is enabled in the kernel (Lv Zheng). - Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He, Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki)" * tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits) ACPI: enable ACPI_PROCESSOR_IDLE on ARM64 arm64: add support for ACPI Low Power Idle(LPI) drivers: firmware: psci: initialise idle states using ACPI LPI cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64} arm64: cpuidle: drop __init section marker to arm_cpuidle_init ACPI / processor_idle: Add support for Low Power Idle(LPI) states ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE ACPI / DPTF: move int340x_thermal.c to the DPTF folder ACPI / DPTF: Add DPTF power participant driver ACPI / lpat: make it explicitly non-modular ACPI / dock: make dock explicitly non-modular ACPI / PCI: make pci_slot explicitly non-modular ACPI / PMIC: remove modular references from non-modular code ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel ACPI: Rename configfs.c to acpi_configfs.c to prevent link error ACPI / debugger: Add AML debugger documentation ACPI: Add documentation describing ACPICA release automation ACPI: add support for loading SSDTs via configfs ACPI: add support for configfs efi / ACPI: load SSTDs from EFI variables ...
2016-07-26Merge tag 'pm-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Again, the majority of changes go into the cpufreq subsystem, but there are no big features this time. The cpufreq changes that stand out somewhat are the governor interface rework and improvements related to the handling of frequency tables. Apart from those, there are fixes and new device/CPU IDs in drivers, cleanups and an improvement of the new schedutil governor. Next, there are some changes in the hibernation core, including a fix for a nasty problem related to the MONITOR/MWAIT usage by CPU offline during resume from hibernation, a few core improvements related to memory management during resume, a couple of additional debug features and cleanups. Finally, we have some fixes and cleanups in the devfreq subsystem, generic power domains framework improvements related to system suspend/resume, support for some new chips in intel_idle and in the power capping RAPL driver, a new version of the AnalyzeSuspend utility and some assorted fixes and cleanups. Specifics: - Rework the cpufreq governor interface to make it more straightforward and modify the conservative governor to avoid using transition notifications (Rafael Wysocki). - Rework the handling of frequency tables by the cpufreq core to make it more efficient (Viresh Kumar). - Modify the schedutil governor to reduce the number of wakeups it causes to occur in cases when the CPU frequency doesn't need to be changed (Steve Muckle, Viresh Kumar). - Fix some minor issues and clean up code in the cpufreq core and governors (Rafael Wysocki, Viresh Kumar). - Add Intel Broxton support to the intel_pstate driver (Srinivas Pandruvada). - Fix problems related to the config TDP feature and to the validity of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka, Srinivas Pandruvada). - Make intel_pstate update the cpu_frequency tracepoint even if the frequency doesn't change to avoid confusing powertop (Rafael Wysocki). - Clean up the usage of __init/__initdata in intel_pstate, mark some of its internal variables as __read_mostly and drop an unused structure element from it (Jisheng Zhang, Carsten Emde). - Clean up the usage of some duplicate MSR symbols in intel_pstate and turbostat (Srinivas Pandruvada). - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay Adiga, Viresh Kumar, Ben Dooks). - Fix a regression (introduced during the 4.5 cycle) in the pcc-cpufreq driver by reverting the problematic commit (Andreas Herrmann). - Add support for Intel Denverton to intel_idle, clean up Broxton support in it and make it explicitly non-modular (Jacob Pan, Jan Beulich, Paul Gortmaker). - Add support for Denverton and Ivy Bridge server to the Intel RAPL power capping driver and make it more careful about the handing of MSRs that may not be present (Jacob Pan, Xiaolong Wang). - Fix resume from hibernation on x86-64 by making the CPU offline during resume avoid using MONITOR/MWAIT in the "play dead" loop which may lead to an inadvertent "revival" of a "dead" CPU and a page fault leading to a kernel crash from it (Rafael Wysocki). - Make memory management during resume from hibernation more straightforward (Rafael Wysocki). - Add debug features that should help to detect problems related to hibernation and resume from it (Rafael Wysocki, Chen Yu). - Clean up hibernation core somewhat (Rafael Wysocki). - Prevent KASAN from instrumenting the hibernation core which leads to large numbers of false-positives from it (James Morse). - Prevent PM (hibernate and suspend) notifiers from being called during the cleanup phase if they have not been called during the corresponding preparation phase which is possible if one of the other notifiers returns an error at that time (Lianwei Wang). - Improve suspend-related debug printout in the tasks freezer and clean up suspend-related console handling (Roger Lu, Borislav Petkov). - Update the AnalyzeSuspend script in the kernel sources to version 4.2 (Todd Brandt). - Modify the generic power domains framework to make it handle system suspend/resume better (Ulf Hansson). - Make the runtime PM framework avoid resuming devices synchronously when user space changes the runtime PM settings for them and improve its error reporting (Rafael Wysocki, Linus Walleij). - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus) and in the core, make some devfreq code explicitly non-modular and change some of it into tristate (Bartlomiej Zolnierkiewicz, Peter Chen, Paul Gortmaker). - Add DT support to the generic PM clocks management code and make it export some more symbols (Jon Hunter, Paul Gortmaker). - Make the PCI PM core code slightly more robust against possible driver errors (Andy Shevchenko). - Make it possible to change DESTDIR and PREFIX in turbostat (Andy Shevchenko)" * tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits) Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency" PM / hibernate: Introduce test_resume mode for hibernation cpufreq: export cpufreq_driver_resolve_freq() cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index() PCI / PM: check all fields in pci_set_platform_pm() cpufreq: acpi-cpufreq: use cached frequency mapping when possible cpufreq: schedutil: map raw required frequency to driver frequency cpufreq: add cpufreq_driver_resolve_freq() cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT intel_pstate: Update cpu_frequency tracepoint every time cpufreq: intel_pstate: clean remnant struct element PM / tools: scripts: AnalyzeSuspend v4.2 x86 / hibernate: Use hlt_play_dead() when resuming from hibernation cpufreq: powernv: Replacing pstate_id with frequency table index intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate() PM / hibernate: Image data protection during restoration PM / hibernate: Add missing braces in __register_nosave_region() PM / hibernate: Clean up comments in snapshot.c PM / hibernate: Clean up function headers in snapshot.c PM / hibernate: Add missing braces in hibernate_setup() ...
2016-07-26arch: x86: charge page tables to kmemcgVladimir Davydov
Page tables can bite a relatively big chunk off system memory and their allocations are easy to trigger from userspace, so they should be accounted to kmemcg. This patch marks page table allocations as __GFP_ACCOUNT for x86. Note we must not charge allocations of kernel page tables, because they can be shared among processes from different cgroups so accounting them to a particular one can pin other cgroups for indefinitely long. So we clear __GFP_ACCOUNT flag if a page table is allocated for the kernel. Link: http://lkml.kernel.org/r/7d5c54f6a2bcbe76f03171689440003d87e6c742.1464079538.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree contains tooling fixes plus some additions: - fixes to the vdso2c build environment that Stephen Rothwell is using for the linux-next build (Arnaldo Carvalho de Melo) - AVX-512 instruction mappings (Adrian Hunter) - misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "perf tools: event.h needs asm/perf_regs.h" x86: Make the vdso2c compiler use the host architecture headers tools build: Fix objtool build with ARCH=x86_64 objtool: Always use host headers objtool: Use tools/scripts/Makefile.arch to get ARCH and HOSTARCH tools build: Add HOSTARCH Makefile variable perf tests kmod-path: Fix build on ubuntu:16.04-x-armhf perf tools: Add AVX-512 instructions to the new instructions test perf tools: Add AVX-512 support to the instruction decoder used by Intel PT x86/insn: Add AVX-512 support to the instruction decoder x86/insn: perf tools: Fix vcvtph2ps instruction decoding
2016-07-25Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Ingo Molnar: "The main change in this tree is the reworking, fixing and extension of the TSC frequency enumeration code (by Len Brown)" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Remove the unused check_tsc_disabled() x86/tsc: Enumerate BXT tsc_khz via CPUID x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration x86/tsc_msr: Add Airmont reference clock values x86/tsc_msr: Correct Silvermont reference clock values x86/tsc_msr: Update comments, expand definitions x86/tsc_msr: Remove debugging messages x86/tsc_msr: Identify Intel-specific code Revert "x86/tsc: Add missing Cherrytrail frequency to the table"
2016-07-25Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "The main changes in this cycle were: - Intel-SoC enhancements (Andy Shevchenko) - Intel CPU symbolic model definition rework (Dave Hansen) - ... other misc changes" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/sfi: Enable enumeration of SD devices x86/pci: Use MRFLD abbreviation for Merrifield x86/platform/intel-mid: Make vertical indentation consistent x86/platform/intel-mid: Mark regulators explicitly defined x86/platform/intel-mid: Rename mrfl.c to mrfld.c x86/platform/intel-mid: Enable spidev on Intel Edison boards x86/platform/intel-mid: Extend PWRMU to support Penwell x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code x86/platform/intel-mid: Add pinctrl for Intel Merrifield x86/platform/intel-mid: Enable GPIO expanders on Edison x86/platform/intel-mid: Add Power Management Unit driver x86/platform/atom/punit: Enable support for Merrifield x86/platform/intel_mid_pci: Rework IRQ0 workaround x86, thermal: Clean up and fix CPU model detection for intel_soc_dts_thermal x86, mmc: Use Intel family name macros for mmc driver x86/intel_telemetry: Use Intel family name macros for telemetry driver x86/acpi/lss: Use Intel family name macros for the acpi_lpss driver x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver x86/platform: Use new Intel model number macros x86/intel_idle: Use Intel family macros for intel_idle ...
2016-07-25Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "The main x86 FPU changes in this cycle were: - a large series of cleanups, fixes and enhancements to re-enable the XSAVES instruction on Intel CPUs - which is the most advanced instruction to do FPU context switches (Yu-cheng Yu, Fenghua Yu) - Add FPU tracepoints for the FPU state machine (Dave Hansen)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Do not BUG_ON() in early FPU code x86/fpu/xstate: Re-enable XSAVES x86/fpu/xstate: Fix fpstate_init() for XRSTORS x86/fpu/xstate: Return NULL for disabled xstate component address x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstates x86/fpu/xstate: Fix XSTATE component offset print out x86/fpu/xstate: Fix PTRACE frames for XSAVES x86/fpu/xstate: Fix supervisor xstate component offset x86/fpu/xstate: Align xstate components according to CPUID x86/fpu/xstate: Copy xstate registers directly to the signal frame when compacted format is in use x86/fpu/xstate: Keep init_fpstate.xsave.header.xfeatures as zero for init optimization x86/fpu/xstate: Rename 'xstate_size' to 'fpu_kernel_xstate_size', to distinguish it from 'fpu_user_xstate_size' x86/fpu/xstate: Define and use 'fpu_user_xstate_size' x86/fpu: Add tracepoints to dump FPU state at key points
2016-07-25Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 stackdump update from Ingo Molnar: "A number of stackdump enhancements" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/dumpstack: Add show_stack_regs() and use it printk: Make the printk*once() variants return a value x86/dumpstack: Honor supplied @regs arg
2016-07-25Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "A build system fix and a cleanup" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kbuild: Remove stale asm-generic wrappers kbuild, x86: Track generated headers with generated-y
2016-07-25Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The main changes: - add initial commits to randomize kernel memory section virtual addresses, enabled via a new kernel option: RANDOMIZE_MEMORY (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu) - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees Cook) - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar) - misc cleanups/fixes" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Simplify EBDA-vs-BIOS reservation logic x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does x86/boot: Reorganize and clean up the BIOS area reservation code x86/mm: Do not reference phys addr beyond kernel x86/mm: Add memory hotplug support for KASLR memory randomization x86/mm: Enable KASLR for vmalloc memory regions x86/mm: Enable KASLR for physical mapping memory regions x86/mm: Implement ASLR for kernel memory regions x86/mm: Separate variable for trampoline PGD x86/mm: Add PUD VA support for physical mapping x86/mm: Update physical mapping variable names x86/mm: Refactor KASLR entropy functions x86/KASLR: Fix boot crash with certain memory configurations x86/boot/64: Add forgotten end of function marker x86/KASLR: Allow randomization below the load address x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G x86/KASLR: Randomize virtual address separately x86/KASLR: Clarify identity map interface x86/boot: Refuse to build with data relocations x86/KASLR, x86/power: Remove x86 hibernation restrictions
2016-07-25Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "Various x86 low level modifications: - preparatory work to support virtually mapped kernel stacks (Andy Lutomirski) - support for 64-bit __get_user() on 32-bit kernels (Benjamin LaHaise) - (involved) workaround for Knights Landing CPU erratum (Dave Hansen) - MPX enhancements (Dave Hansen) - mremap() extension to allow remapping of the special VDSO vma, for purposes of user level context save/restore (Dmitry Safonov) - hweight and entry code cleanups (Borislav Petkov) - bitops code generation optimizations and cleanups with modern GCC (H. Peter Anvin) - syscall entry code optimizations (Paolo Bonzini)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) x86/mm/cpa: Add missing comment in populate_pdg() x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2 x86/smp: Remove unnecessary initialization of thread_info::cpu x86/smp: Remove stack_smp_processor_id() x86/uaccess: Move thread_info::addr_limit to thread_struct x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct x86/dumpstack: When OOPSing, rewind the stack before do_exit() x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS x86/dumpstack: Try harder to get a call trace on stack overflow x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables() x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated x86/mm/hotplug: Don't remove PGD entries in remove_pagetable() x86/mm: Use pte_none() to test for empty PTE x86/mm: Disallow running with 32-bit PTEs to work around erratum x86/mm: Ignore A/D bits in pte/pmd/pud_none() x86/mm: Move swap offset/type up in PTE to work around erratum x86/entry: Inline enter_from_user_mode() ...
2016-07-25Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic updates from Ingo Molnar: "Misc cleanups and a small fix" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Remove the unused struct apic::apic_id_mask field x86/apic: Fix misspelled APIC x86/ioapic: Simplify ioapic_setup_resources()
2016-07-25Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "With over 300 commits it's been a busy cycle - with most of the work concentrated on the tooling side (as it should). The main kernel side enhancements were: - Add per event callchain limit: Recently we introduced a sysctl to tune the max-stack for all events for which callchains were requested: $ sysctl kernel.perf_event_max_stack kernel.perf_event_max_stack = 127 Now this patch introduces a way to configure this per event, i.e. this becomes possible: $ perf record -e sched:*/max-stack=2/ -e block:*/max-stack=10/ -a allowing finer tuning of how much buffer space callchains use. This uses an u16 from the reserved space at the end, leaving another u16 for future use. There has been interest in even finer tuning, namely to control the max stack for kernel and userspace callchains separately. Further discussion is needed, we may for instance use the remaining u16 for that and when it is present, assume that the sample_max_stack introduced in this patch applies for the kernel, and the u16 left is used for limiting the userspace callchain (Arnaldo Carvalho de Melo) - Optimize AUX event (hardware assisted side-band event) delivery (Kan Liang) - Rework Intel family name macro usage (this is partially x86 arch work) (Dave Hansen) - Refine and fix Intel LBR support (David Carrillo-Cisneros) - Add support for Intel 'TopDown' events (Andi Kleen) - Intel uncore PMU driver fixes and enhancements (Kan Liang) - ... other misc changes. Here's an incomplete list of the tooling enhancements (but there's much more, see the shortlog and the git log for details): - Support cross unwinding, i.e. collecting '--call-graph dwarf' perf.data files in one machine and then doing analysis in another machine of a different hardware architecture. This enables, for instance, to do: $ perf record -a --call-graph dwarf on a x86-32 or aarch64 system and then do 'perf report' on it on a x86_64 workstation (He Kuang) - Allow reading from a backward ring buffer (one setup via sys_perf_event_open() with perf_event_attr.write_backward = 1) (Wang Nan) - Finish merging initial SDT (Statically Defined Traces) support, see cset comments for details about how it all works (Masami Hiramatsu) - Support attaching eBPF programs to tracepoints (Wang Nan) - Add demangling of symbols in programs written in the Rust language (David Tolnay) - Add support for tracepoints in the python binding, including an example, that sets up and parses sched:sched_switch events, tools/perf/python/tracepoint.py (Jiri Olsa) - Introduce --stdio-color to set up the color output mode selection in 'annotate' and 'report', allowing emit color escape sequences when redirecting the output of these tools (Arnaldo Carvalho de Melo) - Add 'callindent' option to 'perf script -F', to indent the Intel PT call stack, making this output more ftrace-like (Adrian Hunter, Andi Kleen) - Allow dumping the object files generated by llvm when processing eBPF scriptlet events (Wang Nan) - Add stackcollapse.py script to help generating flame graphs (Paolo Bonzini) - Add --ldlat option to 'perf mem' to specify load latency for loads event (e.g. cpu/mem-loads/ ) (Jiri Olsa) - Tooling support for Intel TopDown counters, recently added to the kernel (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (303 commits) perf tests: Add is_printable_array test perf tools: Make is_printable_array global perf script python: Fix string vs byte array resolving perf probe: Warn unmatched function filter correctly perf cpu_map: Add more helpers perf stat: Balance opening and reading events tools: Copy linux/{hash,poison}.h and check for drift perf tools: Remove include/linux/list.h from perf's MANIFEST tools: Copy the bitops files accessed from the kernel and check for drift Remove: kernel unistd*h files from perf's MANIFEST, not used perf tools: Remove tools/perf/util/include/linux/const.h perf tools: Remove tools/perf/util/include/asm/byteorder.h perf tools: Add missing linux/compiler.h include to perf-sys.h perf jit: Remove some no-op error handling perf jit: Add missing curly braces objtool: Initialize variable to silence old compiler objtool: Add -I$(srctree)/tools/arch/$(ARCH)/include/uapi perf record: Add --tail-synthesize option perf session: Don't warn about out of order event if write_backward is used perf tools: Enable overwrite settings ...
2016-07-25Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The locking tree was busier in this cycle than the usual pattern - a couple of major projects happened to coincide. The main changes are: - implement the atomic_fetch_{add,sub,and,or,xor}() API natively across all SMP architectures (Peter Zijlstra) - add atomic_fetch_{inc/dec}() as well, using the generic primitives (Davidlohr Bueso) - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso, Waiman Long) - optimize smp_cond_load_acquire() on arm64 and implement LSE based atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}() on arm64 (Will Deacon) - introduce smp_acquire__after_ctrl_dep() and fix various barrier mis-uses and bugs (Peter Zijlstra) - after discovering ancient spin_unlock_wait() barrier bugs in its implementation and usage, strengthen its semantics and update/fix usage sites (Peter Zijlstra) - optimize mutex_trylock() fastpath (Peter Zijlstra) - ... misc fixes and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits) locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire() locking/static_keys: Fix non static symbol Sparse warning locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec() locking/atomic, arch/tile: Fix tilepro build locking/atomic, arch/m68k: Remove comment locking/atomic, arch/arc: Fix build locking/Documentation: Clarify limited control-dependency scope locking/atomic, arch/rwsem: Employ atomic_long_fetch_add() locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire() locking/atomic, arch/mips: Convert to _relaxed atomics locking/atomic, arch/alpha: Convert to _relaxed atomics locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions locking/atomic: Remove linux/atomic.h:atomic_fetch_or() locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}() locking/atomic: Fix atomic64_relaxed() bits locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}() locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}() locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}() locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}() ...
2016-07-25Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The biggest change in this cycle were SGI/UV related changes that clean up and fix UV boot quirks and problems. There's also various smaller cleanups and refinements" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Reorganize the GUID table to make it easier to read x86/efi: Remove the unused efi_get_time() function x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros x86/uv: Update uv_bios_call() to use efi_call_virt_pointer() efi: Convert efi_call_virt() to efi_call_virt_pointer() x86/efi: Remove unused variable 'efi' efi: Document #define FOO_PROTOCOL_GUID layout efibc: Report more information in the error messages
2016-07-25x86/acpi: store ACPI ids from MADT for future usageVitaly Kuznetsov
Currently we don't save ACPI ids (unlike LAPIC ids which go to x86_cpu_to_apicid) from MADT and we may need this information later. Particularly, ACPI ids is the only existent way for a PVHVM Xen guest to figure out Xen's idea of its vCPUs ids before these CPUs boot and in some cases these ids diverge from Linux's cpu ids. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25x86/xen: update cpuid.h from Xen-4.7Vitaly Kuznetsov
Update cpuid.h header from xen hypervisor tree to get XEN_HVM_CPUID_VCPU_ID_PRESENT definition. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25Merge branch 'pm-cpu'Rafael J. Wysocki
* pm-cpu: x86: remove duplicate turbo ratio limit MSRs tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
2016-07-25Merge branch 'x86/cpu' from tipRafael J. Wysocki
2016-07-25Merge branches 'pm-sleep' and 'pm-tools'Rafael J. Wysocki
* pm-sleep: PM / hibernate: Introduce test_resume mode for hibernation x86 / hibernate: Use hlt_play_dead() when resuming from hibernation PM / hibernate: Image data protection during restoration PM / hibernate: Add missing braces in __register_nosave_region() PM / hibernate: Clean up comments in snapshot.c PM / hibernate: Clean up function headers in snapshot.c PM / hibernate: Add missing braces in hibernate_setup() PM / hibernate: Recycle safe pages after image restoration PM / hibernate: Simplify mark_unsafe_pages() PM / hibernate: Do not free preallocated safe pages during image restore PM / suspend: show workqueue state in suspend flow PM / sleep: make PM notifiers called symmetrically PM / sleep: Make pm_prepare_console() return void PM / Hibernate: Don't let kasan instrument snapshot.c * pm-tools: PM / tools: scripts: AnalyzeSuspend v4.2 tools/turbostat: allow user to alter DESTDIR and PREFIX
2016-07-25Merge branch 'acpi-tables'Rafael J. Wysocki
* acpi-tables: ACPI: Rename configfs.c to acpi_configfs.c to prevent link error ACPI: add support for loading SSDTs via configfs ACPI: add support for configfs efi / ACPI: load SSTDs from EFI variables spi / ACPI: add support for ACPI reconfigure notifications i2c / ACPI: add support for ACPI reconfigure notifications ACPI: add support for ACPI reconfiguration notifiers ACPI / scan: fix enumeration (visited) flags for bus rescans ACPI / documentation: add SSDT overlays documentation ACPI: ARM64: support for ACPI_TABLE_UPGRADE ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE ACPI / tables: move arch-specific symbol to asm/acpi.h ACPI / tables: table upgrade: refactor function definitions ACPI / tables: table upgrade: use cacheable map for tables Conflicts: arch/arm64/include/asm/acpi.h
2016-07-25Merge branch 'acpi-numa'Rafael J. Wysocki
* acpi-numa: ACPI / NUMA: Enable ACPI based NUMA on ARM64 arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT ACPI / processor: Add acpi_map_madt_entry() ACPI / NUMA: Improve SRAT error detection and add messages ACPI / NUMA: Move acpi_numa_memory_affinity_init() to drivers/acpi/numa.c ACPI / NUMA: remove unneeded acpi_numa=1 ACPI / NUMA: move bad_srat() and srat_disabled() to drivers/acpi/numa.c x86 / ACPI / NUMA: cleanup acpi_numa_processor_affinity_init() arm64, NUMA: Cleanup NUMA disabled messages arm64, NUMA: rework numa_add_memblk() ACPI / NUMA: move acpi_numa_slit_init() to drivers/acpi/numa.c ACPI / NUMA: Move acpi_numa_arch_fixup() to ia64 only ACPI / NUMA: remove duplicate NULL check ACPI / NUMA: Replace ACPI_DEBUG_PRINT() with pr_debug() ACPI / NUMA: Use pr_fmt() instead of printk
2016-07-24Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-nextDan Williams
2016-07-23x86/insn: remove pcommitDan Williams
The pcommit instruction is being deprecated in favor of either ADR (asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM Firmware Interface Table). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23Revert "KVM: x86: add pcommit support"Dan Williams
This reverts commit 8b3e34e46aca9b6d349b331cd9cf71ccbdc91b2e. Given the deprecation of the pcommit instruction, the relevant VMX features and CPUID bits are not going to be rolled into the SDM. Remove their usage from KVM. Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-22x86/boot: Clarify what x86_legacy_features.reserve_bios_regions doesAndy Lutomirski
It doesn't just control probing for the EBDA -- it controls whether we detect and reserve the <1MB BIOS regions in general. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-21x86/insn: Add AVX-512 support to the instruction decoderAdrian Hunter
Add support for Intel's AVX-512 instructions to the instruction decoder. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). AVX-512 instructions are identified by a EVEX prefix which, for the purpose of instruction decoding, can be treated as though it were a 4-byte VEX prefix. Existing instructions which can now accept an EVEX prefix need not be further annotated in the op code map (x86-opcode-map.txt). In the case of new instructions, the op code map is updated accordingly. Also add associated Mask Instructions that are used to manipulate mask registers used in AVX-512 instructions. The 'perf tools' instruction decoder is updated in a subsequent patch. And a representative set of instructions is added to the perf tools new instructions test in a subsequent patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21x86/boot: Reorganize and clean up the BIOS area reservation codeIngo Molnar
So the reserve_ebda_region() code has accumulated a number of problems over the years that make it really difficult to read and understand: - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks... - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it super confusing to read. - It does not help at all that we have various memory range markers, half of which are 'start of range', half of which are 'end of range' - but this crucial property is not obvious in the naming at all ... gave me a headache trying to understand all this. - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is obvious, all values here are addresses!), while it does not highlight that it's the _start_ of the EBDA region ... - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value that is a pointer to a value, not a memory range address! - The function name itself is a misnomer: it says 'reserve_ebda_region()' while its main purpose is to reserve all the firmware ROM typically between 640K and 1MB, while the 'EBDA' part is only a small part of that ... - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this too should be about whether to reserve firmware areas in the paravirt case. - In fact thinking about this as 'end of RAM' is confusing: what this function *really* wants to reserve is firmware data and code areas! Once the thinking is inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure 'reserved area' notion everything becomes a lot clearer. To improve all this rewrite the whole code (without changing the logic): - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start' and propagate this concept through all the variable names and constants. BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES BIOS_START_MIN // was: INSANE_CUTOFF ebda_start // was: ebda_addr bios_start // was: lowmem BIOS_START_MAX // was: LOWMEM_CAP - Then clean up the name of the function itself by renaming it to reserve_bios_regions() and renaming the ::ebda_search paravirt flag to ::reserve_bios_regions. - Fix up all the comments (fix typos), harmonize and simplify their formulation and remove comments that become unnecessary due to the much better naming all around. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUsPeter Zijlstra
Monitored cached line may not wake up from mwait on certain Goldmont based CPUs. This patch will avoid calling current_set_polling_and_test() and thereby not set the TIF_ flag. The result is that we'll always send IPIs for wakeups. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1468867270-18493-1-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20Merge branch 'linus' into x86/cpu, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15x86 / hibernate: Use hlt_play_dead() when resuming from hibernationRafael J. Wysocki
On Intel hardware, native_play_dead() uses mwait_play_dead() by default and only falls back to the other methods if that fails. That also happens during resume from hibernation, when the restore (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs except for the boot one offline. However, that is problematic, because the address passed to __monitor() in mwait_play_dead() is likely to be written to in the last phase of hibernate image restoration and that causes the "dead" CPU to start executing instructions again. Unfortunately, the page containing the address in that CPU's instruction pointer may not be valid any more at that point. First, that page may have been overwritten with image kernel memory contents already, so the instructions the CPU attempts to execute may simply be invalid. Second, the page tables previously used by that CPU may have been overwritten by image kernel memory contents, so the address in its instruction pointer is impossible to resolve then. A report from Varun Koyyalagunta and investigation carried out by Chen Yu show that the latter sometimes happens in practice. To prevent it from happening, temporarily change the smp_ops.play_dead pointer during resume from hibernation so that it points to a special "play dead" routine which uses hlt_play_dead() and avoids the inadvertent "revivals" of "dead" CPUs this way. A slightly unpleasant consequence of this change is that if the system is hibernated with one or more CPUs offline, it will generally draw more power after resume than it did before hibernation, because the physical state entered by CPUs via hlt_play_dead() is higher-power than the mwait_play_dead() one in the majority of cases. It is possible to work around this, but it is unclear how much of a problem that's going to be in practice, so the workaround will be implemented later if it turns out to be necessary. Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371 Reported-by: Varun Koyyalagunta <cpudebug@centtech.com> Original-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org>