summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2018-09-20KVM: x86: Turbo bits in MSR_PLATFORM_INFODrew Schmitt
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only the CPUID faulting bit was settable. But now any bit in MSR_PLATFORM_INFO would be settable. This can be used, for example, to convey frequency information about the platform on which the guest is running. Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20nVMX x86: Check VPID value on vmentry of L2 guestsKrish Sadhukhan
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: If the 'enable VPID' VM-execution control is 1, the value of the of the VPID VM-execution control field must not be 0000H. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2Krish Sadhukhan
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: - Bits 5:0 of the posted-interrupt descriptor address are all 0. - The posted-interrupt descriptor address does not set any bits beyond the processor's physical-address width. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICvLiran Alon
In case L1 do not intercept L2 HLT or enter L2 in HLT activity-state, it is possible for a vCPU to be blocked while it is in guest-mode. According to Intel SDM 26.6.5 Interrupt-Window Exiting and Virtual-Interrupt Delivery: "These events wake the logical processor if it just entered the HLT state because of a VM entry". Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU should be waken from the HLT state and injected with the interrupt. In addition, if while the vCPU is blocked (while it is in guest-mode), it receives a nested posted-interrupt, then the vCPU should also be waken and injected with the posted interrupt. To handle these cases, this patch enhances kvm_vcpu_has_events() to also check if there is a pending interrupt in L2 virtual APICv provided by L1. That is, it evaluates if there is a pending virtual interrupt for L2 by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1 Evaluation of Pending Interrupts. Note that this also handles the case of nested posted-interrupt by the fact RVI is updated in vmx_complete_nested_posted_interrupt() which is called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_running() -> vmx_check_nested_events() -> vmx_complete_nested_posted_interrupt(). Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: VMX: check nested state and CR4.VMXE against SMMPaolo Bonzini
VMX cannot be enabled under SMM, check it when CR4 is set and when nested virtualization state is restored. This should fix some WARNs reported by syzkaller, mostly around alloc_shadow_vmcs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20kvm: x86: make kvm_{load|put}_guest_fpu() staticSebastian Andrzej Siewior
The functions kvm_load_guest_fpu() kvm_put_guest_fpu() are only used locally, make them static. This requires also that both functions are moved because they are used before their implementation. Those functions were exported (via EXPORT_SYMBOL) before commit e5bb40251a920 ("KVM: Drop kvm_{load,put}_guest_fpu() exports"). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20x86/hyper-v: rename ipi_arg_{ex,non_ex} structuresVitaly Kuznetsov
These structures are going to be used from KVM code so let's make their names reflect their Hyper-V origin. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: VMX: use preemption timer to force immediate VMExitSean Christopherson
A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: VMX: modify preemption timer bit only when arming timerSean Christopherson
Provide a singular location where the VMX preemption timer bit is set/cleared so that future usages of the preemption timer can ensure the VMCS bit is up-to-date without having to modify unrelated code paths. For example, the preemption timer can be used to force an immediate VMExit. Cache the status of the timer to avoid redundant VMREAD and VMWRITE, e.g. if the timer stays armed across multiple VMEnters/VMExits. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: VMX: immediately mark preemption timer expired only for zero valueSean Christopherson
A VMX preemption timer value of '0' at the time of VMEnter is architecturally guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. This architectural definition is in place to ensure that a previously expired timer is correctly recognized by the CPU as it is possible for the timer to reach zero and not trigger a VMexit due to a higher priority VMExit being signalled instead, e.g. a pending #DB that morphs into a VMExit. Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX: Fully emulate preemption timer") special cased timer values of '0' and '1' to ensure prompt delivery of the VMExit. Unlike '0', a timer value of '1' has no has no architectural guarantees regarding when it is delivered. Modify the timer emulation to trigger immediate VMExit if and only if the timer value is '0', and document precisely why '0' is special. Do this even if calibration of the virtual TSC failed, i.e. VMExit will occur immediately regardless of the frequency of the timer. Making only '0' a special case gives KVM leeway to be more aggressive in ensuring the VMExit is injected prior to executing instructions in the nested guest, and also eliminates any ambiguity as to why '1' is a special case, e.g. why wasn't the threshold for a "short timeout" set to 10, 100, 1000, etc... Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: SVM: Switch to bitmap_zalloc()Andy Shevchenko
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM/MMU: Fix comment in walk_shadow_page_lockless_end()Tianyu Lan
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page() This patch is to fix the commit. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20kvm: selftests: use -pthread instead of -lpthreadLei Yang
I run into the following error testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create' testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join' collect2: error: ld returned 1 exit status my gcc version is gcc version 4.8.4 "-pthread" would work everywhere Signed-off-by: Lei Yang <Lei.Yang@windriver.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20KVM: x86: don't reset root in kvm_mmu_setup()Wei Yang
Here is the code path which shows kvm_mmu_setup() is invoked after kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path, this means the root_hpa and prev_roots are guaranteed to be invalid. And it is not necessary to reset it again. kvm_vm_ioctl_create_vcpu() kvm_arch_vcpu_create() vmx_create_vcpu() kvm_vcpu_init() kvm_arch_vcpu_init() kvm_mmu_create() kvm_arch_vcpu_setup() kvm_mmu_setup() kvm_init_mmu() This patch set reset_roots to false in kmv_mmu_setup(). Fixes: 50c28f21d045dde8c52548f8482d456b3f0956f5 Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20kvm: mmu: Don't read PDPTEs when paging is not enabledJunaid Shahid
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and CR4.PAE = 1. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20x86/kvm/lapic: always disable MMIO interface in x2APIC modeVitaly Kuznetsov
When VMX is used with flexpriority disabled (because of no support or if disabled with module parameter) MMIO interface to lAPIC is still available in x2APIC mode while it shouldn't be (kvm-unit-tests): PASS: apic_disable: Local apic enabled in x2APIC mode PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set FAIL: apic_disable: *0xfee00030: 50014 The issue appears because we basically do nothing while switching to x2APIC mode when APIC access page is not used. apic_mmio_{read,write} only check if lAPIC is disabled before proceeding to actual write. When APIC access is virtualized we correctly manipulate with VMX controls in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes in x2APIC mode so there's no issue. Disabling MMIO interface seems to be easy. The question is: what do we do with these reads and writes? If we add apic_x2apic_mode() check to apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will go to userspace. When lAPIC is in kernel, Qemu uses this interface to inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This somehow works with disabled lAPIC but when we're in xAPIC mode we will get a real injected MSI from every write to lAPIC. Not good. The simplest solution seems to be to just ignore writes to the region and return ~0 for all reads when we're in x2APIC mode. This is what this patch does. However, this approach is inconsistent with what currently happens when flexpriority is enabled: we allocate APIC access page and create KVM memory region so in x2APIC modes all reads and writes go to this pre-allocated page which is, btw, the same for all vCPUs. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-19Merge tag 'hwmon-for-linus-v4.19-rc5' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Guenter writes: "Various bug fixes for nct6775 driver"
2018-09-19Merge tag 'scsi-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi James writes: "SCSI fixes on 20180919 A couple of small but important fixes, one affecting big endian and the other fixing a BUG_ON in scatterlist processing. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>"
2018-09-19xen: issue warning message when out of grant maptrack entriesJuergen Gross
When a driver domain (e.g. dom0) is running out of maptrack entries it can't map any more foreign domain pages. Instead of silently stalling the affected domUs issue a rate limited warning in this case in order to make it easier to detect that situation. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-09-19xen/x86/vpmu: Zero struct pt_regs before calling into sample handling codeBoris Ostrovsky
Otherwise we may leak kernel stack for events that sample user registers. Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
2018-09-19MAINTAINERS: Add Borislav to the x86 maintainersThomas Gleixner
Borislav is effectivly maintaining parts of X86 already, make it official. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de>
2018-09-19Merge tag 'perf-urgent-for-mingo-4.19-20180918' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix the build on !_GNU_SOURCE libc systems such as Alpine Linux/musl libc due to usage of strerror_r glibc variant on libbpf (Arnaldo Carvalho de Melo) - Fix out-of-tree asciidoctor man page generation (Ben Hutchings) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-19x86/paravirt: Fix some warning messagesDan Carpenter
The first argument to WARN_ONCE() is a condition. Fixes: 5800dc5c19f3 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: virtualization@lists.linux-foundation.org Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
2018-09-19drm: sun4i: drop second PLL from A64 HDMI PHYIcenowy Zheng
The A64 HDMI PHY seems to be not able to use the second video PLL as clock parent in experiments. Drop the support for the second PLL from A64 HDMI PHY driver. Fixes: b46e2c9f5f64 ("drm/sun4i: Add support for A64 HDMI PHY") Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180916043409.62374-2-icenowy@aosc.io
2018-09-19Merge branch 'linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Crypto stuff from Herbert: "This push fixes a potential boot hang in ccp and an incorrect CPU capability check in aegis/morus on x86." * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2 crypto: ccp - add timeout support in the SEV command
2018-09-19Merge tag 'trace-v4.19-rc4' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "Vaibhav Nagarnaik found that modifying the ring buffer size could cause a huge latency in the system because it does a while loop to free pages without releasing the CPU (on non preempt kernels). In a case where there are hundreds of thousands of pages to free it could actually cause a system stall. A properly place cond_resched() solves this issue."
2018-09-19Merge tag 'platform-drivers-x86-v4.19-2' of ↵Greg Kroah-Hartman
git://git.infradead.org/linux-platform-drivers-x86 Darren writes: "platform-drivers-x86 for v4.19-2 Free allocated ACPI buffers in two drivers. The following is an automated git shortlog grouped by driver: alienware-wmi: - Correct a memory leak dell-smbios-wmi: - Correct a memory leak" * tag 'platform-drivers-x86-v4.19-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: alienware-wmi: Correct a memory leak platform/x86: dell-smbios-wmi: Correct a memory leak
2018-09-18pinctrl: cannonlake: Fix gpio base for GPP-ESimon Detheridge
The gpio base for GPP-E was set incorrectly to 258 instead of 256, preventing the touchpad working on my Tong Fang GK5CN5Z laptop. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=200787 Signed-off-by: Simon Detheridge <s@sd.ai> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-09-19Merge tag 'efi-urgent' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent Pull EFI fix from Ard Biesheuvel: Apply a fix from Scott to make the ARM stub's DTB loader opt-out rather than opt-in.
2018-09-18x86/intel_rdt: Fix incorrect loop end conditionReinette Chatre
In order to determine a sane default cache allocation for a new CAT/CDP resource group, all resource groups are checked to determine which cache portions are available to share. At this time all possible CLOSIDs that can be supported by the resource is checked. This is problematic if the resource supports more CLOSIDs than another CAT/CDP resource. In this case, the number of CLOSIDs that could be allocated are fewer than the number of CLOSIDs that can be supported by the resource. Limit the check of closids to that what is supported by the system based on the minimum across all resources. Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-10-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Fix exclusive mode handling of MBA resourceReinette Chatre
It is possible for a resource group to consist out of MBA as well as CAT/CDP resources. The "exclusive" resource mode only applies to the CAT/CDP resources since MBA allocations cannot be specified to overlap or not. When a user requests a resource group to become "exclusive" then it can only be successful if there are CAT/CDP resources in the group and none of their CBMs associated with the group's CLOSID overlaps with any other resource group. Fix the "exclusive" mode setting by failing if there isn't any CAT/CDP resource in the group and ensuring that the CBM checking is only done on CAT/CDP resources. Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-9-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Fix incorrect loop end conditionReinette Chatre
A loop is used to check if a CAT resource's CBM of one CLOSID overlaps with the CBM of another CLOSID of the same resource. The loop is run over all CLOSIDs supported by the resource. The problem with running the loop over all CLOSIDs supported by the resource is that its number of supported CLOSIDs may be more than the number of supported CLOSIDs on the system, which is the minimum number of CLOSIDs supported across all resources. Fix the loop to only consider the number of system supported CLOSIDs, not all that are supported by the resource. Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-8-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Do not allow pseudo-locking of MBA resourceReinette Chatre
A system supporting pseudo-locking may have MBA as well as CAT resources of which only the CAT resources could support cache pseudo-locking. When the schemata to be pseudo-locked is provided it should be checked that that schemata does not attempt to pseudo-lock a MBA resource. Fixes: e0bdfe8e3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-7-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Fix unchecked MSR accessReinette Chatre
When a new resource group is created, it is initialized with sane defaults that currently assume the resource being initialized is a CAT resource. This code path is also followed by a MBA resource that is not allocated the same as a CAT resource and as a result we encounter the following unchecked MSR access error: unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000 000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20) Call Trace: mba_wrmsr+0x41/0x80 update_domains+0x125/0x130 rdtgroup_mkdir+0x270/0x500 Fix the above by ensuring the initial allocation is only attempted on a CAT resource. Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-6-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Fix invalid mode warning when multiple resources are managedReinette Chatre
When multiple resources are managed by RDT, the number of CLOSIDs used is the minimum of the CLOSIDs supported by each resource. In the function rdt_bit_usage_show(), the annotated bitmask is created to depict how the CAT supporting caches are being used. During this annotated bitmask creation, each resource group is queried for its mode that is used as a label in the annotated bitmask. The maximum number of resource groups is currently assumed to be the number of CLOSIDs supported by the resource for which the information is being displayed. This is incorrect since the number of active CLOSIDs is the minimum across all resources. If information for a cache instance with more CLOSIDs than another is being generated we thus encounter a warning like: invalid mode for closid 8 WARNING: CPU: 88 PID: 1791 at [SNIP]/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c :827 rdt_bit_usage_show+0x221/0x2b0 Fix this by ensuring that only the number of supported CLOSIDs are considered. Fixes: e651901187ab8 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-5-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Global closid helper to support future fixesReinette Chatre
The number of CLOSIDs supported by a system is the minimum number of CLOSIDs supported by any of its resources. Care should be taken when iterating over the CLOSIDs of a resource since it may be that the number of CLOSIDs supported on the system is less than the number of CLOSIDs supported by the resource. Introduce a helper function that can be used to query the number of CLOSIDs that is supported by all resources, irrespective of how many CLOSIDs are supported by a particular resource. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Fix size reporting of MBA resourceReinette Chatre
Chen Yu reported a divide-by-zero error when accessing the 'size' resctrl file when a MBA resource is enabled. divide error: 0000 [#1] SMP PTI CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25 RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0 Call Trace: rdtgroup_size_show+0x11a/0x1d0 seq_read+0xd8/0x3b0 Quoting Chen Yu's report: This is because for MB resource, the r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size() will trigger the exception. Fix this issue in the 'size' file by getting correct memory bandwidth value which is in MBps when MBA software controller is enabled or in percentage when MBA software controller is disabled. Fixes: d9b48c86eb38 ("x86/intel_rdt: Display resource groups' allocations in bytes") Reported-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chen Yu <yu.c.chen@intel.com> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20180904174614.26682-1-yu.c.chen@intel.com Link: https://lkml.kernel.org/r/1537048707-76280-3-git-send-email-fenghua.yu@intel.com
2018-09-18x86/intel_rdt: Fix data type in parsing callbacksXiaochen Shen
Each resource is associated with a parsing callback to parse the data provided from user space when writing schemata file. The 'data' parameter in the callbacks is defined as a void pointer which is error prone due to lack of type check. parse_bw() processes the 'data' parameter as a string while its caller actually passes the parameter as a pointer to struct rdt_cbm_parse_data. Thus, parse_bw() takes wrong data and causes failure of parsing MBA throttle value. To fix the issue, the 'data' parameter in all parsing callbacks is defined and handled as a pointer to struct rdt_parse_data (renamed from struct rdt_cbm_parse_data). Fixes: 7604df6e16ae ("x86/intel_rdt: Support flexible data to parsing callbacks") Fixes: 9ab9aa15c309 ("x86/intel_rdt: Ensure requested schemata respects mode") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com
2018-09-18Merge tag 'gvt-fixes-2018-09-18' of https://github.com/intel/gvt-linux into ↵Rodrigo Vivi
drm-intel-fixes gvt-fixes-2018-09-18 - Fix initial DPIO PHY register state for BXT (Colin) - BXT untracked GEN9_CLKGATE_DIS_4 warning fix (Colin) - Fix srcu lock for GFN valid check (Weinan) - Should clear GGTT entry value after vGPU destroy (Zhipeng) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180918073349.GQ20737@zhen-hp.sh.intel.com
2018-09-18perf Documentation: Fix out-of-tree asciidoctor man page generationBen Hutchings
The dependency for the man page rule using asciidoctor incorrectly specifies a source file in $(OUTPUT). When building out-of-tree, the source file is not found, resulting in a fall-back to the following rule which uses xmlto. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180916151704.GF4765@decadent.org.uk Fixes: ffef80ecf89f ("perf Documentation: Support for asciidoctor") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-18tools lib bpf: Provide wrapper for strerror_r to build in !_GNU_SOURCE systemsArnaldo Carvalho de Melo
Same problem that got fixed in a similar fashion in tools/perf/ in c8b5f2c96d1b ("tools: Introduce str_error_r()"), fix it in the same way, licensing needs to be sorted out to libbpf to use libapi, so, for this simple case, just get the same wrapper in tools/lib/bpf. This makes libbpf and its users (bpftool, selftests, perf) to build again in Alpine Linux 3.[45678] and edge. Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yonghong Song <yhs@fb.com> Fixes: 1ce6a9fc1549 ("bpf: fix build error in libbpf with EXTRA_CFLAGS="-Wp, -D_FORTIFY_SOURCE=2 -O2"") Link: https://lkml.kernel.org/r/20180917151636.GA21790@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-18Merge tag 'kvm-ppc-fixes-4.19-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Second set of PPC KVM fixes for 4.19 Two fixes for KVM on POWER machines. Both of these relate to memory corruption and host crashes seen when transparent huge pages are enabled. The first fixes a host crash that can occur when a DMA mapping is removed by the guest and the page mapped was part of a transparent huge page; the second fixes corruption that could occur when a hypervisor page fault for a radix guest is being serviced at the same time that the backing page is being collapsed or split.
2018-09-18Merge tag 'kvm-s390-master-4.19-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 4.19 - more fallout from the hugetlbfs enablement - bugfix for vma handling
2018-09-18drm: fix drm_drv_uses_atomic_modeset on non modesetting drivers.Dave Airlie
vgem seems to oops on the intel CI due to the vgem debugfs init hitting this path now. Check if we have mode_config funcs before checking one. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180918062018.24942-1-airlied@gmail.com
2018-09-18mtd: devices: m25p80: Make sure the buffer passed in op is DMA-ableBoris Brezillon
As documented in spi-mem.h, spi_mem_op->data.buf.{in,out} must be DMA-able, and commit 4120f8d158ef ("mtd: spi-nor: Use the spi_mem_xx() API") failed to follow this rule as buffers passed to ->{read,write}_reg() are usually placed on the stack. Fix that by allocating a scratch buffer and copying the data around. Fixes: 4120f8d158ef ("mtd: spi-nor: Use the spi_mem_xx() API") Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
2018-09-18Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
Dave writes: "Various fixes, all over the place: 1) OOB data generation fix in bluetooth, from Matias Karhumaa. 2) BPF BTF boundary calculation fix, from Martin KaFai Lau. 3) Don't bug on excessive frags, to be compatible in situations mixing older and newer kernels on each end. From Juergen Gross. 4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger. 5) Zero keying information in TLS layer before freeing copies of them, from Sabrina Dubroca. 6) Fix NULL deref in act_sample, from Davide Caratti. 7) Orphan SKB before GRO in veth to prevent crashes with XDP, from Toshiaki Makita. 8) Fix use after free in ip6_xmit, from Eric Dumazet. 9) Fix VF mac address regression in bnxt_en, from Micahel Chan. 10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann. 11) Programming adjustments to r8169 which fix not being to enter deep sleep states on some machines, from Kai-Heng Feng and Hans de Goede. 12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter Oskolkov." * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits) net/ipv6: do not copy dst flags on rt init qmi_wwan: set DTR for modems in forced USB2 mode clk: x86: Stop marking clocks as CLK_IS_CRITICAL r8169: Get and enable optional ether_clk clock clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail r8169: enable ASPM on RTL8106E r8169: Align ASPM/CLKREQ setting function with vendor driver Revert "kcm: remove any offset before parsing messages" kcm: remove any offset before parsing messages net: ethernet: Fix a unused function warning. net: dsa: mv88e6xxx: Fix ATU Miss Violation tls: fix currently broken MSG_PEEK behavior hv_netvsc: pair VF based on serial number PCI: hv: support reporting serial number as slot information bnxt_en: Fix VF mac address regression. ipv6: fix possible use-after-free in ip6_xmit() net: hp100: fix always-true check for link up state ARM: dts: at91: add new compatibility string for macb on sama5d3 net: macb: disable scatter-gather for macb on sama5d3 net: mvpp2: let phylink manage the carrier state ...
2018-09-17net/ipv6: do not copy dst flags on rt initPeter Oskolkov
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts toward route cache size (net->ipv6.sysctl.ip6_rt_max_size). If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented in dist_init() and decremented in dst_destroy(). This flag is tied to allocation/deallocation of dst_entry and should not be copied from another dst/route. Otherwise it can happen that dst_ops::pcpuc_entries counter grows until no new routes can be allocated because the counter reached ip6_rt_max_size due to DST_NOCOUNT not set and thus no counter decrements on gc-ed routes. Fixes: 3b6761d18bc1 ("net/ipv6: Move dst flags to booleans in fib entries") Cc: David Ahern <dsahern@gmail.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18drm/i915/gvt: clear ggtt entries when destroy vgpuZhipeng Gong
When one vgpu is destroyed, its ggtt entries are not cleared. This patch clears ggtt entries to avoid information leak. v2: add 'Fixes' tag (Zhenyu) Fixes: 2707e4446688 ("drm/i915/gvt: vGPU graphics memory virtualization") Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com> Reviewed-by: Hang Yuan <hang.yuan@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-09-18drm/i915/gvt: request srcu_read_lock before checking if one gfn is validWeinan Li
Fix the suspicious RCU usage issue in intel_vgpu_emulate_mmio_write. Here need to request the srcu read lock of kvm->srcu before doing gfn_to_memslot(). The detailed log is as below: [ 218.710688] ============================= [ 218.710690] WARNING: suspicious RCU usage [ 218.710693] 4.14.15-dd+ #314 Tainted: G U [ 218.710695] ----------------------------- [ 218.710697] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage! [ 218.710699] other info that might help us debug this: [ 218.710702] rcu_scheduler_active = 2, debug_locks = 1 [ 218.710704] 1 lock held by qemu-system-x86/2144: [ 218.710706] #0: (&gvt->lock){+.+.}, at: [<ffffffff816a1eea>] intel_vgpu_emulate_mmio_write+0x5a/0x2d0 [ 218.710721] stack backtrace: [ 218.710724] CPU: 0 PID: 2144 Comm: qemu-system-x86 Tainted: G U 4.14.15-dd+ #314 [ 218.710727] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015 [ 218.710729] Call Trace: [ 218.710734] dump_stack+0x7c/0xb3 [ 218.710739] gfn_to_memslot+0x15f/0x170 [ 218.710743] kvm_is_visible_gfn+0xa/0x30 [ 218.710746] intel_vgpu_emulate_gtt_mmio_write+0x267/0x3c0 [ 218.710751] ? __mutex_unlock_slowpath+0x3b/0x260 [ 218.710754] intel_vgpu_emulate_mmio_write+0x182/0x2d0 [ 218.710759] intel_vgpu_rw+0xba/0x170 [kvmgt] [ 218.710763] intel_vgpu_write+0x14d/0x1a0 [kvmgt] [ 218.710767] __vfs_write+0x23/0x130 [ 218.710770] vfs_write+0xb0/0x1b0 [ 218.710774] SyS_pwrite64+0x73/0x90 [ 218.710777] entry_SYSCALL_64_fastpath+0x25/0x9c [ 218.710780] RIP: 0033:0x7f33e8a91da3 [ 218.710783] RSP: 002b:00007f33dddc8700 EFLAGS: 00000293 v2: add 'Fixes' tag, refine log format.(Zhenyu) Fixes: cc753fbe1ac4 ("drm/i915/gvt: validate gfn before set shadow page") Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Weinan Li <weinan.z.li@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2018-09-18drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handlerColin Xu
Host prints lots of untracked MMIO at 0x4653c when creating linux guest. "gvt: vgpu 2: untracked MMIO 0004653c len 4" GEN9_CLKGATE_DIS_4 (0x4653c) is accessed by i915 for gmbus clockgating. However vgpu doesn't support any clockgating powergating operations on related mmio access trap so need add it to default handler. GEN9_CLKGATE_DIS_4 is accessed in bxt_gmbus_clock_gating() which only applies to GEN9_LP so doens't show the warning on other platforms. The solution is to add it to default handler init_bxt_mmio_info(). Reviewed-by: He, Min <min.he@intel.com> Signed-off-by: Colin Xu <colin.xu@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>