aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/include/asm/msr-index.h
AgeCommit message (Collapse)Author
2016-07-07x86: remove duplicate turbo ratio limit MSRsSrinivas Pandruvada
Remove MSR_NHM_TURBO_RATIO_LIMIT and MSR_IVT_TURBO_RATIO_LIMIT as they are duplicate. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05perf/x86/intel/pt: Add IP filtering register/CPUID bitsAlexander Shishkin
New versions of Intel PT support address range-based filtering. Add the new registers, bit definitions and relevant CPUID bits. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-4-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-05perf/x86/intel/pt: Move PT specific MSR bit definitions to a private headerAlexander Shishkin
Nothing outside of the Intel PT driver should ever care about its MSR bits, so there is no reason to keep them in msr-index.h. This patch moves them to a pt-local header. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1461771888-10409-3-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel/rapl: Support Skylake RAPL domainsSrinivas Pandruvada
Add Skylake client support for RAPL domains. In addition to RAPL domains in Broadwell clients, it has support for platform domain (aka PSys). The PSys domain controls the entire SoC instead of just a CPU package. Unlike package domain, PSys support requires more than just processor level implementation. The other parts in the system need additional HW level signaling, which OEMs need to support. When not supported, the energy counter register in PSys domain returns 0. Also corrected error in comment for GPU counter, which previously was DRAM counter. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com [ Cnverted to model_match stuff. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jacob.jun.pan@linux.intel.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13Merge tag 'v4.6-rc3' into perf/core, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-08Merge branches 'pm-core', 'powercap' and 'pm-tools'Rafael J. Wysocki
* pm-core: PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs PM / runtime: Document steps for device removal * powercap: powercap: intel_rapl: Add missing Haswell model * pm-tools: tools/power turbostat: work around RC6 counter wrap tools/power turbostat: initial KBL support tools/power turbostat: initial SKX support tools/power turbostat: decode BXT TSC frequency via CPUID tools/power turbostat: initial BXT support tools/power turbostat: print IRTL MSRs tools/power turbostat: SGX state should print only if --debug
2016-04-07tools/power turbostat: print IRTL MSRsLen Brown
Some processors use the Interrupt Response Time Limit (IRTL) MSR value to describe the maximum IRQ response time latency for deep package C-states. (Though others have the register, but do not use it) Lets print it out to give insight into the cases where it is used. IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-31perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counterHuang Rui
AMD Zeppelin (Family 17h, Model 00h) introduces an instructions retired performance counter which is indicated by CPUID.8000_0008H:EBX[1]. A dedicated Instructions Retired MSR register (MSR 0xC000_000E9) increments once for every instruction retired. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1454056197-5893-3-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31perf/x86/msr: Add AMD PTSC (Performance Time-Stamp Counter) supportHuang Rui
AMD Carrizo (Family 15h, Model 60h) introduces a time-stamp counter which is indicated by CPUID.8000_0001H:ECX[27]. It increments at a 100 MHz rate in all P-states, and C states, S0, or S1. The frequency is about 100MHz. This counter will be used to calculate processor power and other parts. So add an interface into the MSR PMU to get the PTSC counter value. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1454056197-5893-2-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29x86/cpufreq: Remove duplicated TDP MSR macro definitionsVladimir Zapolskiy
The list of CPU model specific registers contains two copies of TDP registers, remove the one, which is out of numerical order in the list. Fixes: 6a35fc2d6c22 ("cpufreq: intel_pstate: get P1 from TAR when available") Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Len Brown <len.brown@intel.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-16Merge tag 'pm+acpi-4.6-rc1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "This time the majority of changes go into cpufreq and they are significant. First off, the way CPU frequency updates are triggered is different now. Instead of having to set up and manage a deferrable timer for each CPU in the system to evaluate and possibly change its frequency periodically, cpufreq governors set up callbacks to be invoked by the scheduler on a regular basis (basically on utilization updates). The "old" governors, "ondemand" and "conservative", still do all of their work in process context (although that is triggered by the scheduler now), but intel_pstate does it all in the callback invoked by the scheduler with no need for any additional asynchronous processing. Of course, this eliminates the overhead related to the management of all those timers, but also it allows the cpufreq governor code to be simplified quite a bit. On top of that, the common code and data structures used by the "ondemand" and "conservative" governors are cleaned up and made more straightforward and some long-standing and quite annoying problems are addressed. In particular, the handling of governor sysfs attributes is modified and the related locking becomes more fine grained which allows some concurrency problems to be avoided (particularly deadlocks with the core cpufreq code). In principle, the new mechanism for triggering frequency updates allows utilization information to be passed from the scheduler to cpufreq. Although the current code doesn't make use of it, in the works is a new cpufreq governor that will make decisions based on the scheduler's utilization data. That should allow the scheduler and cpufreq to work more closely together in the long run. In addition to the core and governor changes, cpufreq drivers are updated too. Fixes and optimizations go into intel_pstate, the cpufreq-dt driver is updated on top of some modification in the Operating Performance Points (OPP) framework and there are fixes and other updates in the powernv cpufreq driver. Apart from the cpufreq updates there is some new ACPICA material, including a fix for a problem introduced by previous ACPICA updates, and some less significant changes in the ACPI code, like CPPC code optimizations, ACPI processor driver cleanups and support for loading ACPI tables from initrd. Also updated are the generic power domains framework, the Intel RAPL power capping driver and the turbostat utility and we have a bunch of traditional assorted fixes and cleanups. Specifics: - Redesign of cpufreq governors and the intel_pstate driver to make them use callbacks invoked by the scheduler to trigger CPU frequency evaluation instead of using per-CPU deferrable timers for that purpose (Rafael Wysocki). - Reorganization and cleanup of cpufreq governor code to make it more straightforward and fix some concurrency problems in it (Rafael Wysocki, Viresh Kumar). - Cleanup and improvements of locking in the cpufreq core (Viresh Kumar). - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh Kumar, Eric Biggers). - intel_pstate driver updates including fixes, optimizations and a modification to make it enable enable hardware-coordinated P-state selection (HWP) by default if supported by the processor (Philippe Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe Franciosi). - Operating Performance Points (OPP) framework updates to improve its handling of voltage regulators and device clocks and updates of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter). - Updates of the powernv cpufreq driver to fix initialization and cleanup problems in it and correct its worker thread handling with respect to CPU offline, new powernv_throttle tracepoint (Shilpasri Bhat). - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki). - ACPICA updates including one fix for a regression introduced by previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box, Colin Ian King). - Support for installing ACPI tables from initrd (Lv Zheng). - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin Chaugule). - Support for _HID(ACPI0010) devices (ACPI processor containers) and ACPI processor driver cleanups (Sudeep Holla). - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory, Aleksey Makarov). - Modification of the ACPI PCI IRQ management code to make it treat 255 in the Interrupt Line register as "not connected" on x86 (as per the specification) and avoid attempts to use that value as a valid interrupt vector (Chen Fan). - ACPI APEI fixes related to resource leaks (Josh Hunt). - Removal of modularity from a few ACPI drivers (BGRT, GHES, intel_pmic_crc) that cannot be built as modules in practice (Paul Gortmaker). - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid resource type (Harb Abdulhamid). - New device ID (future AMD I2C controller) in the ACPI driver for AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu). - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin). - cpuidle menu governor optimization to avoid a square root computation in it (Rasmus Villemoes). - Fix for potential use-after-free in the generic device properties framework (Heikki Krogerus). - Updates of the generic power domains (genpd) framework including support for multiple power states of a domain, fixes and debugfs output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart, Geert Uytterhoeven). - Intel RAPL power capping driver updates to reduce IPI overhead in it (Jacob Pan). - System suspend/hibernation code cleanups (Eric Biggers, Saurabh Sengar). - Year 2038 fix for the process freezer (Abhilash Jindal). - turbostat utility updates including new features (decoding of more registers and CPUID fields, sub-second intervals support, GFX MHz and RC6 printout, --out command line option), fixes (syscall jitter detection and workaround, reductioin of the number of syscalls made, fixes related to Xeon x200 processors, compiler warning fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)" * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits) tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals ACPI / APEI: ERST: Fixed leaked resources in erst_init ACPI / APEI: Fix leaked resources intel_pstate: Do not skip samples partially intel_pstate: Remove freq calculation from intel_pstate_calc_busy() intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() ...
2016-03-14Merge branch 'turbostat' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools Pull turbostat updates for 4.6 from Len Brown. * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals tools/power turbostat: Decode MSR_MISC_PWR_MGMT tools/power turbostat: decode HWP registers x86 msr-index: Simplify syntax for HWP fields tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency tools/power turbostat: decode more CPUID fields
2016-02-17x86/msr: Document msr-index.h rule for additionBorislav Petkov
In order to keep this file's size sensible and not cause too much unnecessary churn, make the rule explicit - similar to pci_ids.h - that only MSRs which are used in multiple compilation units, should get added to it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alex.williamson@redhat.com Cc: gleb@kernel.org Cc: joro@8bytes.org Cc: kvm@vger.kernel.org Cc: sherry.hurwitz@amd.com Cc: wei@redhat.com Link: http://lkml.kernel.org/r/1455612202-14414-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17x86 msr-index: Simplify syntax for HWP fieldsLen Brown
syntax only, no functional change Signed-off-by: Len Brown <len.brown@intel.com>
2015-11-24x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSRBorislav Petkov
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it checks whether the way access filter is enabled on some F15h models, and, if so, disables it. kvm doesn't handle that MSR access and complains about it, which can get really noisy in dmesg when one starts kvm guests all the time for testing. And it is useless anyway - guest kernel shouldn't be doing such changes anyway so tell it that that filter is disabled. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-16Merge branch 'pm-tools'Rafael J. Wysocki
* pm-tools: x86: remove unused definition of MSR_NHM_PLATFORM_INFO tools/power turbostat: use new name for MSR_PLATFORM_INFO
2015-11-13x86: remove unused definition of MSR_NHM_PLATFORM_INFOLen Brown
MSR_NHM_PLATFORM_INFO has been replaced by... MSR_PLATFORM_INFO Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15cpufreq: intel_pstate: get P1 from TAR when availableSrinivas Pandruvada
After Ivybridge, the max non turbo ratio obtained from platform info msr is not always guaranteed P1 on client platforms. The max non turbo activation ratio (TAR), determines the max for the current level of TDP. The ratio in platform info is physical max. The TAR MSR can be locked, so updating this value is not possible on all platforms. This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if available, but also do some sanity checking to make sure that this value is correct. The sanity check involves reading the TDP ratio for the current tdp control value when platform has configurable TDP present and matching TAC with this. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Another pile of fixes for perf: - Plug overflows and races in the core code - Sanitize the flow of the perf syscall so we error out before handling the more complex and hard to undo setups - Improve and fix Broadwell and Skylake hardware support - Revert a fix which broke what it tried to fix in perf tools - A couple of smaller fixes in various places of perf tools" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix copying of /proc/kcore perf intel-pt: Remove no_force_psb from documentation perf probe: Use existing routine to look for a kernel module by dso->short_name perf/x86: Change test_aperfmperf() and test_intel() to static tools lib traceevent: Fix string handling in heterogeneous arch environments perf record: Avoid infinite loop at buildid processing with no samples perf: Fix races in computing the header sizes perf: Fix u16 overflows perf: Restructure perf syscall point of no return perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific perf tools: Bool functions shouldn't return -1 tools build: Add test for presence of __get_cpuid() gcc builtin tools build: Add test for presence of numa_num_possible_cpus() in libnuma Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum" perf stat: Fix per-pkg event reporting bug
2015-09-21KVM: x86: trap AMD MSRs for the TSeg base and maskPaolo Bonzini
These have roughly the same purpose as the SMRR, which we do not need to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at boot, which causes problems when running a Xen dom0 under KVM. Just return 0, meaning that processor protection of SMRAM is not in effect. Reported-by: M A Young <m.a.young@durham.ac.uk> Cc: stable@vger.kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-18perf/x86/intel/pebs: Add PEBS frontend profiling for SkylakeAndi Kleen
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile frontend problems (like ITLB or decoding issues). The new event is configured through a separate MSR, which selects a range of sub events. Define the extra MSR as a extra reg and export support for it through sysfs. To avoid duplicating the existing tables use a new function to add new entries to existing tables. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-01Merge tag 'pm+acpi-4.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "From the number of commits perspective, the biggest items are ACPICA and cpufreq changes with the latter taking the lead (over 50 commits). On the cpufreq front, there are many cleanups and minor fixes in the core and governors, driver updates etc. We also have a new cpufreq driver for Mediatek MT8173 chips. ACPICA mostly updates its debug infrastructure and adds a number of fixes and cleanups for a good measure. The Operating Performance Points (OPP) framework is updated with new DT bindings and support for them among other things. We have a few updates of the generic power domains framework and a reorganization of the ACPI device enumeration code and bus type operations. And a lot of fixes and cleanups all over. Included is one branch from the MFD tree as it contains some PM-related driver core and ACPI PM changes a few other commits are based on. Specifics: - ACPICA update to upstream revision 20150818 including method tracing extensions to allow more in-depth AML debugging in the kernel and a number of assorted fixes and cleanups (Bob Moore, Lv Zheng, Markus Elfring). - ACPI sysfs code updates and a documentation update related to AML method tracing (Lv Zheng). - ACPI EC driver fix related to serialized evaluations of _Qxx methods and ACPI tools updates allowing the EC userspace tool to be built from the kernel source (Lv Zheng). - ACPI processor driver updates preparing it for future introduction of CPPC support and ACPI PCC mailbox driver updates (Ashwin Chaugule). - ACPI interrupts enumeration fix for a regression related to the handling of IRQ attribute conflicts between MADT and the ACPI namespace (Jiang Liu). - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar). - ACPI device registration code reorganization to separate the sysfs-related code and bus type operations from the rest (Rafael J Wysocki). - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause, Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss). - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan Xinhui, Rafael J Wysocki). - cpufreq core cleanups on top of the previous changes allowing it to preseve its sysfs directories over system suspend/resume (Viresh Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior). - cpufreq fixes and cleanups related to governors (Viresh Kumar). - cpufreq updates (core and the cpufreq-dt driver) related to the turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz). - New DT bindings for Operating Performance Points (OPP), support for them in the OPP framework and in the cpufreq-dt driver plus related OPP framework fixes and cleanups (Viresh Kumar). - cpufreq powernv driver updates (Shilpasri G Bhat). - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen). - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean). - intel_pstate driver updates including Skylake-S support, support for enabling HW P-states per CPU and an additional vendor bypass list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao). - cpuidle core fixes related to the handling of coupled idle states (Xunlei Pang). - intel_idle driver updates including Skylake Client support and support for freeze-mode-specific idle states (Len Brown). - Driver core updates related to power management (Andy Shevchenko, Rafael J Wysocki). - Generic power domains framework fixes and cleanups (Jon Hunter, Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson). - Device PM QoS framework update to allow the latency tolerance setting to be exposed to user space via sysfs (Mika Westerberg). - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas). - System sleep support updates (Alan Stern, Len Brown, SungEun Kim). - rockchip-io AVS support updates (Heiko Stuebner). - PM core clocks support fixup (Colin Ian King). - Power capping RAPL driver update including support for Skylake H/S and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi). - Generic device properties framework fixes related to the handling of static (driver-provided) property sets (Andy Shevchenko). - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat, Shreyas B Prabhu)" * tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits) cpufreq: speedstep-lib: Use monotonic clock cpufreq: powernv: Increase the verbosity of OCC console messages cpufreq: sfi: use kmemdup rather than duplicating its implementation cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor() cpufreq: rename cpufreq_real_policy as cpufreq_user_policy cpufreq: remove redundant 'policy' field from user_policy cpufreq: remove redundant 'governor' field from user_policy cpufreq: update user_policy.* on success cpufreq: use memcpy() to copy policy cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event cpufreq: mediatek: Add MT8173 cpufreq driver dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings PM / Domains: Fix typo in description of genpd_dev_pm_detach() PM / Domains: Remove unusable governor dummies PM / Domains: Make pm_genpd_init() available to modules PM / domains: Align column headers and data in pm_genpd_summary output powercap / RAPL: disable the 2nd power limit properly tools: cpupower: Fix error when running cpupower monitor PM / OPP: Drop unlikely before IS_ERR(_OR_NULL) PM / OPP: Fix static checker warning (broken 64bit big endian systems) ...
2015-08-24Merge branch 'turbostat' of ↵Rafael J. Wysocki
https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools Pull turbostat changes for v4.3 from Len Brown. * 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: fix typo on DRAM column in Joules-mode tools/power turbostat: fix parameter passing for forked command tools/power turbostat: dump CONFIG_TDP tools/power turbostat: cpu0 is no longer hard-coded, so update output tools/power turbostat: update turbostat(8)
2015-08-04x86: Add new MSRs and MSR bits used for Intel Skylake PMU supportAndi Kleen
Add new MSRs (LBR_INFO) and some new MSR bits used by the Intel Skylake PMU driver. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285767-27027-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-04perf/x86/intel/pt: Add new timing packet enablesAlexander Shishkin
Intel PT chapter in the new Intel Architecture SDM adds several packets corresponding enable bits and registers that control packet generation. Also, additional bits in the Intel PT CPUID leaf were added to enumerate presence and parameters of these new packets and features. The packets and enables are: * CYC: cycle accurate mode, provides the number of cycles elapsed since previous CYC packet; its presence and available threshold values are enumerated via CPUID; * MTC: mini time counter packets, used for tracking TSC time between full TSC packets; its presence and available resolution options are enumerated via CPUID; * PSB packet period is now configurable, available period values are enumerated via CPUID. This patch adds corresponding bit and register definitions, pmu driver capabilities based on CPUID enumeration, new attribute format bits for the new featurens and extends event configuration validation function to take these into account. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1438262131-12725-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/uapi: Do not export <asm/msr-index.h> as part of the user API headersBorislav Petkov
This header containing all MSRs and respective bit definitions got exported to userspace in conjunction with the big UAPI shuffle. But, it doesn't belong in the UAPI headers because userspace can do its own MSR defines and exporting them from the kernel blocks us from doing cleanups/renames in that header. Which is ridiculous - it is not kernel's job to export such a header and keep MSRs list and their names stable. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-19-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-14UAPI: (Scripted) Disintegrate arch/x86/include/asmDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-12-13Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Marcelo Tosatti: "Considerable KVM/PPC work, x86 kvmclock vsyscall support, IA32_TSC_ADJUST MSR emulation, amongst others." Fix up trivial conflict in kernel/sched/core.c due to cross-cpu migration notifier added next to rq migration call-back. * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits) KVM: emulator: fix real mode segment checks in address linearization VMX: remove unneeded enable_unrestricted_guest check KVM: VMX: fix DPL during entry to protected mode x86/kexec: crash_vmclear_local_vmcss needs __rcu kvm: Fix irqfd resampler list walk KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary KVM: MMU: optimize for set_spte KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation KVM: PPC: bookehv: Add guest computation mode for irq delivery KVM: PPC: Make EPCR a valid field for booke64 and bookehv KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation KVM: PPC: e500: Add emulation helper for getting instruction ea KVM: PPC: bookehv64: Add support for interrupt handling KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler KVM: PPC: booke: Fix get_tb() compile error on 64-bit KVM: PPC: e500: Silence bogus GCC warning in tlb code ...
2012-11-30KVM: x86: Emulate IA32_TSC_ADJUST MSRWill Auld
CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-02x86: apic: Use tsc deadline for oneshot when availableSuresh Siddha
If the TSC deadline mode is supported, LAPIC timer one-shot mode can be implemented using IA32_TSC_DEADLINE MSR. An interrupt will be generated when the TSC value equals or exceeds the value in the IA32_TSC_DEADLINE MSR. This enables us to skip the APIC calibration during boot. Also, in xapic mode, this enables us to skip the uncached apic access to re-arm the APIC timer. As this timer ticks at the high frequency TSC rate, we use the TSC_DIVISOR (32) to work with the 32-bit restrictions in the clockevent API's to avoid 64-bit divides etc (frequency is u32 and "unsigned long" in the set_next_event(), max_delta limits the next event to 32-bit for 32-bit kernel). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: venki@google.com Cc: len.brown@intel.com Link: http://lkml.kernel.org/r/1350941878.6017.31.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes some late late perf items that missed the first round: tools: - Bash auto completion improvements, now we can auto complete the tools long options, tracepoint event names, etc, from Namhyung Kim. - Look up thread using tid instead of pid in 'perf sched'. - Move global variables into a perf_kvm struct, from David Ahern. - Hists refactorings, preparatory for improved 'diff' command, from Jiri Olsa. - Hists refactorings, preparatory for event group viewieng work, from Namhyung Kim. - Remove double negation on optional feature macro definitions, from Namhyung Kim. - Remove several cases of needless global variables, on most builtins. - misc fixes kernel: - sysfs support for IBS on AMD CPUs, from Robert Richter. - Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner HPC blade PMU, from Vince Weaver. - misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) perf: Fix perf_cgroup_switch for sw-events perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu perf/AMD/IBS: Add sysfs support perf hists: Add more helpers for hist entry stat perf hists: Move he->stat.nr_events initialization to a template perf hists: Introduce struct he_stat perf diff: Removing the total_period argument from output code perf tool: Add hpp interface to enable/disable hpp column perf tools: Removing hists pair argument from output path perf hists: Separate overhead and baseline columns perf diff: Refactor diff displacement possition info perf hists: Add struct hists pointer to struct hist_entry perf tools: Complete tracepoint event names perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU perf evlist: Remove some unused methods perf evlist: Introduce add_newtp method perf kvm: Move global variables into a perf_kvm struct perf tools: Convert to BACKTRACE_SUPPORT perf tools: Long option completion support for each subcommands perf tools: Complete long option names of perf command ...
2012-10-04perf/x86: Add support for Intel Xeon-Phi Knights Corner PMUVince Weaver
The following patch adds perf_event support for the Xeon-Phi PMU, as documented in the "Intel Xeon Phi Coprocessor (codename: Knights Corner) Performance Monitoring Units" manual. Even though it is a co-processor, a Phi runs a full Linux environment and can support performance counters. This is just barebones support, it does not add support for interesting new features such as the SPFLT intruction that allows starting/stopping events without entering the kernel. The PMU internally is just like that of an original Pentium, but a "P6-like" MSR interface is provided. The interface is different enough from a real P6 that it's not easy (or practical) to re-use the code in perf_event_p6.c Acked-by: Lawrence F Meadows <lawrence.f.meadows@intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: eranian@gmail.com Cc: Lawrence F <lawrence.f.meadows@intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1209261405320.8398@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-09ACPI: Add fixups for AMD P-state figuresMatthew Garrett
Some AMD systems may round the frequencies in ACPI tables to 100MHz boundaries. We can obtain the real frequencies from MSRs, so add a quirk to fix these frequencies up on AMD systems. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-09acpi-cpufreq: Add support for modern AMD CPUsMatthew Garrett
The programming model for P-states on modern AMD CPUs is very similar to that of Intel and VIA. It makes sense to consolidate this support into one driver rather than duplicating functionality between two of them. This patch adds support for AMDs with hardware P-state control to acpi-cpufreq. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-05-09Merge branch 'perf/x86-ibs' into perf/coreIngo Molnar
2012-03-08perf/x86: Implement IBS interrupt handlerRobert Richter
This patch implements code to handle ibs interrupts. If ibs data is available a raw perf_event data sample is created and sent back to the userland. This patch only implements the storage of ibs data in the raw sample, but this could be extended in a later patch by generating generic event data such as the rip from the ibs sampling data. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1323968199-9326-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05perf/x86: Add Intel LBR MSR definitionsStephane Eranian
This patch adds the LBR definitions for NHM/WSM/SNB and Core. It also adds the definitions for the architected LBR MSR: LBR_SELECT, LBRT_TOS. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1328826068-11713-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-25x86: TSC deadline definitionsLiu, Jinsong
This pre-defination is preparing for KVM tsc deadline timer emulation, but theirself are not kvm specific. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-24Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) KVM: IOMMU: Disable device assignment without interrupt remapping KVM: MMU: trace mmio page fault KVM: MMU: mmio page fault support KVM: MMU: reorganize struct kvm_shadow_walk_iterator KVM: MMU: lockless walking shadow page table KVM: MMU: do not need atomicly to set/clear spte KVM: MMU: introduce the rules to modify shadow page table KVM: MMU: abstract some functions to handle fault pfn KVM: MMU: filter out the mmio pfn from the fault pfn KVM: MMU: remove bypass_guest_pf KVM: MMU: split kvm_mmu_free_page KVM: MMU: count used shadow pages on prepareing path KVM: MMU: rename 'pt_write' to 'emulate' KVM: MMU: cleanup for FNAME(fetch) KVM: MMU: optimize to handle dirty bit KVM: MMU: cache mmio info on page fault path KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code KVM: MMU: do not update slot bitmap if spte is nonpresent KVM: MMU: fix walking shadow page table KVM guest: KVM Steal time registration ...
2011-07-14x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVEH. Peter Anvin
Fix a trivial typo in the name of the constant ENERGY_PERF_BIAS_POWERSAVE. This didn't cause trouble because this constant is not currently used for anything. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/tip-abe48b108247e9b90b4c6739662a2e5c765ed114@git.kernel.org
2011-07-14x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIASLen Brown
Since 2.6.36 (23016bf0d25), Linux prints the existence of "epb" in /proc/cpuinfo, Since 2.6.38 (d5532ee7b40), the x86_energy_perf_policy(8) utility has been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS. However, the typical BIOS fails to initialize the MSR, presumably because this is handled by high-volume shrink-wrap operating systems... Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8). As a result, WSM-EP, SNB, and later hardware from Intel will run in its default hardware power-on state (performance), which assumes that users care for performance at all costs and not for energy efficiency. While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode... Initialize the MSR to the "normal" by default during kernel boot. x86_energy_perf_policy(8) is available to change the default after boot, should the user have a different preference. Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980 Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>
2011-07-12KVM: nVMX: Implement reading and writing of VMX MSRsNadav Har'El
When the guest can use VMX instructions (when the "nested" module option is on), it should also be able to read and write VMX MSRs, e.g., to query about VMX capabilities. This patch adds this support. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-11KVM: SVM: Implement infrastructure for TSC_RATE_MSRJoerg Roedel
This patch enhances the kvm_amd module with functions to support the TSC_RATE_MSR which can be used to set a given tsc frequency for the guest vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-04-15x86, amd: Disable GartTlbWlkErr when BIOS forgets itJoerg Roedel
This patch disables GartTlbWlk errors on AMD Fam10h CPUs if the BIOS forgets to do is (or is just too old). Letting these errors enabled can cause a sync-flood on the CPU causing a reboot. The AMD BKDG recommends disabling GART TLB Wlk Error completely. This patch is the fix for https://bugzilla.kernel.org/show_bug.cgi?id=33012 on my machine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-03-17KVM: x86: handle guest access to BBL_CR_CTL3 MSRjohn cooper
A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID "model" field correction from 6 -> 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as "reserved" in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper <john.cooper@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-08Merge commit 'v2.6.38-rc8' into perf/coreIngo Molnar
Merge reason: Merge latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04perf: Add support for supplementary event registersAndi Kleen
Change logs against Andi's original version: - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra) - Fixed a major event scheduling issue. There cannot be a ref++ on an event that has already done ref++ once and without calling put_constraint() in between. (Stephane Eranian) - Use thread_cpumask for percore allocation. (Lin Ming) - Use MSR names in the extra reg lists. (Lin Ming) - Remove redundant "c = NULL" in intel_percore_constraints - Fix comment of perf_event_attr::config1 Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event that can be used to monitor any offcore accesses from a core. This is a very useful event for various tunings, and it's also needed to implement the generic LLC-* events correctly. Unfortunately this event requires programming a mask in a separate register. And worse this separate register is per core, not per CPU thread. This patch: - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters. The extra parameters are passed by user space in the perf_event_attr::config1 field. - Adds support to the Intel perf_event core to schedule per core resources. This adds fairly generic infrastructure that can be also used for other per core resources. The basic code has is patterned after the similar AMD northbridge constraints code. Thanks to Stephane Eranian who pointed out some problems in the original version and suggested improvements. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-17intel_idle: disable Atom/Lincroft HW C-state auto-demotionLen Brown
Just as we had to disable auto-demotion for NHM/WSM, we need to do the same for Atom (Lincroft version). In particular, auto-demotion will prevent Lincroft from entering the S0i3 idle power saving state. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-17intel_idle: disable NHM/WSM HW C-state auto-demotionLen Brown
Hardware C-state auto-demotion is a mechanism where the HW overrides the OS C-state request, instead demoting to a shallower state, which is less expensive, but saves less power. Modern Linux should generally get exactly the states it requests. In particular, when a CPU is taken off-line, it must not be demoted, else it can prevent the entire package from reaching deep C-states. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-06Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', ↵Linus Torvalds
'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, asm: Use fxsaveq/fxrestorq in more places * 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hwmon: Add core threshold notification to therm_throt.c * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Use native_halt on a halt, not native_safe_halt * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: locking, lockdep: Convert sprintf_symbol to %pS * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Better struct irqaction layout