aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/events
AgeCommit message (Collapse)Author
2020-07-07perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel TremontKan Liang
commit 0813c40556fce1eeefb996e020cc5339e0b84137 upstream. The mask in the extra_regs for Intel Tremont need to be extended to allow more defined bits. "Outstanding Requests" (bit 63) is only available on MSR_OFFCORE_RSP0; Fixes: 6daeb8737f8a ("perf/x86/intel: Add Tremont core PMU support") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200501125442.7030-1-kan.liang@linux.intel.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flagKim Phillips
commit f967140dfb7442e2db0868b03b961f9c59418a1b upstream. Enable the sampling check in kernel/events/core.c::perf_event_open(), which returns the more appropriate -EOPNOTSUPP. BEFORE: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses). /bin/dmesg | grep -i perf may provide additional information. With nothing relevant in dmesg. AFTER: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15perf/x86/amd: Constrain Large Increment per Cycle eventsKim Phillips
commit 471af006a747f1c535c8a8c6c0973c320fe01b22 upstream. AMD Family 17h processors and above gain support for Large Increment per Cycle events. Unfortunately there is no CPUID or equivalent bit that indicates whether the feature exists or not, so we continue to determine eligibility based on a CPU family number comparison. For Large Increment per Cycle events, we add a f17h-and-compatibles get_event_constraints_f17h() that returns an even counter bitmask: Large Increment per Cycle events can only be placed on PMCs 0, 2, and 4 out of the currently available 0-5. The only currently public event that requires this feature to report valid counts is PMCx003 "Retired SSE/AVX Operations". Note that the CPU family logic in amd_core_pmu_init() is changed so as to be able to selectively add initialization for features available in ranges of backward-compatible CPU families. This Large Increment per Cycle feature is expected to be retained in future families. A side-effect of assigning a new get_constraints function for f17h disables calling the old (prior to f15h) amd_get_event_constraints implementation left enabled by commit e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors"), which is no longer necessary since those North Bridge event codes are obsoleted. Also fix a spelling mistake whilst in the area (calulating -> calculating). Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04perf/x86/intel: Fix inaccurate period in context switch for auto-reloadKan Liang
commit f861854e1b435b27197417f6f90d87188003cb24 upstream. Perf doesn't take the left period into account when auto-reload is enabled with fixed period sampling mode in context switch. Here is the MSR trace of the perf command as below. (The MSR trace is simplified from a ftrace log.) #perf record -e cycles:p -c 2000000 -- ./triad_loop //The MSR trace of task schedule out //perf disable all counters, disable PEBS, disable GP counter 0, //read GP counter 0, and re-enable all counters. //The counter 0 stops at 0xfffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value fffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff //The MSR trace of the same task schedule in again //perf disable all counters, enable and set GP counter 0, //enable PEBS, and re-enable all counters. //0xffffffe17b80 (-2000000) is written to GP counter 0. write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe17b80 write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff When the same task schedule in again, the counter should starts from previous left. However, it starts from the fixed period -2000000 again. A special variant of intel_pmu_save_and_restart() is used for auto-reload, which doesn't update the hwc->period_left. When the monitored task schedules in again, perf doesn't know the left period. The fixed period is used, which is inaccurate. With auto-reload, the counter always has a negative counter value. So the left period is -value. Update the period_left in intel_pmu_save_and_restart_reload(). With the patch: //The MSR trace of task schedule out write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value ffffffe25cbc write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff //The MSR trace of the same task schedule in again write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe25cbc write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff Fixes: d31fc13fdcb2 ("perf/x86/intel: Fix event update for auto-reload") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200121190125.3389-1-kan.liang@linux.intel.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event mapKim Phillips
commit 25d387287cf0330abf2aad761ce6eee67326a355 upstream. Commit 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h"), claimed L2 misses were unsupported, due to them not being found in its referenced documentation, whose link has now moved [1]. That old documentation listed PMCx064 unit mask bit 3 as: "LsRdBlkC: LS Read Block C S L X Change to X Miss." and bit 0 as: "IcFillMiss: IC Fill Miss" We now have new public documentation [2] with improved descriptions, that clearly indicate what events those unit mask bits represent: Bit 3 now clearly states: "LsRdBlkC: Data Cache Req Miss in L2 (all types)" and bit 0 is: "IcFillMiss: Instruction Cache Req Miss in L2." So we can now add support for L2 misses in perf's genericised events as PMCx064 with both the above unit masks. [1] The commit's original documentation reference, "Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors", originally available here: https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf is now available here: https://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf [2] "Processor Programming Reference (PPR) for Family 17h Model 31h, Revision B0 Processors", available here: https://developer.amd.com/wp-content/resources/55803_0.54-PUB.pdf Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Reported-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Babu Moger <babu.moger@amd.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200121171232.28839-1-kim.phillips@amd.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16perf/x86/intel/uncore: Add PCI ID of IMC for Xeon E3 V5 FamilyKan Liang
commit e74383045119fb8055cf31cb39e0fe951d67163a upstream. The IMC uncore support is missed for E3-1585 v5 CPU. Intel Xeon E3 V5 Family has Sky Lake CPU. Add the PCI ID of IMC for Intel Xeon E3 V5 Family. Reported-by: Rosales-fernandez, Carlos <carlos.rosales-fernandez@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Rosales-fernandez, Carlos <carlos.rosales-fernandez@intel.com> Link: https://lkml.kernel.org/r/1578687311-158748-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-03-20perf/x86/intel: Fix PT PMI handlingAlexander Shishkin
commit 92ca7da4bdc24d63bb0bcd241c11441ddb63b80a upstream. Commit: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") skips the PT/LBR exclusivity check on CPUs where PT and LBRs coexist, but also inadvertently skips the active_events bump for PT in that case, which is a bug. If there aren't any hardware events at the same time as PT, the PMI handler will ignore PT PMIs, as active_events reads zero in that case, resulting in the "Uhhuh" spurious NMI warning and PT data loss. Fix this by always increasing active_events for PT events. Fixes: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") Reported-by: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lkml.kernel.org/r/20191210105101.77210-1-alexander.shishkin@linux.intel.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-03-20perf/x86/intel/bts: Fix the use of page_private()Alexander Shishkin
commit ff61541cc6c1962957758ba433c574b76f588d23 upstream. Commit 8062382c8dbe2 ("perf/x86/intel/bts: Add BTS PMU driver") brought in a warning with the BTS buffer initialization that is easily tripped with (assuming KPTI is disabled): instantly throwing: > ------------[ cut here ]------------ > WARNING: CPU: 2 PID: 326 at arch/x86/events/intel/bts.c:86 bts_buffer_setup_aux+0x117/0x3d0 > Modules linked in: > CPU: 2 PID: 326 Comm: perf Not tainted 5.4.0-rc8-00291-gceb9e77324fa #904 > RIP: 0010:bts_buffer_setup_aux+0x117/0x3d0 > Call Trace: > rb_alloc_aux+0x339/0x550 > perf_mmap+0x607/0xc70 > mmap_region+0x76b/0xbd0 ... It appears to assume (for lost raisins) that PagePrivate() is set, while later it actually tests for PagePrivate() before using page_private(). Make it consistent and always check PagePrivate() before using page_private(). Fixes: 8062382c8dbe2 ("perf/x86/intel/bts: Add BTS PMU driver") Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lkml.kernel.org/r/20191205142853.28894-2-alexander.shishkin@linux.intel.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-12-29perf/x86/uncore: Fix event group supportKan Liang
commit 75be6f703a141b048590d659a3954c4fedd30bba upstream. The events in the same group don't start or stop simultaneously. Here is the ftrace when enabling event group for uncore_iio_0: # perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}" <idle>-0 [000] d.h. 8959.064832: read_msr: a41, value b2b0b030 //Read counter reg of IIO unit0 counter0 <idle>-0 [000] d.h. 8959.064835: write_msr: a48, value 400001 //Write Ctrl reg of IIO unit0 counter0 to enable counter0. <------ Although counter0 is enabled, Unit Ctrl is still freezed. Nothing will count. We are still good here. <idle>-0 [000] d.h. 8959.064836: read_msr: a40, value 30100 //Read Unit Ctrl reg of IIO unit0 <idle>-0 [000] d.h. 8959.064838: write_msr: a40, value 30000 //Write Unit Ctrl reg of IIO unit0 to enable all counters in the unit by clear Freeze bit <------Unit0 is un-freezed. Counter0 has been enabled. Now it starts counting. But counter1 has not been enabled yet. The issue starts here. <idle>-0 [000] d.h. 8959.064846: read_msr: a42, value 0 //Read counter reg of IIO unit0 counter1 <idle>-0 [000] d.h. 8959.064847: write_msr: a49, value 40000e //Write Ctrl reg of IIO unit0 counter1 to enable counter1. <------ Now, counter1 just starts to count. Counter0 has been running for a while. Current code un-freezes the Unit Ctrl right after the first counter is enabled. The subsequent group events always loses some counter values. Implement pmu_enable and pmu_disable support for uncore, which can help to batch hardware accesses. No one uses uncore_enable_box and uncore_disable_box. Remove them. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-drivers-review@eclists.intel.com Cc: linux-perf@eclists.intel.com Fixes: 087bfbb03269 ("perf/x86: Add generic Intel uncore PMU support") Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-12-29perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)Kim Phillips
commit e431e79b60603079d269e0c2a5177943b95fa4b6 upstream. This saves us writing the IBS control MSR twice when disabling the event. I searched revision guides for all families since 10h, and did not find occurrence of erratum #420, nor anything remotely similar: so we isolate the secondary MSR write to family 10h only. Also unconditionally update the count mask for IBS Op implementations that have read & writeable current count (CurCnt) fields in addition to the MaxCnt field. These bits were reserved on prior implementations, and therefore shouldn't have negative impact. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: c9574fe0bdb9 ("perf/x86-ibs: Implement workaround for IBS erratum #420") Link: https://lkml.kernel.org/r/20191023150955.30292-2-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-12-29perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise ↵Kim Phillips
RIP validity commit 317b96bb14303c7998dbcd5bc606bd8038fdd4b4 upstream. The loop that reads all the IBS MSRs into *buf stopped one MSR short of reading the IbsOpData register, which contains the RipInvalid status bit. Fix the offset_max assignment so the MSR gets read, so the RIP invalid evaluation is based on what the IBS h/w output, instead of what was left in memory. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: d47e8238cd76 ("perf/x86-ibs: Take instruction pointer from ibs sample") Link: https://lkml.kernel.org/r/20191023150955.30292-1-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-12-10perf/x86/amd: Change/fix NMI latency mitigation to use a timestampTom Lendacky
commit df4d29732fdad43a51284f826bec3e6ded177540 upstream. It turns out that the NMI latency workaround from commit: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") ends up being too conservative and results in the perf NMI handler claiming NMIs too easily on AMD hardware when the NMI watchdog is active. This has an impact, for example, on the hpwdt (HPE watchdog timer) module. This module can produce an NMI that is used to reset the system. It registers an NMI handler for the NMI_UNKNOWN type and relies on the fact that nothing has claimed an NMI so that its handler will be invoked when the watchdog device produces an NMI. After the referenced commit, the hpwdt module is unable to process its generated NMI if the NMI watchdog is active, because the current NMI latency mitigation results in the NMI being claimed by the perf NMI handler. Update the AMD perf NMI latency mitigation workaround to, instead, use a window of time. Whenever a PMC is handled in the perf NMI handler, set a timestamp which will act as a perf NMI window. Any NMIs arriving within that window will be claimed by perf. Anything outside that window will not be claimed by perf. The value for the NMI window is set to 100 msecs. This is a conservative value that easily covers any NMI latency in the hardware. While this still results in a window in which the hpwdt module will not receive its NMI, the window is now much, much smaller. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-09-21perf/x86/amd/ibs: Fix sample bias for dispatched micro-opsKim Phillips
[ Upstream commit 0f4cd769c410e2285a4e9873a684d90423f03090 ] When counting dispatched micro-ops with cnt_ctl=1, in order to prevent sample bias, IBS hardware preloads the least significant 7 bits of current count (IbsOpCurCnt) with random values, such that, after the interrupt is handled and counting resumes, the next sample taken will be slightly perturbed. The current count bitfield is in the IBS execution control h/w register, alongside the maximum count field. Currently, the IBS driver writes that register with the maximum count, leaving zeroes to fill the current count field, thereby overwriting the random bits the hardware preloaded for itself. Fix the driver to actually retain and carry those random bits from the read of the IBS control register, through to its write, instead of overwriting the lower current count bits with zeroes. Tested with: perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload> 'perf annotate' output before: 15.70 65: addsd %xmm0,%xmm1 17.30 add $0x1,%rax 15.88 cmp %rdx,%rax je 82 17.32 72: test $0x1,%al jne 7c 7.52 movapd %xmm1,%xmm0 5.90 jmp 65 8.23 7c: sqrtsd %xmm1,%xmm0 12.15 jmp 65 'perf annotate' output after: 16.63 65: addsd %xmm0,%xmm1 16.82 add $0x1,%rax 16.81 cmp %rdx,%rax je 82 16.69 72: test $0x1,%al jne 7c 8.30 movapd %xmm1,%xmm0 8.13 jmp 65 8.24 7c: sqrtsd %xmm1,%xmm0 8.39 jmp 65 Tested on Family 15h and 17h machines. Machines prior to family 10h Rev. C don't have the RDWROPCNT capability, and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't affect their operation. It is unknown why commit db98c5faf8cb ("perf/x86: Implement 64-bit counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt field; the number of preloaded random bits has always been 7, AFAICT. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org> Cc: <x86@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Borislav Petkov" <bp@alien8.de> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: "Namhyung Kim" <namhyung@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21perf/x86/intel: Restrict period on NehalemJosh Hunt
[ Upstream commit 44d3bbb6f5e501b873218142fe08cdf62a4ac1f3 ] We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in some cases when using perf: perfevents: irq loop stuck! WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530 ... RIP: 0010:intel_pmu_handle_irq+0x37b/0x530 ... Call Trace: <NMI> ? perf_event_nmi_handler+0x2e/0x50 ? intel_pmu_save_and_restart+0x50/0x50 perf_event_nmi_handler+0x2e/0x50 nmi_handle+0x6e/0x120 default_do_nmi+0x3e/0x100 do_nmi+0x102/0x160 end_repeat_nmi+0x16/0x50 ... ? native_write_msr+0x6/0x20 ? native_write_msr+0x6/0x20 </NMI> intel_pmu_enable_event+0x1ce/0x1f0 x86_pmu_start+0x78/0xa0 x86_pmu_enable+0x252/0x310 __perf_event_task_sched_in+0x181/0x190 ? __switch_to_asm+0x41/0x70 ? __switch_to_asm+0x35/0x70 ? __switch_to_asm+0x41/0x70 ? __switch_to_asm+0x35/0x70 finish_task_switch+0x158/0x260 __schedule+0x2f6/0x840 ? hrtimer_start_range_ns+0x153/0x210 schedule+0x32/0x80 schedule_hrtimeout_range_clock+0x8a/0x100 ? hrtimer_init+0x120/0x120 ep_poll+0x2f7/0x3a0 ? wake_up_q+0x60/0x60 do_epoll_wait+0xa9/0xc0 __x64_sys_epoll_wait+0x1a/0x20 do_syscall_64+0x4e/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fdeb1e96c03 ... Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: acme@kernel.org Cc: Josh Hunt <johunt@akamai.com> Cc: bpuranda@akamai.com Cc: mingo@redhat.com Cc: jolsa@redhat.com Cc: tglx@linutronix.de Cc: namhyung@kernel.org Cc: alexander.shishkin@linux.intel.com Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16perf/x86: Apply more accurate check on hypervisor platformZhenzhong Duan
[ Upstream commit 5ea3f6fb37b79da33ac9211df336fd2b9f47c39f ] check_msr is used to fix a bug report in guest where KVM doesn't support LBR MSR and cause #GP. The msr check is bypassed on real HW to workaround a false failure, see commit d0e1a507bdc7 ("perf/x86/intel: Disable check_msr for real HW") When running a guest with CONFIG_HYPERVISOR_GUEST not set or "nopv" enabled, current check isn't enough and #GP could trigger. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1564022366-18293-1-git-send-email-zhenzhong.duan@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16perf/x86/intel: Fix invalid Bit 13 for Icelake MSR_OFFCORE_RSP_x registerYunying Sun
[ Upstream commit 3b238a64c3009fed36eaea1af629d9377759d87d ] The Intel SDM states that bit 13 of Icelake's MSR_OFFCORE_RSP_x register is valid, and used for counting hardware generated prefetches of L3 cache. Update the bitmask to allow bit 13. Before: $ perf stat -e cpu/event=0xb7,umask=0x1,config1=0x1bfff/u sleep 3 Performance counter stats for 'sleep 3': <not supported> cpu/event=0xb7,umask=0x1,config1=0x1bfff/u After: $ perf stat -e cpu/event=0xb7,umask=0x1,config1=0x1bfff/u sleep 3 Performance counter stats for 'sleep 3': 9,293 cpu/event=0xb7,umask=0x1,config1=0x1bfff/u Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jolsa@redhat.com Cc: namhyung@kernel.org Link: https://lkml.kernel.org/r/20190724082932.12833-1-yunying.sun@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16perf/x86/intel: Fix SLOTS PEBS event constraintKan Liang
[ Upstream commit 3d0c3953601d250175c7684ec0d9df612061dae5 ] Sampling SLOTS event and ref-cycles event in a group on Icelake gives EINVAL. SLOTS event is the event stands for the fixed counter 3, not fixed counter 2. Wrong mask was set to SLOTS event in intel_icl_pebs_event_constraints[]. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Link: https://lkml.kernel.org/r/20190723200429.8180-1-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCsKim Phillips
commit 2f217d58a8a086d3399fecce39fb358848e799c4 upstream. Fill in the L3 performance event select register ThreadMask bitfield, to enable per hardware thread accounting. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gary Hook <Gary.Hook@amd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liska <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190628215906.4276-2-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCsKim Phillips
commit 16f4641166b10e199f0d7b68c2c5f004fef0bda3 upstream. The following commit: d7cbbe49a930 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events") enables L3 PMC events for all threads and slices by writing 1's in 'ChL3PmcCfg' (L3 PMC PERF_CTL) register fields. Those bitfields overlap with high order event select bits in the Data Fabric PMC control register, however. So when a user requests raw Data Fabric events (-e amd_df/event=0xYYY/), the two highest order bits get inadvertently set, changing the counter select to events that don't exist, and for which no counts are read. This patch changes the logic to write the L3 masks only when dealing with L3 PMC counters. AMD Family 16h and below Northbridge (NB) counters were not affected. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gary Hook <Gary.Hook@amd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liska <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: d7cbbe49a930 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events") Link: https://lkml.kernel.org/r/20190628215906.4276-1-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26perf/x86/intel: Fix spurious NMI on fixed counterKan Liang
commit e4557c1a46b0d32746bd309e1941914b5a6912b4 upstream. If a user first sample a PEBS event on a fixed counter, then sample a non-PEBS event on the same fixed counter on Icelake, it will trigger spurious NMI. For example: perf record -e 'cycles:p' -a perf record -e 'cycles' -a The error message for spurious NMI: [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2. [ +0.000000] Do you have a strange power saving mode enabled? [ +0.000000] Dazed and confused, but trying to continue The bug was introduced by the following commit: commit 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()") The commit moves the intel_pmu_pebs_disable() after intel_pmu_disable_fixed(), which returns immediately. The related bit of PEBS_ENABLE MSR will never be cleared for the fixed counter. Then a non-PEBS event runs on the fixed counter, but the bit on PEBS_ENABLE is still set, which triggers spurious NMIs. Check and disable PEBS for fixed counters after intel_pmu_disable_fixed(). Reported-by: Yi, Ammy <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()") Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26perf/x86/intel/uncore: Handle invalid event coding for free-running counterKan Liang
[ Upstream commit 543ac280b3576c0009e8c0fcd4d6bfc9978d7bd0 ] Counting with invalid event coding for free-running counter may cause OOPs, e.g. uncore_iio_free_running_0/event=1/. Current code only validate the event with free-running event format, event=0xff,umask=0xXY. Non-free-running event format never be checked for the PMU with free-running counters. Add generic hw_config() to check and reject the invalid event coding for free-running PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Fixes: 0f519f0352e3 ("perf/x86/intel/uncore: Support IIO free-running counters on SKX") Link: https://lkml.kernel.org/r/1556672028-119221-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26perf/x86/intel: Disable check_msr for real HWJiri Olsa
[ Upstream commit d0e1a507bdc761a14906f03399d933ea639a1756 ] Tom Vaden reported false failure of the check_msr() function, because some servers can do POST tracing and enable LBR tracing during bootup. Kan confirmed that check_msr patch was to fix a bug report in guest, so it's ok to disable it for real HW. Reported-by: Tom Vaden <tom.vaden@hpe.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tom Vaden <tom.vaden@hpe.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Liang Kan <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190616141313.GD2500@krava [ Readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26perf/x86/intel: Add more Icelake CPUIDsKan Liang
[ Upstream commit faaeff98666c24376cebd0b106504d05a36881d1 ] Add new model number for Icelake desktop and server to perf. The data source encoding for Icelake server is the same as Skylake server. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: qiuxu.zhuo@intel.com Cc: rui.zhang@intel.com Cc: tony.luck@intel.com Link: https://lkml.kernel.org/r/20190603134122.13853-2-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26perf/x86: Add Intel Ice Lake NNPI uncore supportRajneesh Bhardwaj
[ Upstream commit 5f4318c1b1d23a9290e4def78ee76017c288bf60 ] Intel Ice Lake uncore support already included IMC PCI ID but ICL-NNPI CPUID is missing so add it to fix the probe function. Fixes: e39875d15ad6 ("perf/x86: add Intel Icelake uncore support") Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: alexander.shishkin@linux.intel.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linux PM <linux-pm@vger.kernel.org> Link: https://lkml.kernel.org/r/20190614081701.13828-1-rajneesh.bhardwaj@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-29Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: - might_sleep() atomicity fix in the microcode loader - resctrl boundary condition fix - APIC arithmethics bug fix for frequencies >= 4.2 GHz - three 5-level paging crash fixes - two speculation fixes - a perf/stacktrace fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fall back to using frame pointers for generated code perf/x86: Always store regs->ip in perf_callchain_kernel() x86/speculation: Allow guests to use SSBD even if host does not x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init() x86/boot/64: Add missing fixup_pointer() for next_early_pgt access x86/boot/64: Fix crash if kernel image crosses page table boundary x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz x86/resctrl: Prevent possible overrun during bitmap operations x86/microcode: Fix the microcode load on CPU hotplug for real
2019-06-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes, most of them related to bugs perf fuzzing found in the x86 code" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/regs: Use PERF_REG_EXTENDED_MASK perf/x86: Remove pmu->pebs_no_xmm_regs perf/x86: Clean up PEBS_XMM_REGS perf/x86/regs: Check reserved bits perf/x86: Disable extended registers for non-supported PMUs perf/ioctl: Add check for the sample_period value perf/core: Fix perf_sample_regs_user() mm check
2019-06-28perf/x86: Always store regs->ip in perf_callchain_kernel()Song Liu
The stacktrace_map_raw_tp BPF selftest is failing because the RIP saved by perf_arch_fetch_caller_regs() isn't getting saved by perf_callchain_kernel(). This was broken by the following commit: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") With that change, when starting with non-HW regs, the unwinder starts with the current stack frame and unwinds until it passes up the frame which called perf_arch_fetch_caller_regs(). So regs->ip needs to be saved deliberately. Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kairui Song <kasong@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/3975a298fa52b506fea32666d8ff6a13467eee6d.1561595111.git.jpoimboe@redhat.com
2019-06-24perf/x86: Remove pmu->pebs_no_xmm_regsKan Liang
We don't need pmu->pebs_no_xmm_regs anymore, the capabilities PERF_PMU_CAP_EXTENDED_REGS can be used to check if XMM registers collection is supported. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86: Clean up PEBS_XMM_REGSKan Liang
Use generic macro PERF_REG_EXTENDED_MASK to replace PEBS_XMM_REGS to avoid duplication. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86: Disable extended registers for non-supported PMUsKan Liang
The perf fuzzer caused Skylake machine to crash: [ 9680.085831] Call Trace: [ 9680.088301] <IRQ> [ 9680.090363] perf_output_sample_regs+0x43/0xa0 [ 9680.094928] perf_output_sample+0x3aa/0x7a0 [ 9680.099181] perf_event_output_forward+0x53/0x80 [ 9680.103917] __perf_event_overflow+0x52/0xf0 [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0 [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150 [ 9680.117475] ? check_preempt_wakeup+0x181/0x230 [ 9680.122091] ? check_preempt_curr+0x62/0x90 [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140 [ 9680.130355] ? try_to_wake_up+0x54/0x460 [ 9680.134366] ? reweight_entity+0x15b/0x1a0 [ 9680.138559] ? __queue_work+0x103/0x3f0 [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270 [ 9680.147194] ? timerqueue_del+0x1e/0x40 [ 9680.151092] ? __remove_hrtimer+0x35/0x70 [ 9680.155191] __hrtimer_run_queues+0x100/0x280 [ 9680.159658] hrtimer_interrupt+0x100/0x220 [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140 [ 9680.168555] apic_timer_interrupt+0xf/0x20 [ 9680.172756] </IRQ> The XMM registers can only be collected by PEBS hardware events on the platforms with PEBS baseline support, e.g. Icelake, not software/probe events. Add capabilities flag PERF_PMU_CAP_EXTENDED_REGS to indicate the PMU which support extended registers. For X86, the extended registers are XMM registers. Add has_extended_regs() to check if extended registers are applied. The generic code define the mask of extended registers as 0 if arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's a bunch of ring-buffer ordering fixes for a reproducible bug, plus a PEBS constraints regression fix. Plus tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools headers UAPI: Sync kvm.h headers with the kernel sources perf record: Fix s390 missing module symbol and warning for non-root users perf machine: Read also the end of the kernel perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms perf session: Add missing swap ops for namespace events perf namespace: Protect reading thread's namespace tools headers UAPI: Sync drm/drm.h with the kernel tools headers UAPI: Sync drm/i915_drm.h with the kernel tools headers UAPI: Sync linux/fs.h with the kernel tools headers UAPI: Sync linux/sched.h with the kernel tools arch x86: Sync asm/cpufeatures.h with the with the kernel tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel perf data: Fix 'strncat may truncate' build failure with recent gcc perf/ring-buffer: Use regular variables for nesting perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data perf/ring_buffer: Add ordering to rb->nest increment perf/ring_buffer: Fix exposing a temporarily decreased data_head perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraintsStephane Eranian
This patch fixes an bug revealed by the following commit: 6b89d4c1ae85 ("perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking") That patch modified INTEL_FLAGS_EVENT_CONSTRAINT() to only look at the event code when matching a constraint. If code+umask were needed, then the INTEL_FLAGS_UEVENT_CONSTRAINT() macro was needed instead. This broke with some of the constraints for PEBS events. Several of them, including the one used for cycles:p, cycles:pp, cycles:ppp fell in that category and caused the event to be rejected in PEBS mode. In other words, on some platforms a cmdline such as: $ perf top -e cycles:pp would fail with -EINVAL. This patch fixes this bug by properly using INTEL_FLAGS_UEVENT_CONSTRAINT() when needed in the PEBS constraint tables. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/20190521005246.423-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - support for SVE and Pointer Authentication in guests - PMU improvements POWER: - support for direct access to the POWER9 XIVE interrupt controller - memory and performance optimizations x86: - support for accessing memory not backed by struct page - fixes and refactoring Generic: - dirty page tracking improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits) kvm: fix compilation on aarch64 Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" kvm: x86: Fix L1TF mitigation for shadow MMU KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing" KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete tests: kvm: Add tests for KVM_SET_NESTED_STATE KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID tests: kvm: Add tests to .gitignore KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one KVM: Fix the bitmap range to copy during clear dirty KVM: arm64: Fix ptrauth ID register masking logic KVM: x86: use direct accessors for RIP and RSP KVM: VMX: Use accessors for GPRs outside of dedicated caching logic KVM: x86: Omit caching logic for always-available GPRs kvm, x86: Properly check whether a pfn is an MMIO or not ...
2019-05-16Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "An x86 PMU constraint fix, an interface fix, and a Sparse fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Allow PEBS multi-entry in watermark mode perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking perf/x86/amd/iommu: Make the 'amd_iommu_attr_groups' symbol static
2019-05-14treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>Masahiro Yamada
Since commit dccd2304cc90 ("ARM: 7430/1: sizes.h: move from asm-generic to <linux/sizes.h>"), <asm/sizes.h> and <asm-generic/sizes.h> are just wrappers of <linux/sizes.h>. This commit replaces all <asm/sizes.h> and <asm-generic/sizes.h> to prepare for the removal. Link: http://lkml.kernel.org/r/1553267665-27228-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14perf/x86/intel: Allow PEBS multi-entry in watermark modeStephane Eranian
This patch fixes a restriction/bug introduced by: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") The original patch prevented using multi-entry PEBS when wakeup_events != 0. However given that wakeup_events is part of a union with wakeup_watermark, it means that in watermark mode, PEBS multi-entry is also disabled which is not the intent. This patch fixes this by checking is watermark mode is enabled. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: vincent.weaver@maine.edu Fixes: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") Link: http://lkml.kernel.org/r/20190514003400.224340-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-10perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* maskingStephane Eranian
On Intel Westmere, a cmdline as follows: $ perf record -e cpu/event=0xc4,umask=0x2,name=br_inst_retired.near_call/p .... was failing. Yet the event+ umask support PEBS. It turns out this is due to a bug in the the PEBS event constraint table for westmere. All forms of BR_INST_RETIRED.* support PEBS. Therefore the constraint mask should ignore the umask. The name of the macro INTEL_FLAGS_EVENT_CONSTRAINT() hint that this is the case but it was not. That macros was checking both the event code and event umask. Therefore, it was only matching on 0x00c4. There are code+umask macros, they all have *UEVENT*. This bug fixes the issue by checking only the event code in the mask. Both single and range version are modified. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/20190509214556.123493-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-08perf/x86/amd/iommu: Make the 'amd_iommu_attr_groups' symbol staticWang Hai
Fixes the following sparse warning: arch/x86/events/amd/iommu.c:396:30: warning: symbol 'amd_iommu_attr_groups' was not declared. Should it be static? Signed-off-by: Wang Hai <wanghai26@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: namhyung@kernel.org Fixes: 51686546304f (x86/events/amd/iommu: Fix sysfs perf attribute groups) Link: http://lkml.kernel.org/r/20190508020418.19568-1-wanghai26@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel changes were: - add support for Intel's "adaptive PEBS v4" - which embedds LBS data in PEBS records and can thus batch up and reduce the IRQ (NMI) rate significantly - reducing overhead and making call-graph profiling less intrusive. - add Intel CPU core and uncore support updates for Tremont, Icelake, - extend the x86 PMU constraints scheduler with 'constraint ranges' to better support Icelake hw constraints, - make x86 call-chain support work better with CONFIG_FRAME_POINTER=y - misc other changes Tooling changes: - updates to the main tools: 'perf record', 'perf trace', 'perf stat' - updated Intel and S/390 vendor events - libtraceevent updates - misc other updates and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER watchdog: Fix typo in comment perf/x86/intel: Add Tremont core PMU support perf/x86/intel/uncore: Add Intel Icelake uncore support perf/x86/msr: Add Icelake support perf/x86/intel/rapl: Add Icelake support perf/x86/intel/cstate: Add Icelake support perf/x86/intel: Add Icelake support perf/x86: Support constraint ranges perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them perf/x86/intel: Support adaptive PEBS v4 perf/x86/intel/ds: Extract code of event update in short period perf/x86/intel: Extract memory code PEBS parser for reuse perf/x86: Support outputting XMM registers perf/x86/intel: Force resched when TFA sysctl is modified perf/core: Add perf_pmu_resched() as global function perf/headers: Fix stale comment for struct perf_addr_filter perf/core: Make perf_swevent_init_cpu() static perf/x86: Add sanity checks to x86_schedule_events() perf/x86: Optimize x86_schedule_events() ...
2019-05-05perf/x86/intel: Fix race in intel_pmu_disable_event()Jiri Olsa
New race in x86_pmu_stop() was introduced by replacing the atomic __test_and_clear_bit() of cpuc->active_mask by separate test_bit() and __clear_bit() calls in the following commit: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler") The race causes panic for PEBS events with enabled callchains: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... RIP: 0010:perf_prepare_sample+0x8c/0x530 Call Trace: <NMI> perf_event_output_forward+0x2a/0x80 __perf_event_overflow+0x51/0xe0 handle_pmi_common+0x19e/0x240 intel_pmu_handle_irq+0xad/0x170 perf_event_nmi_handler+0x2e/0x50 nmi_handle+0x69/0x110 default_do_nmi+0x3e/0x100 do_nmi+0x11a/0x180 end_repeat_nmi+0x16/0x1a RIP: 0010:native_write_msr+0x6/0x20 ... </NMI> intel_pmu_disable_event+0x98/0xf0 x86_pmu_stop+0x6e/0xb0 x86_pmu_del+0x46/0x140 event_sched_out.isra.97+0x7e/0x160 ... The event is configured to make samples from PEBS drain code, but when it's disabled, we'll go through NMI path instead, where data->callchain will not get allocated and we'll crash: x86_pmu_stop test_bit(hwc->idx, cpuc->active_mask) intel_pmu_disable_event(event) { ... intel_pmu_pebs_disable(event); ... EVENT OVERFLOW -> <NMI> intel_pmu_handle_irq handle_pmi_common TEST PASSES -> test_bit(bit, cpuc->active_mask)) perf_event_overflow perf_prepare_sample { ... if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY)) data->callchain = perf_callchain(event, regs); CRASH -> size += data->callchain->nr; } </NMI> ... x86_pmu_disable_event(event) } __clear_bit(hwc->idx, cpuc->active_mask); Fixing this by disabling the event itself before setting off the PEBS bit. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Arcari <darcari@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Lendacky Thomas <Thomas.Lendacky@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler") Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-03perf/x86/intel/pt: Remove software double buffering PMU capabilityAlexander Shishkin
Now that all AUX allocations are high-order by default, the software double buffering PMU capability doesn't make sense any more, get rid of it. In case some PMUs choose to opt out, we can re-introduce it. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Link: http://lkml.kernel.org/r/20190503085536.24119-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-02perf/x86/amd: Update generic hardware cache events for Family 17hKim Phillips
Add a new amd_hw_cache_event_ids_f17h assignment structure set for AMD families 17h and above, since a lot has changed. Specifically: L1 Data Cache The data cache access counter remains the same on Family 17h. For DC misses, PMCx041's definition changes with Family 17h, so instead we use the L2 cache accesses from L1 data cache misses counter (PMCx060,umask=0xc8). For DC hardware prefetch events, Family 17h breaks compatibility for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware Prefetch DC Fills." L1 Instruction Cache PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward compatible on Family 17h. For prefetches, we remove the erroneous PMCx04B assignment which counts how many software data cache prefetch load instructions were dispatched. LL - Last Level Cache Removing PMCs 7D, 7E, and 7F assignments, as they do not exist on Family 17h, where the last level cache is L3. L3 counters can be accessed using the existing AMD Uncore driver. Data TLB On Intel machines, data TLB accesses ("dTLB-loads") are assigned to counters that count load/store instructions retired. This is inconsistent with instruction TLB accesses, where Intel implementations report iTLB misses that hit in the STLB. Ideally, dTLB-loads would count higher level dTLB misses that hit in lower level TLBs, and dTLB-load-misses would report those that also missed in those lower-level TLBs, therefore causing a page table walk. That would be consistent with instruction TLB operation, remove the redundancy between dTLB-loads and L1-dcache-loads, and prevent perf from producing artificially low percentage ratios, i.e. the "0.01%" below: 42,550,869 L1-dcache-loads 41,591,860 dTLB-loads 4,802 dTLB-load-misses # 0.01% of all dTLB cache hits 7,283,682 L1-dcache-stores 7,912,392 dTLB-stores 310 dTLB-store-misses On AMD Families prior to 17h, the "Data Cache Accesses" counter is used, which is slightly better than load/store instructions retired, but still counts in terms of individual load/store operations instead of TLB operations. So, for AMD Families 17h and higher, this patch assigns "dTLB-loads" to a counter for L1 dTLB misses that hit in the L2 dTLB, and "dTLB-load-misses" to a counter for L1 DTLB misses that caused L2 DTLB misses and therefore also caused page table walks. This results in a much more accurate view of data TLB performance: 60,961,781 L1-dcache-loads 4,601 dTLB-loads 963 dTLB-load-misses # 20.93% of all dTLB cache hits Note that for all AMD families, data loads and stores are combined in a single accesses counter, so no 'L1-dcache-stores' are reported separately, and stores are counted with loads in 'L1-dcache-loads'. Also note that the "% of all dTLB cache hits" string is misleading because (a) "dTLB cache": although TLBs can be considered caches for page tables, in this context, it can be misinterpreted as data cache hits because the figures are similar (at least on Intel), and (b) not all those loads (technically accesses) technically "hit" at that hardware level. "% of all dTLB accesses" would be more clear/accurate. Instruction TLB On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in the STLB and completed a page table walk. For AMD Family 17h and above, for 'iTLB-loads' we replace the erroneous instruction cache fetches counter with PMCx084 "L1 ITLB Miss, L2 ITLB Hit". For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss, L2 ITLB Miss", but set a 0xff umask because without it the event does not get counted. Branch Predictor (BPU) PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families. Node Level Events Family 17h does not have a PMCx0e9 counter, and corresponding counters have not been made available publicly, so for now, we mark them as unsupported for Families 17h and above. Reference: "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh" Released 7/17/2018, Publication #56255, Revision 3.03: https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf [ mingo: tidied up the line breaks. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30KVM: x86: Inject PMI for KVM guestLuwei Kang
Inject a PMI for KVM guest when Intel PT working in Host-Guest mode and Guest ToPA entry memory buffer was completely filled. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-29perf/x86: Make perf callchains work without CONFIG_FRAME_POINTERKairui Song
Currently perf callchain doesn't work well with ORC unwinder when sampling from trace point. We'll get useless in kernel callchain like this: perf 6429 [000] 22.498450: kmem:mm_page_alloc: page=0x176a17 pfn=1534487 order=0 migratetype=0 gfp_flags=GFP_KERNEL ffffffffbe23e32e __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux) 7efdf7f7d3e8 __poll+0x18 (/usr/lib64/libc-2.28.so) 5651468729c1 [unknown] (/usr/bin/perf) 5651467ee82a main+0x69a (/usr/bin/perf) 7efdf7eaf413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so) 5541f689495641d7 [unknown] ([unknown]) The root cause is that, for trace point events, it doesn't provide a real snapshot of the hardware registers. Instead perf tries to get required caller's registers and compose a fake register snapshot which suppose to contain enough information for start a unwinding. However without CONFIG_FRAME_POINTER, if failed to get caller's BP as the frame pointer, so current frame pointer is returned instead. We get a invalid register combination which confuse the unwinder, and end the stacktrace early. So in such case just don't try dump BP, and let the unwinder start directly when the register is not a real snapshot. Use SP as the skip mark, unwinder will skip all the frames until it meet the frame of the trace point caller. Tested with frame pointer unwinder and ORC unwinder, this makes perf callchain get the full kernel space stacktrace again like this: perf 6503 [000] 1567.570191: kmem:mm_page_alloc: page=0x16c904 pfn=1493252 order=0 migratetype=0 gfp_flags=GFP_KERNEL ffffffffb523e2ae __alloc_pages_nodemask+0x22e (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52383bd __get_free_pages+0xd (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52fd28a __pollwait+0x8a (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb521426f perf_poll+0x2f (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52fe3e2 do_sys_poll+0x252 (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb52ff027 __x64_sys_poll+0x37 (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb500418b do_syscall_64+0x5b (/lib/modules/5.1.0-rc3+/build/vmlinux) ffffffffb5a0008c entry_SYSCALL_64_after_hwframe+0x44 (/lib/modules/5.1.0-rc3+/build/vmlinux) 7f71e92d03e8 __poll+0x18 (/usr/lib64/libc-2.28.so) 55a22960d9c1 [unknown] (/usr/bin/perf) 55a22958982a main+0x69a (/usr/bin/perf) 7f71e9202413 __libc_start_main+0xf3 (/usr/lib64/libc-2.28.so) 5541f689495641d7 [unknown] ([unknown]) Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190422162652.15483-1-kasong@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-25perf/x86/intel: Update KBL Package C-state events to also include ↵Harry Pan
PC8/PC9/PC10 counters Kaby Lake (and Coffee Lake) has PC8/PC9/PC10 residency counters. This patch updates the list of Kaby/Coffee Lake PMU event counters from the snb_cstates[] list of events to the hswult_cstates[] list of events, which keeps all previously supported events and also adds the PKG_C8, PKG_C9 and PKG_C10 residency counters. This allows user space tools to profile them through the perf interface. Signed-off-by: Harry Pan <harry.pan@intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: gs0622@gmail.com Link: http://lkml.kernel.org/r/20190424145033.1924-1-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>