aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/events
AgeCommit message (Collapse)Author
2023-01-18perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox()Xiongfeng Wang
[ Upstream commit 1ff9dd6e7071a561f803135c1d684b13c7a7d01d ] pci_get_device() will increase the reference count for the returned 'dev'. We need to call pci_dev_put() to decrease the reference count. Since 'dev' is only used in pci_read_config_dword(), let's add pci_dev_put() right after it. Fixes: 9d480158ee86 ("perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20221118063137.121512-3-wangxiongfeng2@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14perf/amd/ibs: Use interrupt regs ip for stack unwindingRavi Bangoria
[ Upstream commit 3d47083b9ff46863e8374ad3bb5edb5e464c75f8 ] IbsOpRip is recorded when IBS interrupt is triggered. But there is a skid from the time IBS interrupt gets triggered to the time the interrupt is presented to the core. Meanwhile processor would have moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp recorded as part of the interrupt regs. This causes issues while unwinding stack using the ORC unwinder as it needs consistent rip, rsp and rbp. Fix this by using rip from interrupt regs instead of IbsOpRip for stack unwinding. Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder") Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15perf/x86/intel/pt: Fix address filter config for 32-bit kernelAdrian Hunter
[ Upstream commit e5524bf1047eb3b3f3f33b5f59897ba67b3ade87 ] Change from shifting 'unsigned long' to 'u64' to prevent the config bits being lost on a 32-bit kernel. Fixes: eadf48cab4b6b0 ("perf/x86/intel/pt: Add support for address range filtering in PT") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220131072453.2839535-5-adrian.hunter@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-26perf/x86/intel/uncore: Fix IIO event constraints for Skylake ServerAlexander Antonov
[ Upstream commit 3866ae319c846a612109c008f43cba80b8c15e86 ] According to the latest uncore document, COMP_BUF_OCCUPANCY (0xd5) event can be collected on 2-3 counters. Update uncore IIO event constraints for Skylake Server. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-3-alexander.antonov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-26perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake ServerAlexander Antonov
[ Upstream commit e324234e0aa881b7841c7c713306403e12b069ff ] According Uncore Reference Manual: any of the CHA events may be filtered by Thread/Core-ID by using tid modifier in CHA Filter 0 Register. Update skx_cha_hw_config() to follow Uncore Guide. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-2-alexander.antonov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17perf/x86: Reset destroy callback on event init failureAnand K Mistry
[ Upstream commit 02d029a41dc986e2d5a77ecca45803857b346829 ] perf_init_event tries multiple init callbacks and does not reset the event state between tries. When x86_pmu_event_init runs, it unconditionally sets the destroy callback to hw_perf_event_destroy. On the next init attempt after x86_pmu_event_init, in perf_try_init_event, if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy callback will be run. However, if the next init didn't set the destroy callback, hw_perf_event_destroy will be run (since the callback wasn't reset). Looking at other pmu init functions, the common pattern is to only set the destroy callback on a successful init. Resetting the callback on failure tries to replicate that pattern. This was discovered after commit f11dd0d80555 ("perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second) run of the perf tool after a reboot results in 0 samples being generated. The extra run of hw_perf_event_destroy results in active_events having an extra decrement on each perf run. The second run has active_events == 0 and every subsequent run has active_events < 0. When active_events == 0, the NMI handler will early-out and not record any samples. Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22perf/x86/amd/ibs: Work around erratum #1197Kim Phillips
[ Upstream commit 26db2e0c51fe83e1dd852c1321407835b481806e ] Erratum #1197 "IBS (Instruction Based Sampling) Register State May be Incorrect After Restore From CC6" is published in a document: "Revision Guide for AMD Family 19h Models 00h-0Fh Processors" 56683 Rev. 1.04 July 2021 https://bugzilla.kernel.org/show_bug.cgi?id=206537 Implement the erratum's suggested workaround and ignore IBS samples if MSRC001_1031 == 0. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-3-kim.phillips@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22perf/x86/intel/pt: Fix mask of num_address_rangesXiaoyao Li
[ Upstream commit c53c6b7409f4cd9e542991b53d597fbe2751d7db ] Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of configurable address ranges for filtering, not bit 1:0. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-12perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guestLike Xu
commit df51fe7ea1c1c2c3bfdb81279712fdd2e4ea6c27 upstream. If we use "perf record" in an AMD Milan guest, dmesg reports a #GP warning from an unchecked MSR access error on MSR_F15H_PERF_CTLx: [] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000110076) at rIP: 0xffffffff8106ddb4 (native_write_msr+0x4/0x20) [] Call Trace: [] amd_pmu_disable_event+0x22/0x90 [] x86_pmu_stop+0x4c/0xa0 [] x86_pmu_del+0x3a/0x140 The AMD64_EVENTSEL_HOSTONLY bit is defined and used on the host, while the guest perf driver should avoid such use. Fixes: 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lkml.kernel.org/r/20210802070850.35295-1-likexu@tencent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22x86/events/amd/iommu: Fix sysfs type mismatchNathan Chancellor
[ Upstream commit de5bc7b425d4c27ae5faa00ea7eb6b9780b9a355 ] dev_attr_show() calls _iommu_event_show() via an indirect call but _iommu_event_show()'s type does not currently match the type of the show() member in 'struct device_attribute', resulting in a Control Flow Integrity violation. $ cat /sys/devices/amd_iommu_1/events/mem_dte_hit csource=0x0a $ dmesg | grep "CFI failure" [ 3526.735140] CFI failure (target: _iommu_event_show...): Change _iommu_event_show() and 'struct amd_iommu_event_desc' to 'struct device_attribute' so that there is no more CFI violation. Fixes: 7be6296fdd75 ("perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210415001112.3024673-1-nathan@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3Kan Liang
[ Upstream commit 9d480158ee86ad606d3a8baaf81e6b71acbfd7d5 ] There may be a kernel panic on the Haswell server and the Broadwell server, if the snbep_pci2phy_map_init() return error. The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in the cpu_init() to detect the existence of the SBOX, which is a MSR type of PMON unit. The uncore_extra_pci_dev is allocated in the uncore_pci_init(). If the snbep_pci2phy_map_init() returns error, perf doesn't initialize the PCI type of the PMON units, so the uncore_extra_pci_dev will not be allocated. But perf may continue initializing the MSR type of PMON units. A null dereference kernel panic will be triggered. The sockets in a Haswell server or a Broadwell server are identical. Only need to detect the existence of the SBOX once. Current perf probes all available PCU devices and stores them into the uncore_extra_pci_dev. It's unnecessary. Use the pci_get_device() to replace the uncore_extra_pci_dev. Only detect the existence of the SBOX on the first available PCU device once. Factor out hswep_has_limit_sbox(), since the Haswell server and the Broadwell server uses the same way to detect the existence of the SBOX. Add some macros to replace the magic number. Fixes: 5306c31c5733 ("perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes") Reported-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/1618521764-100923-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24perf/x86/intel: Fix a crash caused by zero PEBS statusKan Liang
commit d88d05a9e0b6d9356e97129d4ff9942d765f46ea upstream. A repeatable crash can be triggered by the perf_fuzzer on some Haswell system. https://lore.kernel.org/lkml/7170d3b-c17f-1ded-52aa-cc6d9ae999f4@maine.edu/ For some old CPUs (HSW and earlier), the PEBS status in a PEBS record may be mistakenly set to 0. To minimize the impact of the defect, the commit was introduced to try to avoid dropping the PEBS record for some cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates the local pebs_status accordingly. However, it doesn't correct the PEBS status in the PEBS record, which may trigger the crash, especially for the large PEBS. It's possible that all the PEBS records in a large PEBS have the PEBS status 0. If so, the first get_next_pebs_record_by_bit() in the __intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large PEBS, the 'count' parameter must > 1. The second get_next_pebs_record_by_bit() will crash. Besides the local pebs_status, correct the PEBS status in the PEBS record as well. Fixes: 01330d7288e0 ("perf/x86: Allow zero PEBS status with only single active event") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1615555298-140216-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-02perf/x86: fix sysfs type mismatchesSami Tolvanen
[ Upstream commit ebd19fc372e3e78bf165f230e7c084e304441c08 ] This change switches rapl to use PMU_FORMAT_ATTR, and fixes two other macros to use device_attribute instead of kobj_attribute to avoid callback type mismatches that trip indirect call checking with Clang's Control-Flow Integrity (CFI). Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20201113183126.1239404-1-samitolvanen@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05perf/x86/amd/ibs: Fix raw sample data accumulationKim Phillips
commit 36e1be8ada994d509538b3b1d0af8b63c351e729 upstream. Neither IbsBrTarget nor OPDATA4 are populated in IBS Fetch mode. Don't accumulate them into raw sample user data in that case. Also, in Fetch mode, add saving the IBS Fetch Control Extended MSR. Technically, there is an ABI change here with respect to the IBS raw sample data format, but I don't see any perf driver version information being included in perf.data file headers, but, existing users can detect whether the size of the sample record has reduced by 8 bytes to determine whether the IBS driver has this fix. Fixes: 904cb3677f3a ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions") Reported-by: Stephane Eranian <stephane.eranian@google.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200908214740.18097-6-kim.phillips@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()Kim Phillips
commit 680d69635005ba0e58fe3f4c52fc162b8fc743b0 upstream. get_ibs_op_count() adds hardware's current count (IbsOpCurCnt) bits to its count regardless of hardware's valid status. According to the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54, if the counter rolls over, valid status is set, and the lower 7 bits of IbsOpCurCnt are randomized by hardware. Don't include those bits in the driver's event count. Fixes: 8b1e13638d46 ("perf/x86-ibs: Fix usage of IBS op current count") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05arch/x86/amd/ibs: Fix re-arming IBS FetchKim Phillips
commit 221bfce5ebbdf72ff08b3bf2510ae81058ee568b upstream. Stephane Eranian found a bug in that IBS' current Fetch counter was not being reset when the driver would write the new value to clear it along with the enable bit set, and found that adding an MSR write that would first disable IBS Fetch would make IBS Fetch reset its current count. Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12, 2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when IbsFetchEn is changed from 0 to 1." Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch, so the driver properly resets the internal counter to 0 and IBS Fetch starts counting again. A family 15h machine tested does not have this problem, and the extra wrmsr is also not needed on Family 19h, so only do the extra wrmsr on families 16h through 18h. Reported-by: Stephane Eranian <stephane.eranian@google.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> [peterz: optimized] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29x86/events/amd/iommu: Fix sizeof mismatchColin Ian King
[ Upstream commit 59d5396a4666195f89a67e118e9e627ddd6f53a1 ] An incorrect sizeof is being used, struct attribute ** is not correct, it should be struct attribute *. Note that since ** is the same size as * this is not causing any issues. Improve this fix by using sizeof(*attrs) as this allows us to not even reference the type of the pointer. Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)") Fixes: 51686546304f ("x86/events/amd/iommu: Fix sysfs perf attribute groups") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201001113900.58889-1-colin.king@canonical.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-20perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flagKim Phillips
[ Upstream commit f967140dfb7442e2db0868b03b961f9c59418a1b ] Enable the sampling check in kernel/events/core.c::perf_event_open(), which returns the more appropriate -EOPNOTSUPP. BEFORE: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses). /bin/dmesg | grep -i perf may provide additional information. With nothing relevant in dmesg. AFTER: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-19perf/x86/intel: Fix inaccurate period in context switch for auto-reloadKan Liang
commit f861854e1b435b27197417f6f90d87188003cb24 upstream. Perf doesn't take the left period into account when auto-reload is enabled with fixed period sampling mode in context switch. Here is the MSR trace of the perf command as below. (The MSR trace is simplified from a ftrace log.) #perf record -e cycles:p -c 2000000 -- ./triad_loop //The MSR trace of task schedule out //perf disable all counters, disable PEBS, disable GP counter 0, //read GP counter 0, and re-enable all counters. //The counter 0 stops at 0xfffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value fffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff //The MSR trace of the same task schedule in again //perf disable all counters, enable and set GP counter 0, //enable PEBS, and re-enable all counters. //0xffffffe17b80 (-2000000) is written to GP counter 0. write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe17b80 write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff When the same task schedule in again, the counter should starts from previous left. However, it starts from the fixed period -2000000 again. A special variant of intel_pmu_save_and_restart() is used for auto-reload, which doesn't update the hwc->period_left. When the monitored task schedules in again, perf doesn't know the left period. The fixed period is used, which is inaccurate. With auto-reload, the counter always has a negative counter value. So the left period is -value. Update the period_left in intel_pmu_save_and_restart_reload(). With the patch: //The MSR trace of task schedule out write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value ffffffe25cbc write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff //The MSR trace of the same task schedule in again write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe25cbc write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff Fixes: d31fc13fdcb2 ("perf/x86/intel: Fix event update for auto-reload") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200121190125.3389-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event mapKim Phillips
commit 25d387287cf0330abf2aad761ce6eee67326a355 upstream. Commit 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h"), claimed L2 misses were unsupported, due to them not being found in its referenced documentation, whose link has now moved [1]. That old documentation listed PMCx064 unit mask bit 3 as: "LsRdBlkC: LS Read Block C S L X Change to X Miss." and bit 0 as: "IcFillMiss: IC Fill Miss" We now have new public documentation [2] with improved descriptions, that clearly indicate what events those unit mask bits represent: Bit 3 now clearly states: "LsRdBlkC: Data Cache Req Miss in L2 (all types)" and bit 0 is: "IcFillMiss: Instruction Cache Req Miss in L2." So we can now add support for L2 misses in perf's genericised events as PMCx064 with both the above unit masks. [1] The commit's original documentation reference, "Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors", originally available here: https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf is now available here: https://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Models_00h-0Fh.pdf [2] "Processor Programming Reference (PPR) for Family 17h Model 31h, Revision B0 Processors", available here: https://developer.amd.com/wp-content/resources/55803_0.54-PUB.pdf Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Reported-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Babu Moger <babu.moger@amd.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200121171232.28839-1-kim.phillips@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-27perf, pt, coresight: Fix address filters for vmas with non-zero offsetAlexander Shishkin
[ Upstream commit c60f83b813e5b25ccd5de7e8c8925c31b3aebcc1 ] Currently, the address range calculation for file-based filters works as long as the vma that maps the matching part of the object file starts from offset zero into the file (vm_pgoff==0). Otherwise, the resulting filter range would be off by vm_pgoff pages. Another related problem is that in case of a partially matching vma, that is, a vma that matches part of a filter region, the filter range size wouldn't be adjusted. Fix the arithmetics around address filter range calculations, taking into account vma offset, so that the entire calculation is done before the filter configuration is passed to the PMU drivers instead of having those drivers do the final bit of arithmetics. Based on the patch by Adrian Hunter <adrian.hunter.intel.com>. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Link: http://lkml.kernel.org/r/20190215115655.63469-3-alexander.shishkin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12perf/x86/intel: Fix PT PMI handlingAlexander Shishkin
[ Upstream commit 92ca7da4bdc24d63bb0bcd241c11441ddb63b80a ] Commit: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") skips the PT/LBR exclusivity check on CPUs where PT and LBRs coexist, but also inadvertently skips the active_events bump for PT in that case, which is a bug. If there aren't any hardware events at the same time as PT, the PMI handler will ignore PT PMIs, as active_events reads zero in that case, resulting in the "Uhhuh" spurious NMI warning and PT data loss. Fix this by always increasing active_events for PT events. Fixes: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") Reported-by: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lkml.kernel.org/r/20191210105101.77210-1-alexander.shishkin@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09perf/x86/intel/bts: Fix the use of page_private()Alexander Shishkin
[ Upstream commit ff61541cc6c1962957758ba433c574b76f588d23 ] Commit 8062382c8dbe2 ("perf/x86/intel/bts: Add BTS PMU driver") brought in a warning with the BTS buffer initialization that is easily tripped with (assuming KPTI is disabled): instantly throwing: > ------------[ cut here ]------------ > WARNING: CPU: 2 PID: 326 at arch/x86/events/intel/bts.c:86 bts_buffer_setup_aux+0x117/0x3d0 > Modules linked in: > CPU: 2 PID: 326 Comm: perf Not tainted 5.4.0-rc8-00291-gceb9e77324fa #904 > RIP: 0010:bts_buffer_setup_aux+0x117/0x3d0 > Call Trace: > rb_alloc_aux+0x339/0x550 > perf_mmap+0x607/0xc70 > mmap_region+0x76b/0xbd0 ... It appears to assume (for lost raisins) that PagePrivate() is set, while later it actually tests for PagePrivate() before using page_private(). Make it consistent and always check PagePrivate() before using page_private(). Fixes: 8062382c8dbe2 ("perf/x86/intel/bts: Add BTS PMU driver") Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lkml.kernel.org/r/20191205142853.28894-2-alexander.shishkin@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12perf/x86/uncore: Fix event group supportKan Liang
[ Upstream commit 75be6f703a141b048590d659a3954c4fedd30bba ] The events in the same group don't start or stop simultaneously. Here is the ftrace when enabling event group for uncore_iio_0: # perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}" <idle>-0 [000] d.h. 8959.064832: read_msr: a41, value b2b0b030 //Read counter reg of IIO unit0 counter0 <idle>-0 [000] d.h. 8959.064835: write_msr: a48, value 400001 //Write Ctrl reg of IIO unit0 counter0 to enable counter0. <------ Although counter0 is enabled, Unit Ctrl is still freezed. Nothing will count. We are still good here. <idle>-0 [000] d.h. 8959.064836: read_msr: a40, value 30100 //Read Unit Ctrl reg of IIO unit0 <idle>-0 [000] d.h. 8959.064838: write_msr: a40, value 30000 //Write Unit Ctrl reg of IIO unit0 to enable all counters in the unit by clear Freeze bit <------Unit0 is un-freezed. Counter0 has been enabled. Now it starts counting. But counter1 has not been enabled yet. The issue starts here. <idle>-0 [000] d.h. 8959.064846: read_msr: a42, value 0 //Read counter reg of IIO unit0 counter1 <idle>-0 [000] d.h. 8959.064847: write_msr: a49, value 40000e //Write Ctrl reg of IIO unit0 counter1 to enable counter1. <------ Now, counter1 just starts to count. Counter0 has been running for a while. Current code un-freezes the Unit Ctrl right after the first counter is enabled. The subsequent group events always loses some counter values. Implement pmu_enable and pmu_disable support for uncore, which can help to batch hardware accesses. No one uses uncore_enable_box and uncore_disable_box. Remove them. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-drivers-review@eclists.intel.com Cc: linux-perf@eclists.intel.com Fixes: 087bfbb03269 ("perf/x86: Add generic Intel uncore PMU support") Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)Kim Phillips
[ Upstream commit e431e79b60603079d269e0c2a5177943b95fa4b6 ] This saves us writing the IBS control MSR twice when disabling the event. I searched revision guides for all families since 10h, and did not find occurrence of erratum #420, nor anything remotely similar: so we isolate the secondary MSR write to family 10h only. Also unconditionally update the count mask for IBS Op implementations that have read & writeable current count (CurCnt) fields in addition to the MaxCnt field. These bits were reserved on prior implementations, and therefore shouldn't have negative impact. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: c9574fe0bdb9 ("perf/x86-ibs: Implement workaround for IBS erratum #420") Link: https://lkml.kernel.org/r/20191023150955.30292-2-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise ↵Kim Phillips
RIP validity [ Upstream commit 317b96bb14303c7998dbcd5bc606bd8038fdd4b4 ] The loop that reads all the IBS MSRs into *buf stopped one MSR short of reading the IbsOpData register, which contains the RipInvalid status bit. Fix the offset_max assignment so the MSR gets read, so the RIP invalid evaluation is based on what the IBS h/w output, instead of what was left in memory. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: d47e8238cd76 ("perf/x86-ibs: Take instruction pointer from ibs sample") Link: https://lkml.kernel.org/r/20191023150955.30292-1-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06perf/x86/amd: Change/fix NMI latency mitigation to use a timestampTom Lendacky
[ Upstream commit df4d29732fdad43a51284f826bec3e6ded177540 ] It turns out that the NMI latency workaround from commit: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") ends up being too conservative and results in the perf NMI handler claiming NMIs too easily on AMD hardware when the NMI watchdog is active. This has an impact, for example, on the hpwdt (HPE watchdog timer) module. This module can produce an NMI that is used to reset the system. It registers an NMI handler for the NMI_UNKNOWN type and relies on the fact that nothing has claimed an NMI so that its handler will be invoked when the watchdog device produces an NMI. After the referenced commit, the hpwdt module is unable to process its generated NMI if the NMI watchdog is active, because the current NMI latency mitigation results in the NMI being claimed by the perf NMI handler. Update the AMD perf NMI latency mitigation workaround to, instead, use a window of time. Whenever a PMC is handled in the perf NMI handler, set a timestamp which will act as a perf NMI window. Any NMIs arriving within that window will be claimed by perf. Anything outside that window will not be claimed by perf. The value for the NMI window is set to 100 msecs. This is a conservative value that easily covers any NMI latency in the hardware. While this still results in a window in which the hpwdt module will not receive its NMI, the window is now much, much smaller. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21perf/x86/amd/ibs: Fix sample bias for dispatched micro-opsKim Phillips
[ Upstream commit 0f4cd769c410e2285a4e9873a684d90423f03090 ] When counting dispatched micro-ops with cnt_ctl=1, in order to prevent sample bias, IBS hardware preloads the least significant 7 bits of current count (IbsOpCurCnt) with random values, such that, after the interrupt is handled and counting resumes, the next sample taken will be slightly perturbed. The current count bitfield is in the IBS execution control h/w register, alongside the maximum count field. Currently, the IBS driver writes that register with the maximum count, leaving zeroes to fill the current count field, thereby overwriting the random bits the hardware preloaded for itself. Fix the driver to actually retain and carry those random bits from the read of the IBS control register, through to its write, instead of overwriting the lower current count bits with zeroes. Tested with: perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload> 'perf annotate' output before: 15.70 65: addsd %xmm0,%xmm1 17.30 add $0x1,%rax 15.88 cmp %rdx,%rax je 82 17.32 72: test $0x1,%al jne 7c 7.52 movapd %xmm1,%xmm0 5.90 jmp 65 8.23 7c: sqrtsd %xmm1,%xmm0 12.15 jmp 65 'perf annotate' output after: 16.63 65: addsd %xmm0,%xmm1 16.82 add $0x1,%rax 16.81 cmp %rdx,%rax je 82 16.69 72: test $0x1,%al jne 7c 8.30 movapd %xmm1,%xmm0 8.13 jmp 65 8.24 7c: sqrtsd %xmm1,%xmm0 8.39 jmp 65 Tested on Family 15h and 17h machines. Machines prior to family 10h Rev. C don't have the RDWROPCNT capability, and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't affect their operation. It is unknown why commit db98c5faf8cb ("perf/x86: Implement 64-bit counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt field; the number of preloaded random bits has always been 7, AFAICT. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org> Cc: <x86@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Borislav Petkov" <bp@alien8.de> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: "Namhyung Kim" <namhyung@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21perf/x86/intel: Restrict period on NehalemJosh Hunt
[ Upstream commit 44d3bbb6f5e501b873218142fe08cdf62a4ac1f3 ] We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in some cases when using perf: perfevents: irq loop stuck! WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530 ... RIP: 0010:intel_pmu_handle_irq+0x37b/0x530 ... Call Trace: <NMI> ? perf_event_nmi_handler+0x2e/0x50 ? intel_pmu_save_and_restart+0x50/0x50 perf_event_nmi_handler+0x2e/0x50 nmi_handle+0x6e/0x120 default_do_nmi+0x3e/0x100 do_nmi+0x102/0x160 end_repeat_nmi+0x16/0x50 ... ? native_write_msr+0x6/0x20 ? native_write_msr+0x6/0x20 </NMI> intel_pmu_enable_event+0x1ce/0x1f0 x86_pmu_start+0x78/0xa0 x86_pmu_enable+0x252/0x310 __perf_event_task_sched_in+0x181/0x190 ? __switch_to_asm+0x41/0x70 ? __switch_to_asm+0x35/0x70 ? __switch_to_asm+0x41/0x70 ? __switch_to_asm+0x35/0x70 finish_task_switch+0x158/0x260 __schedule+0x2f6/0x840 ? hrtimer_start_range_ns+0x153/0x210 schedule+0x32/0x80 schedule_hrtimeout_range_clock+0x8a/0x100 ? hrtimer_init+0x120/0x120 ep_poll+0x2f7/0x3a0 ? wake_up_q+0x60/0x60 do_epoll_wait+0xa9/0xc0 __x64_sys_epoll_wait+0x1a/0x20 do_syscall_64+0x4e/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fdeb1e96c03 ... Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: acme@kernel.org Cc: Josh Hunt <johunt@akamai.com> Cc: bpuranda@akamai.com Cc: mingo@redhat.com Cc: jolsa@redhat.com Cc: tglx@linutronix.de Cc: namhyung@kernel.org Cc: alexander.shishkin@linux.intel.com Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCsKim Phillips
commit 2f217d58a8a086d3399fecce39fb358848e799c4 upstream. Fill in the L3 performance event select register ThreadMask bitfield, to enable per hardware thread accounting. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gary Hook <Gary.Hook@amd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liska <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190628215906.4276-2-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCsKim Phillips
commit 16f4641166b10e199f0d7b68c2c5f004fef0bda3 upstream. The following commit: d7cbbe49a930 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events") enables L3 PMC events for all threads and slices by writing 1's in 'ChL3PmcCfg' (L3 PMC PERF_CTL) register fields. Those bitfields overlap with high order event select bits in the Data Fabric PMC control register, however. So when a user requests raw Data Fabric events (-e amd_df/event=0xYYY/), the two highest order bits get inadvertently set, changing the counter select to events that don't exist, and for which no counts are read. This patch changes the logic to write the L3 masks only when dealing with L3 PMC counters. AMD Family 16h and below Northbridge (NB) counters were not affected. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gary Hook <Gary.Hook@amd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liska <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: d7cbbe49a930 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events") Link: https://lkml.kernel.org/r/20190628215906.4276-1-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26perf/x86/intel: Fix spurious NMI on fixed counterKan Liang
commit e4557c1a46b0d32746bd309e1941914b5a6912b4 upstream. If a user first sample a PEBS event on a fixed counter, then sample a non-PEBS event on the same fixed counter on Icelake, it will trigger spurious NMI. For example: perf record -e 'cycles:p' -a perf record -e 'cycles' -a The error message for spurious NMI: [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2. [ +0.000000] Do you have a strange power saving mode enabled? [ +0.000000] Dazed and confused, but trying to continue The bug was introduced by the following commit: commit 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()") The commit moves the intel_pmu_pebs_disable() after intel_pmu_disable_fixed(), which returns immediately. The related bit of PEBS_ENABLE MSR will never be cleared for the fixed counter. Then a non-PEBS event runs on the fixed counter, but the bit on PEBS_ENABLE is still set, which triggers spurious NMIs. Check and disable PEBS for fixed counters after intel_pmu_disable_fixed(). Reported-by: Yi, Ammy <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()") Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26perf/x86/intel/uncore: Handle invalid event coding for free-running counterKan Liang
[ Upstream commit 543ac280b3576c0009e8c0fcd4d6bfc9978d7bd0 ] Counting with invalid event coding for free-running counter may cause OOPs, e.g. uncore_iio_free_running_0/event=1/. Current code only validate the event with free-running event format, event=0xff,umask=0xXY. Non-free-running event format never be checked for the PMU with free-running counters. Add generic hw_config() to check and reject the invalid event coding for free-running PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Fixes: 0f519f0352e3 ("perf/x86/intel/uncore: Support IIO free-running counters on SKX") Link: https://lkml.kernel.org/r/1556672028-119221-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraintsStephane Eranian
[ Upstream commit 23e3983a466cd540ffdd2bbc6e0c51e31934f941 ] This patch fixes an bug revealed by the following commit: 6b89d4c1ae85 ("perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking") That patch modified INTEL_FLAGS_EVENT_CONSTRAINT() to only look at the event code when matching a constraint. If code+umask were needed, then the INTEL_FLAGS_UEVENT_CONSTRAINT() macro was needed instead. This broke with some of the constraints for PEBS events. Several of them, including the one used for cycles:p, cycles:pp, cycles:ppp fell in that category and caused the event to be rejected in PEBS mode. In other words, on some platforms a cmdline such as: $ perf top -e cycles:pp would fail with -EINVAL. This patch fixes this bug by properly using INTEL_FLAGS_UEVENT_CONSTRAINT() when needed in the PEBS constraint tables. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/20190521005246.423-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15perf/x86/intel: Allow PEBS multi-entry in watermark modeStephane Eranian
[ Upstream commit c7a286577d7592720c2f179aadfb325a1ff48c95 ] This patch fixes a restriction/bug introduced by: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") The original patch prevented using multi-entry PEBS when wakeup_events != 0. However given that wakeup_events is part of a union with wakeup_watermark, it means that in watermark mode, PEBS multi-entry is also disabled which is not the intent. This patch fixes this by checking is watermark mode is enabled. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: vincent.weaver@maine.edu Fixes: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS") Link: http://lkml.kernel.org/r/20190514003400.224340-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31perf/x86/intel/cstate: Add Icelake supportKan Liang
[ Upstream commit f08c47d1f86c6dc666c7e659d94bf6d4492aa9d7 ] Icelake uses the same C-state residency events as Sandy Bridge. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-10-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31perf/x86/intel/rapl: Add Icelake supportKan Liang
[ Upstream commit b3377c3acb9e54cf86efcfe25f2e792bca599ed4 ] Icelake support the same RAPL counters as Skylake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-11-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31perf/x86/msr: Add Icelake supportKan Liang
[ Upstream commit cf50d79a8cfe5adae37fec026220b009559bbeed ] Icelake is the same as the existing Skylake parts. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-12-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-25perf/x86/intel: Fix race in intel_pmu_disable_event()Jiri Olsa
[ Upstream commit 6f55967ad9d9752813e36de6d5fdbd19741adfc7 ] New race in x86_pmu_stop() was introduced by replacing the atomic __test_and_clear_bit() of cpuc->active_mask by separate test_bit() and __clear_bit() calls in the following commit: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler") The race causes panic for PEBS events with enabled callchains: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... RIP: 0010:perf_prepare_sample+0x8c/0x530 Call Trace: <NMI> perf_event_output_forward+0x2a/0x80 __perf_event_overflow+0x51/0xe0 handle_pmi_common+0x19e/0x240 intel_pmu_handle_irq+0xad/0x170 perf_event_nmi_handler+0x2e/0x50 nmi_handle+0x69/0x110 default_do_nmi+0x3e/0x100 do_nmi+0x11a/0x180 end_repeat_nmi+0x16/0x1a RIP: 0010:native_write_msr+0x6/0x20 ... </NMI> intel_pmu_disable_event+0x98/0xf0 x86_pmu_stop+0x6e/0xb0 x86_pmu_del+0x46/0x140 event_sched_out.isra.97+0x7e/0x160 ... The event is configured to make samples from PEBS drain code, but when it's disabled, we'll go through NMI path instead, where data->callchain will not get allocated and we'll crash: x86_pmu_stop test_bit(hwc->idx, cpuc->active_mask) intel_pmu_disable_event(event) { ... intel_pmu_pebs_disable(event); ... EVENT OVERFLOW -> <NMI> intel_pmu_handle_irq handle_pmi_common TEST PASSES -> test_bit(bit, cpuc->active_mask)) perf_event_overflow perf_prepare_sample { ... if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY)) data->callchain = perf_callchain(event, regs); CRASH -> size += data->callchain->nr; } </NMI> ... x86_pmu_disable_event(event) } __clear_bit(hwc->idx, cpuc->active_mask); Fixing this by disabling the event itself before setting off the PEBS bit. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Arcari <darcari@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Lendacky Thomas <Thomas.Lendacky@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler") Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-14x86/cpu: Sanitize FAM6_ATOM namingPeter Zijlstra
commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-10perf/x86/intel: Initialize TFA MSRPeter Zijlstra
[ Upstream commit d7262457e35dbe239659e62654e56f8ddb814bed ] Stephane reported that the TFA MSR is not initialized by the kernel, but the TFA bit could set by firmware or as a leftover from a kexec, which makes the state inconsistent. Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Nelson DSouza <nelson.dsouza@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190321123849.GN6521@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBSStephane Eranian
[ Upstream commit 583feb08e7f7ac9d533b446882eb3a54737a6dbb ] When an event is programmed with attr.wakeup_events=N (N>0), it means the caller is interested in getting a user level notification after N samples have been recorded in the kernel sampling buffer. With precise events on Intel processors, the kernel uses PEBS. The kernel tries minimize sampling overhead by verifying if the event configuration is compatible with multi-entry PEBS mode. If so, the kernel is notified only when the buffer has reached its threshold. Other PEBS operates in single-entry mode, the kenrel is notified for each PEBS sample. The problem is that the current implementation look at frequency mode and event sample_type but ignores the wakeup_events field. Thus, it may not be possible to receive a notification after each precise event. This patch fixes this problem by disabling multi-entry PEBS if wakeup_events is non-zero. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08perf/x86/amd: Update generic hardware cache events for Family 17hKim Phillips
commit 0e3b74e26280f2cf8753717a950b97d424da6046 upstream. Add a new amd_hw_cache_event_ids_f17h assignment structure set for AMD families 17h and above, since a lot has changed. Specifically: L1 Data Cache The data cache access counter remains the same on Family 17h. For DC misses, PMCx041's definition changes with Family 17h, so instead we use the L2 cache accesses from L1 data cache misses counter (PMCx060,umask=0xc8). For DC hardware prefetch events, Family 17h breaks compatibility for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware Prefetch DC Fills." L1 Instruction Cache PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward compatible on Family 17h. For prefetches, we remove the erroneous PMCx04B assignment which counts how many software data cache prefetch load instructions were dispatched. LL - Last Level Cache Removing PMCs 7D, 7E, and 7F assignments, as they do not exist on Family 17h, where the last level cache is L3. L3 counters can be accessed using the existing AMD Uncore driver. Data TLB On Intel machines, data TLB accesses ("dTLB-loads") are assigned to counters that count load/store instructions retired. This is inconsistent with instruction TLB accesses, where Intel implementations report iTLB misses that hit in the STLB. Ideally, dTLB-loads would count higher level dTLB misses that hit in lower level TLBs, and dTLB-load-misses would report those that also missed in those lower-level TLBs, therefore causing a page table walk. That would be consistent with instruction TLB operation, remove the redundancy between dTLB-loads and L1-dcache-loads, and prevent perf from producing artificially low percentage ratios, i.e. the "0.01%" below: 42,550,869 L1-dcache-loads 41,591,860 dTLB-loads 4,802 dTLB-load-misses # 0.01% of all dTLB cache hits 7,283,682 L1-dcache-stores 7,912,392 dTLB-stores 310 dTLB-store-misses On AMD Families prior to 17h, the "Data Cache Accesses" counter is used, which is slightly better than load/store instructions retired, but still counts in terms of individual load/store operations instead of TLB operations. So, for AMD Families 17h and higher, this patch assigns "dTLB-loads" to a counter for L1 dTLB misses that hit in the L2 dTLB, and "dTLB-load-misses" to a counter for L1 DTLB misses that caused L2 DTLB misses and therefore also caused page table walks. This results in a much more accurate view of data TLB performance: 60,961,781 L1-dcache-loads 4,601 dTLB-loads 963 dTLB-load-misses # 20.93% of all dTLB cache hits Note that for all AMD families, data loads and stores are combined in a single accesses counter, so no 'L1-dcache-stores' are reported separately, and stores are counted with loads in 'L1-dcache-loads'. Also note that the "% of all dTLB cache hits" string is misleading because (a) "dTLB cache": although TLBs can be considered caches for page tables, in this context, it can be misinterpreted as data cache hits because the figures are similar (at least on Intel), and (b) not all those loads (technically accesses) technically "hit" at that hardware level. "% of all dTLB accesses" would be more clear/accurate. Instruction TLB On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in the STLB and completed a page table walk. For AMD Family 17h and above, for 'iTLB-loads' we replace the erroneous instruction cache fetches counter with PMCx084 "L1 ITLB Miss, L2 ITLB Hit". For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss, L2 ITLB Miss", but set a 0xff umask because without it the event does not get counted. Branch Predictor (BPU) PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families. Node Level Events Family 17h does not have a PMCx0e9 counter, and corresponding counters have not been made available publicly, so for now, we mark them as unsupported for Families 17h and above. Reference: "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh" Released 7/17/2018, Publication #56255, Revision 3.03: https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf [ mingo: tidied up the line breaks. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02perf/x86/intel: Update KBL Package C-state events to also include ↵Harry Pan
PC8/PC9/PC10 counters commit 82c99f7a81f28f8c1be5f701c8377d14c4075b10 upstream. Kaby Lake (and Coffee Lake) has PC8/PC9/PC10 residency counters. This patch updates the list of Kaby/Coffee Lake PMU event counters from the snb_cstates[] list of events to the hswult_cstates[] list of events, which keeps all previously supported events and also adds the PKG_C8, PKG_C9 and PKG_C10 residency counters. This allows user space tools to profile them through the perf interface. Signed-off-by: Harry Pan <harry.pan@intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: gs0622@gmail.com Link: http://lkml.kernel.org/r/20190424145033.1924-1-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27perf/x86: Fix incorrect PEBS_REGSKan Liang
commit 9d5dcc93a6ddfc78124f006ccd3637ce070ef2fc upstream. PEBS_REGS used as mask for the supported registers for large PEBS. However, the mask cannot filter the sample_regs_user/sample_regs_intr correctly. (1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which is only the index. Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general purpose registers. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Fixes: 2fe1bc1f501d ("perf/x86: Enable free running PEBS for REGS_USER/INTR") Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27perf/x86/amd: Add event map for AMD Family 17hKim Phillips
commit 3fe3331bb285700ab2253dbb07f8e478fcea2f1b upstream. Family 17h differs from prior families by: - Does not support an L2 cache miss event - It has re-enumerated PMC counters for: - L2 cache references - front & back end stalled cycles So we add a new amd_f17h_perfmon_event_map[] so that the generic perf event names will resolve to the correct h/w events on family 17h and above processors. Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2): https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") [ Improved the formatting a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17x86/perf/amd: Remove need to check "running" bit in NMI handlerLendacky, Thomas
commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream. Spurious interrupt support was added to perf in the following commit, almost a decade ago: 63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters") The two previous patches (resolving the race condition when disabling a PMC and NMI latency mitigation) allow for the removal of this older spurious interrupt support. Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap is cleared before disabling the PMC, which sets up a race condition. This race condition was mitigated by introducing the running bitmap. That race condition can be eliminated by first disabling the PMC, waiting for PMC reset on overflow and then clearing the bit for the PMC in the active_mask bitmap. The NMI handler will not re-enable a disabled counter. If x86_pmu_stop() is called from the perf NMI handler, the NMI latency mitigation support will guard against any unhandled NMI messages. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17x86/perf/amd: Resolve NMI latency issues for active PMCsLendacky, Thomas
commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream. On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). When the perf NMI handler executes it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI and then a subsequent NMI, that does not appear to be a back-to-back NMI, not finding any PMC counters that have overflowed. This may appear to be an unhandled NMI resulting in either a panic or a series of messages, depending on how the kernel was configured. To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq() function and upon return perform some additional processing that will indicate if the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a minimum value of the number of active PMCs or 2 will be set whenever a PMC is active. This is used to indicate the possible number of NMIs that can still occur. The value of 2 is used for when an NMI does not arrive at the LAPIC in time to be collapsed into an already pending NMI. Each time the function is called without having handled an overflowed counter, the per-CPU value is checked. If the value is non-zero, it is decremented and the NMI indicates that it handled the NMI. If the value is zero, then the NMI indicates that it did not handle the NMI. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17x86/perf/amd: Resolve race condition when disabling PMCLendacky, Thomas
commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream. On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured. To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-05perf/aux: Make perf_event accessible to setup_aux()Mathieu Poirier
[ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ] When pmu::setup_aux() is called the coresight PMU needs to know which sink to use for the session by looking up the information in the event's attr::config2 field. As such simply replace the cpu information by the complete perf_event structure and change all affected customers. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>