aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/iommu/intel/perfmon.c
AgeCommit message (Collapse)Author
2023-03-31iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplugKan Liang
A warning can be triggered when hotplug CPU 0. $ echo 0 > /sys/devices/system/cpu/cpu0/online ------------[ cut here ]------------ Voluntary context switch within RCU read-side critical section! WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4f4/0x580 RIP: 0010:rcu_note_context_switch+0x4f4/0x580 Call Trace: <TASK> ? perf_event_update_userpage+0x104/0x150 __schedule+0x8d/0x960 ? perf_event_set_state.part.82+0x11/0x50 schedule+0x44/0xb0 schedule_timeout+0x226/0x310 ? __perf_event_disable+0x64/0x1a0 ? _raw_spin_unlock+0x14/0x30 wait_for_completion+0x94/0x130 __wait_rcu_gp+0x108/0x130 synchronize_rcu+0x67/0x70 ? invoke_rcu_core+0xb0/0xb0 ? __bpf_trace_rcu_stall_warning+0x10/0x10 perf_pmu_migrate_context+0x121/0x370 iommu_pmu_cpu_offline+0x6a/0xa0 ? iommu_pmu_del+0x1e0/0x1e0 cpuhp_invoke_callback+0x129/0x510 cpuhp_thread_fun+0x94/0x150 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> ---[ end trace 0000000000000000 ]--- The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(), when migrating a PMU to a new CPU. However, the current for_each_iommu() is within RCU read-side critical section. Two methods were considered to fix the issue. - Use the dmar_global_lock to replace the RCU read lock when going through the drhd list. But it triggers a lockdep warning. - Use the cpuhp_setup_state_multi() to set up a dedicated state for each IOMMU PMU. The lock can be avoided. The latter method is implemented in this patch. Since each IOMMU PMU has a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track the state. The state can be dynamically allocated now. Remove the CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE. Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Add IOMMU perfmon overflow handler supportKan Liang
While enabled to count events and an event occurrence causes the counter value to increment and roll over to or past zero, this is termed a counter overflow. The overflow can trigger an interrupt. The IOMMU perfmon needs to handle the case properly. New HW IRQs are allocated for each IOMMU device for perfmon. The IRQ IDs are after the SVM range. In the overflow handler, the counter is not frozen. It's very unlikely that the same counter overflows again during the period. But it's possible that other counters overflow at the same time. Read the overflow register at the end of the handler and check whether there are more. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-7-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Support cpumask for IOMMU perfmonKan Liang
The perf subsystem assumes that all counters are by default per-CPU. So the user space tool reads a counter from each CPU. However, the IOMMU counters are system-wide and can be read from any CPU. Here we use a CPU mask to restrict counting to one CPU to handle the issue. (with CPU hotplug notifier to choose a different CPU if the chosen one is taken off-line). The CPU is exposed to /sys/bus/event_source/devices/dmar*/cpumask for the user space perf tool. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-6-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Add IOMMU perfmon supportKan Liang
Implement the IOMMU performance monitor capability, which supports the collection of information about key events occurring during operation of the remapping hardware, to aid performance tuning and debug. The IOMMU perfmon support is implemented as part of the IOMMU driver and interfaces with the Linux perf subsystem. The IOMMU PMU has the following unique features compared with the other PMUs. - Support counting. Not support sampling. - Does not support per-thread counting. The scope is system-wide. - Support per-counter capability register. The event constraints can be enumerated. - The available event and event group can also be enumerated. - Extra Enhanced Commands are introduced to control the counters. Add a new variable, struct iommu_pmu *pmu, to in the struct intel_iommu to track the PMU related information. Add iommu_pmu_register() and iommu_pmu_unregister() to register and unregister a IOMMU PMU. The register function setup the IOMMU PMU ops and invoke the standard perf_pmu_register() interface to register a PMU in the perf subsystem. This patch only exposes the functions. The following patch will enable them in the IOMMU driver. The IOMMU PMUs can be found under /sys/bus/event_source/devices/dmar* The available filters and event format can be found at the format folder $ ls /sys/bus/event_source/devices/dmar1/format/ event event_group filter_ats filter_ats_en filter_page_table filter_page_table_en The supported events can be found at the events folder $ ls /sys/bus/event_source/devices/dmar1/events/ ats_blocked fs_nonleaf_hit int_cache_hit_posted iommu_mem_blocked iotlb_hit pasid_cache_lookup ss_nonleaf_hit ctxt_cache_hit fs_nonleaf_lookup int_cache_lookup iommu_mrds iotlb_lookup pg_req_posted ss_nonleaf_lookup ctxt_cache_lookup int_cache_hit_nonposted iommu_clocks iommu_requests pasid_cache_hit pw_occupancy The command below illustrates filter usage with a simple example. $ perf stat -e dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/ -a sleep 1 Performance counter stats for 'system wide': 368,947 dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/ 1.002592074 seconds time elapsed Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-5-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Retrieve IOMMU perfmon capability informationKan Liang
The performance monitoring infrastructure, perfmon, is to support collection of information about key events occurring during operation of the remapping hardware, to aid performance tuning and debug. Each remapping hardware unit has capability registers that indicate support for performance monitoring features and enumerate the capabilities. Add alloc_iommu_pmu() to retrieve IOMMU perfmon capability information for each iommu unit. The information is stored in the iommu->pmu data structure. Capability registers are read-only, so it's safe to prefetch and store them in the pmu structure. This could avoid unnecessary VMEXIT when this code is running in the virtualization environment. Add free_iommu_pmu() to free the saved capability information when freeing the iommu unit. Add a kernel config option for the IOMMU perfmon feature. Unless a user explicitly uses the perf tool to monitor the IOMMU perfmon event, there isn't any impact for the existing IOMMU. Enable it by default. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-3-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>