aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/irq
AgeCommit message (Collapse)Author
2023-11-28genirq/generic_chip: Make irq_remove_generic_chip() irqdomain awareHerve Codina
commit 5e7afb2eb7b2a7c81e9f608cbdf74a07606fd1b5 upstream. irq_remove_generic_chip() calculates the Linux interrupt number for removing the handler and interrupt chip based on gc::irq_base as a linear function of the bit positions of set bits in the @msk argument. When the generic chip is present in an irq domain, i.e. created with a call to irq_alloc_domain_generic_chips(), gc::irq_base contains not the base Linux interrupt number. It contains the base hardware interrupt for this chip. It is set to 0 for the first chip in the domain, 0 + N for the next chip, where $N is the number of hardware interrupts per chip. That means the Linux interrupt number cannot be calculated based on gc::irq_base for irqdomain based chips without a domain map lookup, which is currently missing. Rework the code to take the irqdomain case into account and calculate the Linux interrupt number by a irqdomain lookup of the domain specific hardware interrupt number. [ tglx: Massage changelog. Reshuffle the logic and add a proper comment. ] Fixes: cfefd21e693d ("genirq: Add chip suspend and resume callbacks") Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231024150335.322282-1-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()Chen Yu
[ Upstream commit a0b0bad10587ae2948a7c36ca4ffc206007fbcf3 ] When a CPU is about to be offlined, x86 validates that all active interrupts which are targeted to this CPU can be migrated to the remaining online CPUs. If not, the offline operation is aborted. The validation uses irq_matrix_allocated() to retrieve the number of vectors which are allocated on the outgoing CPU. The returned number of allocated vectors includes also vectors which are associated to managed interrupts. That's overaccounting because managed interrupts are: - not migrated when the affinity mask of the interrupt targets only the outgoing CPU - migrated to another CPU, but in that case the vector is already pre-allocated on the potential target CPUs and must not be taken into account. As a consequence the check whether the remaining online CPUs have enough capacity for migrating the allocated vectors from the outgoing CPU might fail incorrectly. Let irq_matrix_allocated() return only the number of allocated non-managed interrupts to make this validation check correct. [ tglx: Amend changelog and fixup kernel-doc comment ] Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231020072522.557846-1-yu.c.chen@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()Barry Song
[ Upstream commit cbe16f35bee6880becca6f20d2ebf6b457148552 ] Many drivers don't want interrupts enabled automatically via request_irq(). So they are handling this issue by either way of the below two: (1) irq_set_status_flags(irq, IRQ_NOAUTOEN); request_irq(dev, irq...); (2) request_irq(dev, irq...); disable_irq(irq); The code in the second way is silly and unsafe. In the small time gap between request_irq() and disable_irq(), interrupts can still come. The code in the first way is safe though it's subobtimal. Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to request_irq() and request_nmi(). It prevents the automatic enabling of the requested interrupt/nmi in the same safe way as #1 above. With that the various usage sites of #1 and #2 above can be simplified and corrected. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: dmitry.torokhov@gmail.com Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com Stable-dep-of: 39db65a0a17b ("ASoC: es8316: Handle optional IRQ assignment") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20irqdomain: Fix mapping-creation raceJohan Hovold
[ Upstream commit 601363cc08da25747feb87c55573dd54de91d66a ] Parallel probing of devices that share interrupts (e.g. when a driver uses asynchronous probing) can currently result in two mappings for the same hardware interrupt to be created due to missing serialisation. Make sure to hold the irq_domain_mutex when creating mappings so that looking for an existing mapping before creating a new one is done atomically. Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers") Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings") Link: https://lore.kernel.org/r/YuJXMHoT4ijUxnRb@hovoldconsulting.com Cc: stable@vger.kernel.org # 4.8 Cc: Dmitry Torokhov <dtor@chromium.org> Cc: Jon Hunter <jonathanh@nvidia.com> Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-7-johan+linaro@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20irqdomain: Refactor __irq_domain_alloc_irqs()Johan Hovold
[ Upstream commit d55f7f4c58c07beb5050a834bf57ae2ede599c7e ] Refactor __irq_domain_alloc_irqs() so that it can be called internally while holding the irq_domain_mutex. This will be used to fix a shared-interrupt mapping race, hence the Fixes tag. Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings") Cc: stable@vger.kernel.org # 4.8 Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-6-johan+linaro@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20irqdomain: Look for existing mapping only onceJohan Hovold
commit 6e6f75c9c98d2d246d90411ff2b6f0cd271f4cba upstream. Avoid looking for an existing mapping twice when creating a new mapping using irq_create_fwspec_mapping() by factoring out the actual allocation which is shared with irq_create_mapping_affinity(). The new helper function will also be used to fix a shared-interrupt mapping race, hence the Fixes tag. Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings") Cc: stable@vger.kernel.org # 4.8 Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-5-johan+linaro@kernel.org Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()Kees Cook
[ Upstream commit b13fecb1c3a603c4b8e99b306fecf4f668c11b32 ] This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: 1fdeb8b9f29d ("wifi: iwl3945: Add missing check for create_singlethread_workqueue") Signed-off-by: Sasha Levin <sashal@kernel.org> [Tom: fix backport to 5.4.y] AUTOSEL backport to 5.4.y of: b13fecb1c3a6 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") changed all locations of DECLARE_TASKLET with DECLARE_TASKLET_OLD, except one, in arch/mips/lasat/pcivue_proc.c. This is due to: 10760dde9be3 ("MIPS: Remove support for LASAT") preceeding b13fecb1c3a6 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") upstream and the former not being present in 5.4.y. Fix this by changing DECLARE_TASKLET to DECLARE_TASKLET_OLD in arch/mips/lasat/pcivue_proc.c. Fixes: 5de7a4254eb2 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20Revert "treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()"Tom Saeger
This reverts commit 5de7a4254eb2d501cbb59918a152665b29c02109 which caused mips build failures. kernelci.org bot reports: arch/mips/lasat/picvue_proc.c:87:20: error: ‘pvc_display_tasklet’ undeclared (first use in this function) arch/mips/lasat/picvue_proc.c:42:44: error: expected ‘)’ before ‘&’ token arch/mips/lasat/picvue_proc.c:33:13: error: ‘pvc_display’ defined but not used [-Werror=unused-function] Link: https://lore.kernel.org/stable/64041dda.170a0220.8cc25.79c9@mx.google.com/ Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17irqdomain: Fix domain registration raceMarc Zyngier
[ Upstream commit 8932c32c3053accd50702b36e944ac2016cd103c ] Hierarchical domains created using irq_domain_create_hierarchy() are currently added to the domain list before having been fully initialised. This specifically means that a racing allocation request might fail to allocate irq data for the inner domains of a hierarchy in case the parent domain pointer has not yet been set up. Note that this is not really any issue for irqchip drivers that are registered early (e.g. via IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE()) but could potentially cause trouble with drivers that are registered later (e.g. modular drivers using IRQCHIP_PLATFORM_DRIVER_BEGIN(), gpiochip drivers, etc.). Fixes: afb7da83b9f4 ("irqdomain: Introduce helper function irq_domain_add_hierarchy()") Cc: stable@vger.kernel.org # 3.19 Signed-off-by: Marc Zyngier <maz@kernel.org> [ johan: add commit message ] Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-8-johan+linaro@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17irqdomain: Change the type of 'size' in __irq_domain_add() to be consistentBixuan Cui
[ Upstream commit 20c36ce2164f1774b487d443ece99b754bc6ad43 ] The 'size' is used in struct_size(domain, revmap, size) and its input parameter type is 'size_t'(unsigned int). Changing the size to 'unsigned int' to make the type consistent. Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210916025203.44841-1-cuibixuan@huawei.com Stable-dep-of: 8932c32c3053 ("irqdomain: Fix domain registration race") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11irqdomain: Drop bogus fwspec-mapping error handlingJohan Hovold
commit e3b7ab025e931accdc2c12acf9b75c6197f1c062 upstream. In case a newly allocated IRQ ever ends up not having any associated struct irq_data it would not even be possible to dispose the mapping. Replace the bogus disposal with a WARN_ON(). This will also be used to fix a shared-interrupt mapping race, hence the CC-stable tag. Fixes: 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ") Cc: stable@vger.kernel.org # 4.8 Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-4-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11irqdomain: Fix disassociation raceJohan Hovold
commit 3f883c38f5628f46b30bccf090faec054088e262 upstream. The global irq_domain_mutex is held when mapping interrupts from non-hierarchical domains but currently not when disposing them. This specifically means that updates of the domain mapcount is racy (currently only used for statistics in debugfs). Make sure to hold the global irq_domain_mutex also when disposing mappings from non-hierarchical domains. Fixes: 9dc6be3d4193 ("genirq/irqdomain: Add map counter") Cc: stable@vger.kernel.org # 4.13 Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-3-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11irqdomain: Fix association raceJohan Hovold
commit b06730a571a9ff1ba5bd6b20bf9e50e5a12f1ec6 upstream. The sanity check for an already mapped virq is done outside of the irq_domain_mutex-protected section which means that an (unlikely) racing association may not be detected. Fix this by factoring out the association implementation, which will also be used in a follow-on change to fix a shared-interrupt mapping race. Fixes: ddaf144c61da ("irqdomain: Refactor irq_domain_associate_many()") Cc: stable@vger.kernel.org # 3.11 Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230213104302.17307-2-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()Kees Cook
[ Upstream commit b13fecb1c3a603c4b8e99b306fecf4f668c11b32 ] This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: 1fdeb8b9f29d ("wifi: iwl3945: Add missing check for create_singlethread_workqueue") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18genirq/irqdesc: Don't try to remove non-existing sysfs filesYang Yingliang
[ Upstream commit 9049e1ca41983ab773d7ea244bee86d7835ec9f5 ] Fault injection tests trigger warnings like this: kernfs: can not remove 'chip_name', no directory WARNING: CPU: 0 PID: 253 at fs/kernfs/dir.c:1616 kernfs_remove_by_name_ns+0xce/0xe0 RIP: 0010:kernfs_remove_by_name_ns+0xce/0xe0 Call Trace: <TASK> remove_files.isra.1+0x3f/0xb0 sysfs_remove_group+0x68/0xe0 sysfs_remove_groups+0x41/0x70 __kobject_del+0x45/0xc0 kobject_del+0x29/0x40 free_desc+0x42/0x70 irq_free_descs+0x5e/0x90 The reason is that the interrupt descriptor sysfs handling does not roll back on a failing kobject_add() during allocation. If the descriptor is freed later on, kobject_del() is invoked with a not added kobject resulting in the above warnings. A proper rollback in case of a kobject_add() failure would be the straight forward solution. But this is not possible due to the way how interrupt descriptor sysfs handling works. Interrupt descriptors are allocated before sysfs becomes available. So the sysfs files for the early allocated descriptors are added later in the boot process. At this point there can be nothing useful done about a failing kobject_add(). For consistency the interrupt descriptor allocation always treats kobject_add() failures as non-critical and just emits a warning. To solve this problem, keep track in the interrupt descriptor whether kobject_add() was successful or not and make the invocation of kobject_del() conditional on that. [ tglx: Massage changelog, comments and use a state bit. ] Fixes: ecb3f394c5db ("genirq: Expose interrupt information through sysfs") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221128151612.1786122-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25genirq: Don't return error on missing optional irq_request_resources()Antonio Borneo
[ Upstream commit 95001b756467ecc9f5973eb5e74e97699d9bbdf1 ] Function irq_chip::irq_request_resources() is reported as optional in the declaration of struct irq_chip. If the parent irq_chip does not implement it, we should ignore it and return. Don't return error if the functions is missing. Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220512160544.13561-1-antonio.borneo@foss.st.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-22random: remove unused irq_flags argument from add_interrupt_randomness()Sebastian Andrzej Siewior
commit 703f7066f40599c290babdb79dd61319264987e9 upstream. Since commit ee3e00e9e7101 ("random: use registers from interrupted code for CPU's w/o a cycle counter") the irq_flags argument is no longer used. Remove unused irq_flags. Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Liu <wei.liu@kernel.org> Cc: linux-hyperv@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12genirq: Synchronize interrupt thread startupThomas Pfaff
commit 8707898e22fd665bc1d7b18b809be4b56ce25bdd upstream. A kernel hang can be observed when running setserial in a loop on a kernel with force threaded interrupts. The sequence of events is: setserial open("/dev/ttyXXX") request_irq() do_stuff() -> serial interrupt -> wake(irq_thread) desc->threads_active++; close() free_irq() kthread_stop(irq_thread) synchronize_irq() <- hangs because desc->threads_active != 0 The thread is created in request_irq() and woken up, but does not get on a CPU to reach the actual thread function, which would handle the pending wake-up. kthread_stop() sets the should stop condition which makes the thread immediately exit, which in turn leaves the stale threads_active count around. This problem was introduced with commit 519cc8652b3a, which addressed a interrupt sharing issue in the PCIe code. Before that commit free_irq() invoked synchronize_irq(), which waits for the hard interrupt handler and also for associated threads to complete. To address the PCIe issue synchronize_irq() was replaced with __synchronize_hardirq(), which only waits for the hard interrupt handler to complete, but not for threaded handlers. This was done under the assumption, that the interrupt thread already reached the thread function and waits for a wake-up, which is guaranteed to be handled before acting on the stop condition. The problematic case, that the thread would not reach the thread function, was obviously overlooked. Make sure that the interrupt thread is really started and reaches thread_fn() before returning from __setup_irq(). This utilizes the existing wait queue in the interrupt descriptor. The wait queue is unused for non-shared interrupts. For shared interrupts the usage might cause a spurious wake-up of a waiter in synchronize_irq() or the completion of a threaded handler might cause a spurious wake-up of the waiter for the ready flag. Both are harmless and have no functional impact. [ tglx: Amended changelog ] Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()") Signed-off-by: Thomas Pfaff <tpfaff@pcs.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20genirq/affinity: Consider that CPUs on nodes can be unbalancedRei Yamamoto
commit 08d835dff916bfe8f45acc7b92c7af6c4081c8a7 upstream. If CPUs on a node are offline at boot time, the number of nodes is different when building affinity masks for present cpus and when building affinity masks for possible cpus. This causes the following problem: In the case that the number of vectors is less than the number of nodes there are cases where bits of masks for present cpus are overwritten when building masks for possible cpus. Fix this by excluding CPUs, which are not part of the current build mask (present/possible). [ tglx: Massaged changelog and added comment ] Fixes: b82592199032 ("genirq/affinity: Spread IRQs to all available NUMA nodes") Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220331003309.10891-1-yamamoto.rei@jp.fujitsu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15genirq/timings: Fix error return code in irq_timings_test_irqs()Zhen Lei
[ Upstream commit 290fdc4b7ef14e33d0e30058042b0e9bfd02b89b ] Return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: f52da98d900e ("genirq/timings: Add selftest for irqs circular buffer") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210811093333.2376-1-thunder.leizhen@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-18genirq/timings: Prevent potential array overflow in __irq_timings_store()Ben Dai
commit b9cc7d8a4656a6e815852c27ab50365009cb69c1 upstream. When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE * PREDICTION_FACTOR us and less than 1s, the calculated index will be greater than the length of irqs->ema_time[]. Check the calculated index before using it to prevent array overflow. Fixes: 23aa3b9a6b7d ("genirq/timings: Encapsulate storing function") Signed-off-by: Ben Dai <ben.dai@unisoc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425150903.25456-1-ben.dai9703@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18genirq/msi: Ensure deactivation on teardownBixuan Cui
commit dbbc93576e03fbe24b365fab0e901eb442237a8a upstream. msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but msi_domain_free_irqs() does not enforce deactivation before tearing down the interrupts. This happens when PCI/MSI interrupts are set up and never used before being torn down again, e.g. in error handling pathes. The only place which cleans that up is the error handling path in msi_domain_alloc_irqs(). Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs() to cure that. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUPThomas Gleixner
commit 826da771291fc25a428e871f9e7fb465e390f852 upstream. X86 IO/APIC and MSI interrupts (when used without interrupts remapping) require that the affinity setup on startup is done before the interrupt is enabled for the first time as the non-remapped operation mode cannot safely migrate enabled interrupts from arbitrary contexts. Provide a new irq chip flag which allows affected hardware to request this. This has to be opt-in because there have been reports in the past that some interrupt chips cannot handle affinity setting before startup. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11genirq/matrix: Prevent allocation counter corruptionVitaly Kuznetsov
[ Upstream commit c93a5e20c3c2dabef8ea360a3d3f18c6f68233ab ] When irq_matrix_free() is called for an unallocated vector the managed_allocated and total_allocated counters get out of sync with the real state of the matrix. Later, when the last interrupt is freed, these counters will underflow resulting in UINTMAX because the counters are unsigned. While this is certainly a problem of the calling code, this can be catched in the allocator by checking the allocation bit for the to be freed vector which simplifies debugging. An example of the problem described above: https://lore.kernel.org/lkml/20210318192819.636943062@linutronix.de/ Add the missing sanity check and emit a warning when it triggers. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuznets@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24genirq: Disable interrupts for force threaded handlersThomas Gleixner
commit 81e2073c175b887398e5bca6c004efa89983f58d upstream. With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halfs, but not interrupts. This was an oversight back then because any code like this will have an issue: thread(irq_A) irq_handler(A) spin_lock(&foo->lock); interrupt(irq_B) irq_handler(B) spin_lock(&foo->lock); This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler. Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the interrupt request which in turn breaks RT. Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler which preserves the regular semantics and the usefulness of the interrupt force threading as a general debugging tool. For not RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference. For RT kernels there is no issue. Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading") Reported-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Johan Hovold <johan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is setMarc Zyngier
commit 4c457e8cb75eda91906a4f89fc39bde3f9a43922 upstream. When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), __msi_domain_alloc_irqs() performs the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30genirq/irqdomain: Don't try to free an interrupt that has no mappingMarc Zyngier
[ Upstream commit 4615fbc3788ddc8e7c6d697714ad35a53729aa2c ] When an interrupt allocation fails for N interrupts, it is pretty common for the error handling code to free the same number of interrupts, no matter how many interrupts have actually been allocated. This may result in the domain freeing code to be unexpectedly called for interrupts that have no mapping in that domain. Things end pretty badly. Instead, add some checks to irq_domain_free_irqs_hierarchy() to make sure that thiss does not follow the hierarchy if no mapping exists for a given interrupt. Fixes: 6a6544e520abe ("genirq/irqdomain: Remove auto-recursive hierarchy support") Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201129135551.396777-1-maz@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-11genirq/irqdomain: Add an irq_create_mapping_affinity() functionLaurent Vivier
commit bb4c6910c8b41623104c2e64a30615682689a54d upstream. There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can be passed down to irq_domain_alloc_descs(). irq_create_mapping() is re-implemented as a wrapper around irq_create_mapping_affinity(). No functional change. Fixes: e75eafb9b039 ("genirq/msi: Switch to new irq spreading infrastructure") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kurz <groug@kaod.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201126082852.1178497-2-lvivier@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHYMarc Zyngier
[ Upstream commit 151a535171be6ff824a0a3875553ea38570f4c05 ] kernel/irq/ipi.c otherwise fails to compile if nothing else selects it. Fixes: 379b656446a3 ("genirq: Add GENERIC_IRQ_IPI Kconfig symbol") Reported-by: Pavel Machek <pavel@ucw.cz> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201015101222.GA32747@amd Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03genirq/matrix: Deal with the sillyness of for_each_cpu() on UPThomas Gleixner
commit 784a0830377d0761834e385975bc46861fea9fa0 upstream. Most of the CPU mask operations behave the same way, but for_each_cpu() and it's variants ignore the cpumask argument and claim that CPU0 is always in the mask. This is historical, inconsistent and annoying behaviour. The matrix allocator uses for_each_cpu() and can be called on UP with an empty cpumask. The calling code does not expect that this succeeds but until commit e027fffff799 ("x86/irq: Unbreak interrupt affinity setting") this went unnoticed. That commit added a WARN_ON() to catch cases which move an interrupt from one vector to another on the same CPU. The warning triggers on UP. Add a check for the cpumask being empty to prevent this. Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()Guenter Roeck
commit e27b1636e9337d1a1d174b191e53d0f86421a822 upstream. rearm_wake_irq() does not unlock the irq descriptor if the interrupt is not suspended or if wakeup is not enabled on it. Restucture the exit conditions so the unlock is always ensured. Fixes: 3a79bc63d9075 ("PCI: irq: Introduce rearm_wake_irq()") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200811180001.80203-1-linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream. John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22genirq/affinity: Handle affinity setting on inactive interrupts correctlyThomas Gleixner
commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream. Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests. X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS. Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then: - Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on. Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations. For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design. Remove the X86 workaround as it is not longer required. Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17genirq/debugfs: Add missing sanity checks to interrupt injectionThomas Gleixner
commit a740a423c36932695b01a3e920f697bc55b05fec upstream. Interrupts cannot be injected when the interrupt is not activated and when a replay is already in progress. Fixes: 536e2e34bd00 ("genirq/debugfs: Triggering of interrupts from userspace") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200306130623.500019114@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()Alexander Sverdlin
[ Upstream commit 87f2d1c662fa1761359fdf558246f97e484d177a ] irq_domain_alloc_irqs_hierarchy() has 3 call sites in the compilation unit but only one of them checks for the pointer which is being dereferenced inside the called function. Move the check into the function. This allows for catching the error instead of the following crash: Unable to handle kernel NULL pointer dereference at virtual address 00000000 PC is at 0x0 LR is at gpiochip_hierarchy_irq_domain_alloc+0x11f/0x140 ... [<c06c23ff>] (gpiochip_hierarchy_irq_domain_alloc) [<c0462a89>] (__irq_domain_alloc_irqs) [<c0462dad>] (irq_create_fwspec_mapping) [<c06c2251>] (gpiochip_to_irq) [<c06c1c9b>] (gpiod_to_irq) [<bf973073>] (gpio_irqs_init [gpio_irqs]) [<bf974048>] (gpio_irqs_exit+0xecc/0xe84 [gpio_irqs]) Code: bad PC value Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200306174720.82604-1-alexander.sverdlin@nokia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-01genirq: Fix reference leaks on irq affinity notifiersEdward Cree
commit df81dfcfd6991d547653d46c051bac195cd182c1 upstream. The handling of notify->work did not properly maintain notify->kref in two cases: 1) where the work was already scheduled, another irq_set_affinity_locked() would get the ref and (no-op-ly) schedule the work. Thus when irq_affinity_notify() ran, it would drop the original ref but not the additional one. 2) when cancelling the (old) work in irq_set_affinity_notifier(), if there was outstanding work a ref had been got for it but was never put. Fix both by checking the return values of the work handling functions (schedule_work() for (1) and cancel_work_sync() for (2)) and put the extra ref if the return value indicates preexisting work. Fixes: cd7eab44e994 ("genirq: Add IRQ affinity notifiers") Fixes: 59c39840f5ab ("genirq: Prevent use-after-free and work list corruption") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ben Hutchings <ben@decadent.org.uk> Link: https://lkml.kernel.org/r/24f5983f-2ab5-e83a-44ee-a45b5f9300f5@solarflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28genirq/proc: Reject invalid affinity masks (again)Thomas Gleixner
commit cba6437a1854fde5934098ec3bd0ee83af3129f5 upstream. Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code, which catches cases where the affinity setting is not done on the CPU which is the current target of the interrupt, triggers during CPU hotplug stress testing. It turns out that the warning which was added with the commit addressing the MSI affinity race unearthed yet another long standing bug. If user space writes a bogus affinity mask, i.e. it contains no online CPUs, then it calls irq_select_affinity_usr(). This was introduced for ALPHA in eee45269b0f5 ("[PATCH] Alpha: convert to generic irq framework (generic part)") and subsequently made available for all architectures in 18404756765c ("genirq: Expose default irq affinity mask (take 3)") which introduced the circumvention of the affinity setting restrictions for interrupt which cannot be moved in process context. The whole exercise is bogus in various aspects: 1) If the interrupt is already started up then there is absolutely no point to honour a bogus interrupt affinity setting from user space. The interrupt is already assigned to an online CPU and it does not make any sense to reassign it to some other randomly chosen online CPU. 2) If the interupt is not yet started up then there is no point either. A subsequent startup of the interrupt will invoke irq_setup_affinity() anyway which will chose a valid target CPU. So the only correct solution is to just return -EINVAL in case user space wrote an affinity mask which does not contain any online CPUs, except for ALPHA which has it's own magic sauce for this. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Qian Cai <cai@lca.pw> Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11x86/apic/msi: Plug non-maskable MSI affinity raceThomas Gleixner
commit 6f1a4891a5928a5969c87fa5a584844c983ec823 upstream. Evan tracked down a subtle race between the update of the MSI message and the device raising an interrupt internally on PCI devices which do not support MSI masking. The update of the MSI message is non-atomic and consists of either 2 or 3 sequential 32bit wide writes to the PCI config space. - Write address low 32bits - Write address high 32bits (If supported by device) - Write data When an interrupt is migrated then both address and data might change, so the kernel attempts to mask the MSI interrupt first. But for MSI masking is optional, so there exist devices which do not provide it. That means that if the device raises an interrupt internally between the writes then a MSI message is sent built from half updated state. On x86 this can lead to spurious interrupts on the wrong interrupt vector when the affinity setting changes both address and data. As a consequence the device interrupt can be lost causing the device to become stuck or malfunctioning. Evan tried to handle that by disabling MSI accross an MSI message update. That's not feasible because disabling MSI has issues on its own: If MSI is disabled the PCI device is routing an interrupt to the legacy INTx mechanism. The INTx delivery can be disabled, but the disablement is not working on all devices. Some devices lose interrupts when both MSI and INTx delivery are disabled. Another way to solve this would be to enforce the allocation of the same vector on all CPUs in the system for this kind of screwed devices. That could be done, but it would bring back the vector space exhaustion problems which got solved a few years ago. Fortunately the high address (if supported by the device) is only relevant when X2APIC is enabled which implies interrupt remapping. In the interrupt remapping case the affinity setting is happening at the interrupt remapping unit and the PCI MSI message is programmed only once when the PCI device is initialized. That makes it possible to solve it with a two step update: 1) Target the MSI msg to the new vector on the current target CPU 2) Target the MSI msg to the new vector on the new target CPU In both cases writing the MSI message is only changing a single 32bit word which prevents the issue of inconsistency. After writing the final destination it is necessary to check whether the device issued an interrupt while the intermediate state #1 (new vector, current CPU) was in effect. This is possible because the affinity change is always happening on the current target CPU. The code runs with interrupts disabled, so the interrupt can be detected by checking the IRR of the local APIC. If the vector is pending in the IRR then the interrupt is retriggered on the new target CPU by sending an IPI for the associated vector on the target CPU. This can cause spurious interrupts on both the local and the new target CPU. 1) If the new vector is not in use on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then interrupt entry code will ignore that spurious interrupt. The vector is marked so that the 'No irq handler for vector' warning is supressed once. 2) If the new vector is in use already on the local CPU then the IRR check might see an pending interrupt from the device which is using this vector. The IPI to the new target CPU will then invoke the handler of the device, which got the affinity change, even if that device did not issue an interrupt 3) If the new vector is in use already on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then the handler of the device which uses that vector on the local CPU will be invoked. expose issues in device driver interrupt handlers which are not prepared to handle a spurious interrupt correctly. This not a regression, it's just exposing something which was already broken as spurious interrupts can happen for a lot of reasons and all driver handlers need to be able to deal with them. Reported-by: Evan Green <evgreen@chromium.org> Debugged-by: Evan Green <evgreen@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11irqdomain: Fix a memory leak in irq_domain_push_irq()Kevin Hao
commit 0f394daef89b38d58c91118a2b08b8a1b316703b upstream. Fix a memory leak reported by kmemleak: unreferenced object 0xffff000bc6f50e80 (size 128): comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s) hex dump (first 32 bytes): 00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00 ....A........... 00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff ........@Q...... backtrace: [<00000000e62d2240>] kmem_cache_alloc_trace+0x1a4/0x320 [<00000000279143c9>] irq_domain_push_irq+0x7c/0x188 [<00000000d9f4c154>] thunderx_gpio_probe+0x3ac/0x438 [<00000000fd09ec22>] pci_device_probe+0xe4/0x198 [<00000000d43eca75>] really_probe+0xdc/0x320 [<00000000d3ebab09>] driver_probe_device+0x5c/0xf0 [<000000005b3ecaa0>] __device_attach_driver+0x88/0xc0 [<000000004e5915f5>] bus_for_each_drv+0x7c/0xc8 [<0000000079d4db41>] __device_attach+0xe4/0x140 [<00000000883bbda9>] device_initial_probe+0x18/0x20 [<000000003be59ef6>] bus_probe_device+0x98/0xa0 [<0000000039b03d3f>] deferred_probe_work_func+0x74/0xa8 [<00000000870934ce>] process_one_work+0x1c8/0x470 [<00000000e3cce570>] worker_thread+0x1f8/0x428 [<000000005d64975e>] kthread+0xfc/0x128 [<00000000f0eaa764>] ret_from_fork+0x10/0x18 Fixes: 495c38d3001f ("irqdomain: Add irq_domain_{push,pop}_irq() functions") Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-05irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentationYi Wang
A recent commit changed a parameter of __irq_domain_alloc_fwnode(), but did not update the documentation comment. Fix it up. Fixes: b977fcf477c1 ("irqdomain/debugfs: Use PAs to generate fwnode names") Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1571476047-29463-1-git-send-email-wang.yi59@zte.com.cn
2019-09-17Merge tag 'pm-5.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These include a rework of the main suspend-to-idle code flow (related to the handling of spurious wakeups), a switch over of several users of cpufreq notifiers to QoS-based limits, a new devfreq driver for Tegra20, a new cpuidle driver and governor for virtualized guests, an extension of the wakeup sources framework to expose wakeup sources as device objects in sysfs, and more. Specifics: - Rework the main suspend-to-idle control flow to avoid repeating "noirq" device resume and suspend operations in case of spurious wakeups from the ACPI EC and decouple the ACPI EC wakeups support from the LPS0 _DSM support (Rafael Wysocki). - Extend the wakeup sources framework to expose wakeup sources as device objects in sysfs (Tri Vo, Stephen Boyd). - Expose system suspend statistics in sysfs (Kalesh Singh). - Introduce a new haltpoll cpuidle driver and a new matching governor for virtualized guests wanting to do guest-side polling in the idle loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell). - Fix the menu and teo cpuidle governors to allow the scheduler tick to be stopped if PM QoS is used to limit the CPU idle state exit latency in some cases (Rafael Wysocki). - Increase the resolution of the play_idle() argument to microseconds for more fine-grained injection of CPU idle cycles (Daniel Lezcano). - Switch over some users of cpuidle notifiers to the new QoS-based frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events (Viresh Kumar). - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li). - Add support for MT8183 and MT8516 to the mediatek cpufreq driver (Andrew-sh.Cheng, Fabien Parent). - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson Huang). - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz). - Update the qcom cpufreq driver (among other things, to make it easier to extend and to use kryo cpufreq for other nvmem-based SoCs) and add qcs404 support to it (Niklas Cassel, Douglas RAILLARD, Sibi Sankar, Sricharan R). - Fix assorted issues and make assorted minor improvements in the cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli, Gustavo Silva, Hariprasad Kelam). - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd Bergmann). - Add new Exynos PPMU events to devfreq events and extend that mechanism (Lukasz Luba). - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny). - Improve devfreq documentation and governor code, fix spelling typos in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez, MyungJoo Ham, Gaël PORTAY). - Add regulators enable and disable to the OPP (operating performance points) framework (Kamil Konieczny). - Update the OPP framework to support multiple opp-suspend properties (Anson Huang). - Fix assorted issues and make assorted minor improvements in the OPP code (Niklas Cassel, Viresh Kumar, Yue Hu). - Clean up the generic power domains (genpd) framework (Ulf Hansson). - Clean up assorted pieces of power management code and documentation (Akinobu Mita, Amit Kucheria, Chuhong Yuan). - Update the pm-graph tool to version 5.5 including multiple fixes and improvements (Todd Brandt). - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven, Sébastien Szymanski)" * tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits) cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available cpuidle-haltpoll: do not set an owner to allow modunload cpuidle-haltpoll: return -ENODEV on modinit failure cpuidle-haltpoll: set haltpoll as preferred governor cpuidle: allow governor switch on cpuidle_register_driver() PM: runtime: Documentation: add runtime_status ABI document pm-graph: make setVal unbuffered again for python2 and python3 powercap: idle_inject: Use higher resolution for idle injection cpuidle: play_idle: Increase the resolution to usec cpuidle-haltpoll: vcpu hotplug support cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist cpufreq: qcom: Add support for qcs404 on nvmem driver cpufreq: qcom: Refactor the driver to make it easier to extend cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain Documentation: cpufreq: Update policy notifier documentation cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state() PM / Domains: Simplify genpd_lookup_dev() ...
2019-09-17Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core irq updates from Thomas Gleixner: "Updates from the irq departement: - Update the interrupt spreading code so it handles numa node with different CPU counts properly. - A large overhaul of the ARM GiCv3 driver to support new PPI and SPI ranges. - Conversion of all alloc_fwnode() users to use physical addresses instead of virtual addresses so the virtual addresses are not leaked. The physical address is sufficient to identify the associated interrupt chip. - Add support for Marvel MMP3, Amlogic Meson SM1 interrupt chips. - Enforce interrupt threading at compile time if RT is enabled. - Small updates and improvements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) irqchip/gic-v3-its: Fix LPI release for Multi-MSI devices irqchip/uniphier-aidet: Use devm_platform_ioremap_resource() irqdomain: Add the missing assignment of domain->fwnode for named fwnode irqchip/mmp: Coexist with GIC root IRQ controller irqchip/mmp: Mask off interrupts from other cores irqchip/mmp: Add missing chained_irq_{enter,exit}() irqchip/mmp: Do not use of_address_to_resource() to get mux regs irqchip/meson-gpio: Add support for meson sm1 SoCs dt-bindings: interrupt-controller: New binding for the meson sm1 SoCs genirq/affinity: Remove const qualifier from node_to_cpumask argument genirq/affinity: Spread vectors on node according to nr_cpu ratio genirq/affinity: Improve __irq_build_affinity_masks() irqchip: Remove dev_err() usage after platform_get_irq() irqchip: Add include guard to irq-partition-percpu.h irqchip/mmp: Do not call irq_set_default_host() on DT platforms irqchip/gic-v3-its: Remove the redundant set_bit for lpi_map irqchip/gic-v3: Add quirks for HIP06/07 invalid GICD_TYPER erratum 161010803 irqchip/gic: Skip DT quirks when evaluating IIDR-based quirks irqchip/gic-v3: Warn about inconsistent implementations of extended ranges irqchip/gic-v3: Add EPPI range support ...
2019-09-17Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: (29 commits) ACPI: PM: s2idle: Always set up EC GPE for system wakeup ACPI: PM: s2idle: Avoid rearming SCI for wakeup unnecessarily PM / wakeup: Unexport wakeup_source_sysfs_{add,remove}() PM / wakeup: Register wakeup class kobj after device is added PM / wakeup: Fix sysfs registration error path PM / wakeup: Show wakeup sources stats in sysfs PM / wakeup: Use wakeup_source_register() in wakelock.c PM / wakeup: Drop wakeup_source_init(), wakeup_source_prepare() PM: sleep: Replace strncmp() with str_has_prefix() PM: suspend: Fix platform_suspend_prepare_noirq() intel-hid: Disable button array during suspend-to-idle intel-hid: intel-vbtn: Avoid leaking wakeup_mode set ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices ACPI: EC: PM: Make acpi_ec_dispatch_gpe() print debug message ACPI: EC: PM: Consolidate some code depending on PM_SLEEP ACPI: PM: s2idle: Eliminate acpi_sleep_no_ec_events() ACPI: PM: s2idle: Switch EC over to polling during "noirq" suspend ACPI: PM: s2idle: Add acpi.sleep_no_lps0 module parameter ACPI: PM: s2idle: Rearrange lps0_device_attach() PM/sleep: Expose suspend stats in sysfs ...
2019-09-16Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...
2019-09-16Merge tag 'please-pull-ia64_for_5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 updates from Tony Luck: "The big change here is removal of support for SGI Altix" * tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits) genirq: remove the is_affinity_mask_valid hook ia64: remove CONFIG_SWIOTLB ifdefs ia64: remove support for machvecs ia64: move the screen_info setup to common code ia64: move the ROOT_DEV setup to common code ia64: rework iommu probing ia64: remove the unused sn_coherency_id symbol ia64: remove the SGI UV simulator support ia64: remove the zx1 swiotlb machvec ia64: remove CONFIG_ACPI ifdefs ia64: remove CONFIG_PCI ifdefs ia64: remove the hpsim platform ia64: remove now unused machvec indirections ia64: remove support for the SGI SN2 platform drivers: remove the SGI SN2 IOC4 base support drivers: remove the SGI SN2 IOC3 base support qla2xxx: remove SGI SN2 support qla1280: remove SGI SN2 support misc/sgi-xp: remove SGI SN2 support char/mspec: remove SGI SN2 support ...
2019-09-06Merge tag 'irqchip-5.4' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates for Linux 5.4 from Marc Zyngier: - Large GICv3 updates to support new PPI and SPI ranges - Conver all alloc_fwnode() users to use PAs instead of VAs - Add support for Marvell's MMP3 irqchip - Add support for Amlogic Meson SM1 - Various cleanups and fixes
2019-09-05genirq: Prevent NULL pointer dereference in resend_irqs()Yunfeng Ye
The following crash was observed: Unable to handle kernel NULL pointer dereference at 0000000000000158 Internal error: Oops: 96000004 [#1] SMP pc : resend_irqs+0x68/0xb0 lr : resend_irqs+0x64/0xb0 ... Call trace: resend_irqs+0x68/0xb0 tasklet_action_common.isra.6+0x84/0x138 tasklet_action+0x2c/0x38 __do_softirq+0x120/0x324 run_ksoftirqd+0x44/0x60 smpboot_thread_fn+0x1ac/0x1e8 kthread+0x134/0x138 ret_from_fork+0x10/0x18 The reason for this is that the interrupt resend mechanism happens in soft interrupt context, which is a asynchronous mechanism versus other operations on interrupts. free_irq() does not take resend handling into account. Thus, the irq descriptor might be already freed before the resend tasklet is executed. resend_irqs() does not check the return value of the interrupt descriptor lookup and derefences the return value unconditionally. 1): __setup_irq irq_startup check_irq_resend // activate softirq to handle resend irq 2): irq_domain_free_irqs irq_free_descs free_desc call_rcu(&desc->rcu, delayed_free_desc) 3): __do_softirq tasklet_action resend_irqs desc = irq_to_desc(irq) desc->handle_irq(desc) // desc is NULL --> Ooops Fix this by adding a NULL pointer check in resend_irqs() before derefencing the irq descriptor. Fixes: a4633adcdbc1 ("[PATCH] genirq: add genirq sw IRQ-retrigger") Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1630ae13-5c8e-901e-de09-e740b6a426a7@huawei.com
2019-09-03irqdomain: Add the missing assignment of domain->fwnode for named fwnodeDexuan Cui
Recently device pass-through stops working for Linux VM running on Hyper-V. git-bisect shows the regression is caused by the recent commit 467a3bb97432 ("PCI: hv: Allocate a named fwnode ..."), but the root cause is that the commit d59f6617eef0 forgets to set the domain->fwnode for IRQCHIP_FWNODE_NAMED*, and as a result: 1. The domain->fwnode remains to be NULL. 2. irq_find_matching_fwspec() returns NULL since "h->fwnode == fwnode" is false, and pci_set_bus_msi_domain() sets the Hyper-V PCI root bus's msi_domain to NULL. 3. When the device is added onto the root bus, the device's dev->msi_domain is set to NULL in pci_set_msi_domain(). 4. When a device driver tries to enable MSI-X, pci_msi_setup_msi_irqs() calls arch_setup_msi_irqs(), which uses the native MSI chip (i.e. arch/x86/kernel/apic/msi.c: pci_msi_controller) to set up the irqs, but actually pci_msi_setup_msi_irqs() is supposed to call msi_domain_alloc_irqs() with the hbus->irq_domain, which is created in hv_pcie_init_irq_domain() and is associated with the Hyper-V chip hv_msi_irq_chip. Consequently, the irq line is not properly set up, and the device driver can not receive any interrupt. Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only") Fixes: 467a3bb97432 ("PCI: hv: Allocate a named fwnode instead of an address-based one") Reported-by: Lili Deng <v-lide@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/PU1P153MB01694D9AF625AC335C600C5FBFBE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
2019-08-28genirq/affinity: Remove const qualifier from node_to_cpumask argumentMing Lei
When CONFIG_CPUMASK_OFFSTACK isn't enabled, 'cpumask_var_t' is as 'typedef struct cpumask cpumask_var_t[1]', so the argument 'node_to_cpumask' alloc_nodes_vectors() can't be declared as 'const cpumask_var_t *' Fixes the following warning: kernel/irq/affinity.c: In function '__irq_build_affinity_masks': alloc_nodes_vectors(numvecs, node_to_cpumask, cpu_mask, ^ kernel/irq/affinity.c:128:13: note: expected 'const struct cpumask (*)[1]' but argument is of type 'struct cpumask (*)[1]' static void alloc_nodes_vectors(unsigned int numvecs, ^ Fixes: b1a5a73e64e9 ("genirq/affinity: Spread vectors on node according to nr_cpu ratio") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190828085815.19931-1-ming.lei@redhat.com
2019-08-27genirq/affinity: Spread vectors on node according to nr_cpu ratioMing Lei
Now __irq_build_affinity_masks() spreads vectors evenly per node, but there is a case that not all vectors have been spread when each numa node has a different number of CPUs which triggers the warning in the spreading code. Improve the spreading algorithm by - assigning vectors according to the ratio of the number of CPUs on a node to the number of remaining CPUs. - running the assignment from smaller nodes to bigger nodes to guarantee that every active node gets allocated at least one vector. This ensures that all vectors are spread out. Asided of that the spread becomes more fair if the nodes have different number of CPUs. For example, on the following machine: CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 1 Core(s) per socket: 8 Socket(s): 2 NUMA node(s): 2 ... NUMA node0 CPU(s): 0,1,3,5-9,11,13-15 NUMA node1 CPU(s): 2,4,10,12 When a driver requests to allocate 8 vectors, the following spread results: irq 31, cpu list 2,4 irq 32, cpu list 10,12 irq 33, cpu list 0-1 irq 34, cpu list 3,5 irq 35, cpu list 6-7 irq 36, cpu list 8-9 irq 37, cpu list 11,13 irq 38, cpu list 14-15 So Node 0 has now 6 and Node 1 has 2 vectors assigned. The original algorithm assigned 4 vectors on each node which was unfair versus Node 0. [ tglx: Massaged changelog ] Reported-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jon Derrick <jonathan.derrick@intel.com> Link: https://lkml.kernel.org/r/20190816022849.14075-3-ming.lei@redhat.com