2021-11-17  mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines  (Miquel Raynal)
commit 194ac63de6ff56d30c48e3ac19c8a412f9c1408e upstream. Following the introduction of the generic ECC engine infrastructure, it was necessary to reorganize the code and move the ECC configuration in the ->attach_chip() hook. Failing to do that properly lead to a first series of fixes supposed to stabilize the situation. Unfortunately, this only fixed the use of software ECC engines, preventing any other kind of engine to be used, including on-die ones. It is now time to (finally) fix the situation by ensuring that we still provide a default (eg. software ECC) but will still support different ECC engines such as on-die ECC engines if properly described in the device tree. There are no changes needed on the core side in order to do this, but we just need to leverage the logic there which allows: 1- a subsystem default (set to Host engines in the raw NAND world) 2- a driver specific default (here set to software ECC engines) 3- any type of engine requested by the user (ie. described in the DT) As the raw NAND subsystem has not yet been fully converted to the ECC engine infrastructure, in order to provide a default ECC engine for this driver we need to set chip->ecc.engine_type *before* calling nand_scan(). During the initialization step, the core will consider this entry as the default engine for this driver. This value may of course be overloaded by the user if the usual DT properties are provided. Fixes: 553508cec2e8 ("mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-6-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
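The same one-line pattern is what this whole series of driver fixes applies (orion, pasemi, gpio, mpc5121, xway, ams-delta). As an illustrative sketch only — a hypothetical probe excerpt, not the actual patch:

    #include <linux/mtd/rawnand.h>

    static int example_nand_probe(struct nand_chip *chip)
    {
            /*
             * Driver-specific default: software ECC. Because this is assigned
             * *before* nand_scan(), the core treats it only as a default, so
             * the usual DT properties (e.g. "nand-ecc-engine") can still
             * select another engine such as an on-die one.
             */
            chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;

            /* nand_scan() calls ->attach_chip(), where the final ECC setup happens */
            return nand_scan(chip, 1);
    }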
2021-11-17  mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines  (Miquel Raynal)
commit f16b7d2a5e810fcf4b15d096246d0d445da9cc88 upstream. Following the introduction of the generic ECC engine infrastructure, it was necessary to reorganize the code and move the ECC configuration in the ->attach_chip() hook. Failing to do that properly lead to a first series of fixes supposed to stabilize the situation. Unfortunately, this only fixed the use of software ECC engines, preventing any other kind of engine to be used, including on-die ones. It is now time to (finally) fix the situation by ensuring that we still provide a default (eg. software ECC) but will still support different ECC engines such as on-die ECC engines if properly described in the device tree. There are no changes needed on the core side in order to do this, but we just need to leverage the logic there which allows: 1- a subsystem default (set to Host engines in the raw NAND world) 2- a driver specific default (here set to software ECC engines) 3- any type of engine requested by the user (ie. described in the DT) As the raw NAND subsystem has not yet been fully converted to the ECC engine infrastructure, in order to provide a default ECC engine for this driver we need to set chip->ecc.engine_type *before* calling nand_scan(). During the initialization step, the core will consider this entry as the default engine for this driver. This value may of course be overloaded by the user if the usual DT properties are provided. Fixes: 8fc6f1f042b2 ("mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-7-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines  (Miquel Raynal)
commit b5b5b4dc6fcd8194b9dd38c8acdc5ab71adf44f8 upstream. Following the introduction of the generic ECC engine infrastructure, it was necessary to reorganize the code and move the ECC configuration in the ->attach_chip() hook. Failing to do that properly lead to a first series of fixes supposed to stabilize the situation. Unfortunately, this only fixed the use of software ECC engines, preventing any other kind of engine to be used, including on-die ones. It is now time to (finally) fix the situation by ensuring that we still provide a default (eg. software ECC) but will still support different ECC engines such as on-die ECC engines if properly described in the device tree. There are no changes needed on the core side in order to do this, but we just need to leverage the logic there which allows: 1- a subsystem default (set to Host engines in the raw NAND world) 2- a driver specific default (here set to software ECC engines) 3- any type of engine requested by the user (ie. described in the DT) As the raw NAND subsystem has not yet been fully converted to the ECC engine infrastructure, in order to provide a default ECC engine for this driver we need to set chip->ecc.engine_type *before* calling nand_scan(). During the initialization step, the core will consider this entry as the default engine for this driver. This value may of course be overloaded by the user if the usual DT properties are provided. Fixes: f6341f6448e0 ("mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-4-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines  (Miquel Raynal)
commit f9d8570b7fd6f4f08528ce2f5e39787a8a260cd6 upstream. Following the introduction of the generic ECC engine infrastructure, it was necessary to reorganize the code and move the ECC configuration in the ->attach_chip() hook. Failing to do that properly lead to a first series of fixes supposed to stabilize the situation. Unfortunately, this only fixed the use of software ECC engines, preventing any other kind of engine to be used, including on-die ones. It is now time to (finally) fix the situation by ensuring that we still provide a default (eg. software ECC) but will still support different ECC engines such as on-die ECC engines if properly described in the device tree. There are no changes needed on the core side in order to do this, but we just need to leverage the logic there which allows: 1- a subsystem default (set to Host engines in the raw NAND world) 2- a driver specific default (here set to software ECC engines) 3- any type of engine requested by the user (ie. described in the DT) As the raw NAND subsystem has not yet been fully converted to the ECC engine infrastructure, in order to provide a default ECC engine for this driver we need to set chip->ecc.engine_type *before* calling nand_scan(). During the initialization step, the core will consider this entry as the default engine for this driver. This value may of course be overloaded by the user if the usual DT properties are provided. Fixes: 6dd09f775b72 ("mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-5-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines  (Miquel Raynal)
commit 6bcd2960af1b7bacb2f1e710ab0c0b802d900501 upstream. Following the introduction of the generic ECC engine infrastructure, it was necessary to reorganize the code and move the ECC configuration in the ->attach_chip() hook. Failing to do that properly lead to a first series of fixes supposed to stabilize the situation. Unfortunately, this only fixed the use of software ECC engines, preventing any other kind of engine to be used, including on-die ones. It is now time to (finally) fix the situation by ensuring that we still provide a default (eg. software ECC) but will still support different ECC engines such as on-die ECC engines if properly described in the device tree. There are no changes needed on the core side in order to do this, but we just need to leverage the logic there which allows: 1- a subsystem default (set to Host engines in the raw NAND world) 2- a driver specific default (here set to software ECC engines) 3- any type of engine requested by the user (ie. described in the DT) As the raw NAND subsystem has not yet been fully converted to the ECC engine infrastructure, in order to provide a default ECC engine for this driver we need to set chip->ecc.engine_type *before* calling nand_scan(). During the initialization step, the core will consider this entry as the default engine for this driver. This value may of course be overloaded by the user if the usual DT properties are provided. Fixes: d525914b5bd8 ("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()") Cc: stable@vger.kernel.org Cc: Jan Hoffmann <jan@3e8.eu> Cc: Kestrel seventyfour <kestrelseventyfour@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Jan Hoffmann <jan@3e8.eu> Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-10-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines  (Miquel Raynal)
commit d707bb74daae07879e0fc1b4b960f8f2d0a5fe5d upstream. Following the introduction of the generic ECC engine infrastructure, it was necessary to reorganize the code and move the ECC configuration in the ->attach_chip() hook. Failing to do that properly lead to a first series of fixes supposed to stabilize the situation. Unfortunately, this only fixed the use of software ECC engines, preventing any other kind of engine to be used, including on-die ones. It is now time to (finally) fix the situation by ensuring that we still provide a default (eg. software ECC) but will still support different ECC engines such as on-die ECC engines if properly described in the device tree. There are no changes needed on the core side in order to do this, but we just need to leverage the logic there which allows: 1- a subsystem default (set to Host engines in the raw NAND world) 2- a driver specific default (here set to software ECC engines) 3- any type of engine requested by the user (ie. described in the DT) As the raw NAND subsystem has not yet been fully converted to the ECC engine infrastructure, in order to provide a default ECC engine for this driver we need to set chip->ecc.engine_type *before* calling nand_scan(). During the initialization step, the core will consider this entry as the default engine for this driver. This value may of course be overloaded by the user if the usual DT properties are provided. Fixes: 59d93473323a ("mtd: rawnand: ams-delta: Move the ECC initialization to ->attach_chip()") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-2-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mtd: rawnand: fsmc: Fix use of SM ORDER  (Miquel Raynal)
commit 9be1446ece291a1f08164bd056bed3d698681f8b upstream. The introduction of the generic ECC engine API led to a number of changes in various drivers which broke some of them. Here is a typical example: I expected the SM_ORDER option to be handled by the Hamming ECC engine internals. Problem: the fsmc driver does not (yet) instantiate a real ECC engine object, so we had to use a 'bare' ECC helper instead of the shiny rawnand functions. However, when not initializing this engine properly and using the bare helpers, we do not get the SM ORDER feature handled automatically. It looks like this was lost in the process, so let's ensure we use the right SM ORDER now. Fixes: ad9ffdce4539 ("mtd: rawnand: fsmc: Fix external use of SW Hamming ECC helper") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210928221507.199198-2-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  remoteproc: imx_rproc: Fix rsc-table name  (Dong Aisheng)
commit e90547d59d4e29e269e22aa6ce590ed0b41207d2 upstream. Usually the dash '-' is preferred in node names. So far, no dts in the upstream kernel uses this node, so we just update the node name in the driver. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Fixes: 5e4c1243071d ("remoteproc: imx_rproc: support remote cores booted before Linux Kernel") Reviewed-and-tested-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210910090621.3073540-6-peng.fan@oss.nxp.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  remoteproc: imx_rproc: Fix ignoring mapping vdev regions  (Dong Aisheng)
commit afe670e23af91d8a74a8d7049f6e0984bbf6ea11 upstream. vdev regions are typically named vdev0buffer, vdev0ring0, vdev0ring1, etc. Change to strncmp() to cover them all. Fixes: 8f2d8961640f ("remoteproc: imx_rproc: ignore mapping vdev regions") Reviewed-and-tested-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210910090621.3073540-5-peng.fan@oss.nxp.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
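A small user-space illustration (not kernel code) of why the exact-string comparison had to become a prefix comparison:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            const char *names[] = { "vdev0buffer", "vdev0ring0", "vdev0ring1" };

            for (int i = 0; i < 3; i++)
                    printf("%-12s strcmp==0: %d  strncmp==0: %d\n", names[i],
                           strcmp(names[i], "vdev") == 0,
                           strncmp(names[i], "vdev", strlen("vdev")) == 0);
            /* strcmp never matches the suffixed names; strncmp matches all of them */
            return 0;
    }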
2021-11-17  remoteproc: Fix the wrong default value of is_iomem  (Dong Aisheng)
commit 970675f61bf5761d7e5326f6e4df995ecdba5e11 upstream. Currently is_iomem is a random value on the stack, which may default to true even on those platforms that do not use iomem to store firmware. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Fixes: 40df0a91b2a5 ("remoteproc: add is_iomem to da_to_va") Reviewed-and-tested-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210910090621.3073540-3-peng.fan@oss.nxp.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
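A minimal sketch of the fix pattern, assuming the rproc_da_to_va() signature introduced by 40df0a91b2a5; the surrounding variable names are illustrative:

    bool is_iomem = false;   /* previously uninitialized, i.e. random stack contents */
    void *ptr;

    ptr = rproc_da_to_va(rproc, da, memsz, &is_iomem);
    /* platforms whose da_to_va callback never writes *is_iomem now see a
     * well-defined 'false' instead of whatever happened to be on the stack */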
2021-11-17  remoteproc: elf_loader: Fix loading segment when is_iomem true  (Peng Fan)
commit 24acbd9dc934f5d9418a736c532d3970a272063e upstream. It happens to work on the i.MX platform by luck, but it is wrong: we need to use memcpy_toio(), not memcpy_fromio(). Fixes: 40df0a91b2a5 ("remoteproc: add is_iomem to da_to_va") Tested-by: Dong Aisheng <aisheng.dong@nxp.com> (i.MX8MQ) Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210910090621.3073540-2-peng.fan@oss.nxp.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
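For reference, a hedged sketch of the copy direction being corrected (buffer names are illustrative, not the exact loader code):

    /*
     * memcpy_fromio(): device/IO memory -> normal RAM   (reads from the device)
     * memcpy_toio():   normal RAM -> device/IO memory   (writes to the device)
     *
     * Loading an ELF segment writes firmware *into* the remote core's memory,
     * so when that memory is iomem the copy must be memcpy_toio().
     */
    if (is_iomem)
            memcpy_toio((void __iomem *)ptr, elf_data + offset, filesz);
    else
            memcpy(ptr, elf_data + offset, filesz);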
2021-11-17  s390/cio: make ccw_device_dma_* more robust  (Halil Pasic)
commit ad9a14517263a16af040598c7920c09ca9670a31 upstream. Since commit 48720ba56891 ("virtio/s390: use DMA memory for ccw I/O and classic notifiers") we were supposed to make sure that virtio_ccw_release_dev() completes before the ccw device and the attached dma pool are torn down, but unfortunately we did not. Before that commit it used to be OK to delay cleaning up the memory allocated by virtio-ccw indefinitely (which isn't really intuitive for guys used to destruction happens in reverse construction order), but now we trigger a BUG_ON if the genpool is destroyed before all memory allocated from it is deallocated. Which brings down the guest. We can observe this problem, when unregister_virtio_device() does not give up the last reference to the virtio_device (e.g. because a virtio-scsi attached scsi disk got removed without previously unmounting its previously mounted partition). To make sure that the genpool is only destroyed after all the necessary freeing is done let us take a reference on the ccw device on each ccw_device_dma_zalloc() and give it up on each ccw_device_dma_free(). Actually there are multiple approaches to fixing the problem at hand that can work. The upside of this one is that it is the safest one while remaining simple. We don't crash the guest even if the driver does not pair allocations and frees. The downside is the reference counting overhead, that the reference counting for ccw devices becomes more complex, in a sense that we need to pair the calls to the aforementioned functions for it to be correct, and that if we happen to leak, we leak more than necessary (the whole ccw device instead of just the genpool). Some alternatives to this approach are taking a reference in virtio_ccw_online() and giving it up in virtio_ccw_release_dev() or making sure virtio_ccw_release_dev() completes its work before virtio_ccw_remove() returns. The downside of these approaches is that these are less safe against programming errors. Cc: <stable@vger.kernel.org> # v5.3 Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 48720ba56891 ("virtio/s390: use DMA memory for ccw I/O and classic notifiers") Reported-by: bfu@redhat.com Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
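A rough sketch of the refcounting pattern described above; the pool helper and private field names are assumptions, and only the get_device()/put_device() pairing is the point:

    void *ccw_device_dma_zalloc(struct ccw_device *cdev, size_t size)
    {
            void *addr;

            addr = cio_gp_dma_zalloc(cdev->private->dma_pool, &cdev->dev, size); /* assumed helper */
            if (!addr)
                    return NULL;
            get_device(&cdev->dev);    /* pin the ccw device while the allocation lives */
            return addr;
    }

    void ccw_device_dma_free(struct ccw_device *cdev, void *cpu_addr, size_t size)
    {
            if (!cpu_addr)
                    return;
            cio_gp_dma_free(cdev->private->dma_pool, cpu_addr, size);            /* assumed helper */
            put_device(&cdev->dev);    /* last put lets the genpool be destroyed safely */
    }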
2021-11-17  s390/ap: Fix hanging ioctl caused by orphaned replies  (Harald Freudenberger)
commit 3826350e6dd435e244eb6e47abad5a47c169ebc2 upstream. When a queue is switched to soft offline during heavy load and later switched to soft online again and now used, it may be that the caller is blocked forever in the ioctl call. The failure occurs because there is a pending reply after the queue(s) have been switched to offline. This orphaned reply is received when the queue is switched to online and is accidentally counted for the outstanding replies. So when there was a valid outstanding reply and this orphaned reply is received it counts as the outstanding one thus dropping the outstanding counter to 0. Voila, with this counter the receive function is not called any more and the real outstanding reply is never received (until another request comes in...) and the ioctl blocks. The fix is simple. However, instead of readjusting the counter when an orphaned reply is detected, I check the queue status for not empty and compare this to the outstanding counter. So if the queue is not empty then the counter must not drop to 0 but at least have a value of 1. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  s390/tape: fix timer initialization in tape_std_assign()  (Sven Schnelle)
commit 213fca9e23b59581c573d558aa477556f00b8198 upstream. commit 9c6c273aa424 ("timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()") changed the timer setup from init_timer_on_stack() to timer_setup(), but missed changing the mod_timer() call. And while at it, use msecs_to_jiffies() instead of the open-coded timeout calculation. Cc: stable@vger.kernel.org Fixes: 9c6c273aa424 ("timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
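A minimal sketch of the corrected pattern with the standard timer API (request handling details omitted; the function names are illustrative):

    static void assign_timeout(struct timer_list *t)
    {
            /* from_timer() would recover the containing request here */
    }

    static void example_start_assign_timer(struct timer_list *timer)
    {
            timer_setup(timer, assign_timeout, 0);
            /* msecs_to_jiffies() instead of an open-coded '2 * HZ' calculation */
            mod_timer(timer, jiffies + msecs_to_jiffies(2000));
    }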
2021-11-17  s390/cio: check the subchannel validity for dev_busid  (Vineeth Vijayan)
commit a4751f157c194431fae9e9c493f456df8272b871 upstream. Check the validity of the subchannel before reading other fields in the schib. Fixes: d3683c055212 ("s390/cio: add dev_busid sysfs entry for each subchannel") CC: <stable@vger.kernel.org> Reported-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20211105154451.847288-1-vneethv@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  s390/cpumf: cpum_cf PMU displays invalid value after hotplug remove  (Thomas Richter)
commit 9d48c7afedf91a02d03295837ec76b2fb5e7d3fe upstream. When a CPU is hotplugged while the perf stat -e cycles command is running, a wrong (very large) value is displayed immediately after the CPU removal. Check the values, they shouldn't be too high, as in:

               time               counts unit events
        1.001101919             29261846      cycles
        2.002454499             17523405      cycles
        3.003659292             24361161      cycles
        4.004816983 18446744073638406144      cycles
        5.005671647        <not counted>      cycles
        ...

The CPU hotplug off took place after 3 seconds. The issue is the read of the event count value after 4 seconds, when the CPU is not available and the read of the counter returns an error. This is treated as a counter value of zero, which results in a very large value (0 - previous_value). Fix this by detecting the hotplugged-off CPU and reporting 0 instead of a very large number. Cc: stable@vger.kernel.org Fixes: a029a4eab39e ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility") Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  PM: sleep: Avoid calling put_device() under dpm_list_mtx  (Rafael J. Wysocki)
commit 2aa36604e8243698ff22bd5fef0dd0c6bb07ba92 upstream. It is generally unsafe to call put_device() with dpm_list_mtx held, because the given device's release routine may carry out an action depending on that lock which then may deadlock, so modify the system-wide suspend and resume of devices to always drop dpm_list_mtx before calling put_device() (and adjust white space somewhat while at it). For instance, this prevents the following splat from showing up in the kernel log after a system resume in certain configurations: [ 3290.969514] ====================================================== [ 3290.969517] WARNING: possible circular locking dependency detected [ 3290.969519] 5.15.0+ #2420 Tainted: G S [ 3290.969523] ------------------------------------------------------ [ 3290.969525] systemd-sleep/4553 is trying to acquire lock: [ 3290.969529] ffff888117ab1138 ((wq_completion)hci0#2){+.+.}-{0:0}, at: flush_workqueue+0x87/0x4a0 [ 3290.969554] but task is already holding lock: [ 3290.969556] ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0 [ 3290.969571] which lock already depends on the new lock. [ 3290.969573] the existing dependency chain (in reverse order) is: [ 3290.969575] -> #3 (dpm_list_mtx){+.+.}-{3:3}: [ 3290.969583] __mutex_lock+0x9d/0xa30 [ 3290.969591] device_pm_add+0x2e/0xe0 [ 3290.969597] device_add+0x4d5/0x8f0 [ 3290.969605] hci_conn_add_sysfs+0x43/0xb0 [bluetooth] [ 3290.969689] hci_conn_complete_evt.isra.71+0x124/0x750 [bluetooth] [ 3290.969747] hci_event_packet+0xd6c/0x28a0 [bluetooth] [ 3290.969798] hci_rx_work+0x213/0x640 [bluetooth] [ 3290.969842] process_one_work+0x2aa/0x650 [ 3290.969851] worker_thread+0x39/0x400 [ 3290.969859] kthread+0x142/0x170 [ 3290.969865] ret_from_fork+0x22/0x30 [ 3290.969872] -> #2 (&hdev->lock){+.+.}-{3:3}: [ 3290.969881] __mutex_lock+0x9d/0xa30 [ 3290.969887] hci_event_packet+0xba/0x28a0 [bluetooth] [ 3290.969935] hci_rx_work+0x213/0x640 [bluetooth] [ 3290.969978] process_one_work+0x2aa/0x650 [ 3290.969985] worker_thread+0x39/0x400 [ 3290.969993] kthread+0x142/0x170 [ 3290.969999] ret_from_fork+0x22/0x30 [ 3290.970004] -> #1 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}: [ 3290.970013] process_one_work+0x27d/0x650 [ 3290.970020] worker_thread+0x39/0x400 [ 3290.970028] kthread+0x142/0x170 [ 3290.970033] ret_from_fork+0x22/0x30 [ 3290.970038] -> #0 ((wq_completion)hci0#2){+.+.}-{0:0}: [ 3290.970047] __lock_acquire+0x15cb/0x1b50 [ 3290.970054] lock_acquire+0x26c/0x300 [ 3290.970059] flush_workqueue+0xae/0x4a0 [ 3290.970066] drain_workqueue+0xa1/0x130 [ 3290.970073] destroy_workqueue+0x34/0x1f0 [ 3290.970081] hci_release_dev+0x49/0x180 [bluetooth] [ 3290.970130] bt_host_release+0x1d/0x30 [bluetooth] [ 3290.970195] device_release+0x33/0x90 [ 3290.970201] kobject_release+0x63/0x160 [ 3290.970211] dpm_resume+0x164/0x3e0 [ 3290.970215] dpm_resume_end+0xd/0x20 [ 3290.970220] suspend_devices_and_enter+0x1a4/0xba0 [ 3290.970229] pm_suspend+0x26b/0x310 [ 3290.970236] state_store+0x42/0x90 [ 3290.970243] kernfs_fop_write_iter+0x135/0x1b0 [ 3290.970251] new_sync_write+0x125/0x1c0 [ 3290.970257] vfs_write+0x360/0x3c0 [ 3290.970263] ksys_write+0xa7/0xe0 [ 3290.970269] do_syscall_64+0x3a/0x80 [ 3290.970276] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3290.970284] other info that might help us debug this: [ 3290.970285] Chain exists of: (wq_completion)hci0#2 --> &hdev->lock --> dpm_list_mtx [ 3290.970297] Possible unsafe locking scenario: [ 3290.970299] CPU0 CPU1 [ 3290.970300] ---- ---- [ 3290.970302] lock(dpm_list_mtx); [ 
3290.970306] lock(&hdev->lock); [ 3290.970310] lock(dpm_list_mtx); [ 3290.970314] lock((wq_completion)hci0#2); [ 3290.970319] *** DEADLOCK *** [ 3290.970321] 7 locks held by systemd-sleep/4553: [ 3290.970325] #0: ffff888103bcd448 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xa7/0xe0 [ 3290.970341] #1: ffff888115a14488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x103/0x1b0 [ 3290.970355] #2: ffff888100f719e0 (kn->active#233){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1b0 [ 3290.970369] #3: ffffffff82661048 (autosleep_lock){+.+.}-{3:3}, at: state_store+0x12/0x90 [ 3290.970384] #4: ffffffff82658ac8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9f/0x310 [ 3290.970399] #5: ffffffff827f2a48 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x4c/0x80 [ 3290.970416] #6: ffffffff8280fca8 (dpm_list_mtx){+.+.}-{3:3}, at: dpm_resume+0x12e/0x3e0 [ 3290.970428] stack backtrace: [ 3290.970431] CPU: 3 PID: 4553 Comm: systemd-sleep Tainted: G S 5.15.0+ #2420 [ 3290.970438] Hardware name: Dell Inc. XPS 13 9380/0RYJWW, BIOS 1.5.0 06/03/2019 [ 3290.970441] Call Trace: [ 3290.970446] dump_stack_lvl+0x44/0x57 [ 3290.970454] check_noncircular+0x105/0x120 [ 3290.970468] ? __lock_acquire+0x15cb/0x1b50 [ 3290.970474] __lock_acquire+0x15cb/0x1b50 [ 3290.970487] lock_acquire+0x26c/0x300 [ 3290.970493] ? flush_workqueue+0x87/0x4a0 [ 3290.970503] ? __raw_spin_lock_init+0x3b/0x60 [ 3290.970510] ? lockdep_init_map_type+0x58/0x240 [ 3290.970519] flush_workqueue+0xae/0x4a0 [ 3290.970526] ? flush_workqueue+0x87/0x4a0 [ 3290.970544] ? drain_workqueue+0xa1/0x130 [ 3290.970552] drain_workqueue+0xa1/0x130 [ 3290.970561] destroy_workqueue+0x34/0x1f0 [ 3290.970572] hci_release_dev+0x49/0x180 [bluetooth] [ 3290.970624] bt_host_release+0x1d/0x30 [bluetooth] [ 3290.970687] device_release+0x33/0x90 [ 3290.970695] kobject_release+0x63/0x160 [ 3290.970705] dpm_resume+0x164/0x3e0 [ 3290.970710] ? dpm_resume_early+0x251/0x3b0 [ 3290.970718] dpm_resume_end+0xd/0x20 [ 3290.970723] suspend_devices_and_enter+0x1a4/0xba0 [ 3290.970737] pm_suspend+0x26b/0x310 [ 3290.970746] state_store+0x42/0x90 [ 3290.970755] kernfs_fop_write_iter+0x135/0x1b0 [ 3290.970764] new_sync_write+0x125/0x1c0 [ 3290.970777] vfs_write+0x360/0x3c0 [ 3290.970785] ksys_write+0xa7/0xe0 [ 3290.970794] do_syscall_64+0x3a/0x80 [ 3290.970803] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3290.970811] RIP: 0033:0x7f41b1328164 [ 3290.970819] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 4a d2 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [ 3290.970824] RSP: 002b:00007ffe6ae21b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3290.970831] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f41b1328164 [ 3290.970836] RDX: 0000000000000004 RSI: 000055965e651070 RDI: 0000000000000004 [ 3290.970839] RBP: 000055965e651070 R08: 000055965e64f390 R09: 00007f41b1e3d1c0 [ 3290.970843] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000004 [ 3290.970846] R13: 0000000000000001 R14: 000055965e64f2b0 R15: 0000000000000004 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  video: backlight: Drop maximum brightness override for brightness zero  (Marek Vasut)
commit 33a5471f8da976bf271a1ebbd6b9d163cb0cb6aa upstream. The note in c2adda27d202f ("video: backlight: Add of_find_backlight helper in backlight.c") says that gpio-backlight uses brightness as power state. This has been fixed since in ec665b756e6f7 ("backlight: gpio-backlight: Correct initial power state handling") and other backlight drivers do not require this workaround. Drop the workaround. This fixes the case where e.g. pwm-backlight can perfectly well be set to brightness 0 on boot in DT, which without this patch leads to the display brightness to be max instead of off. Fixes: c2adda27d202f ("video: backlight: Add of_find_backlight helper in backlight.c") Cc: <stable@vger.kernel.org> # 5.4+ Cc: <stable@vger.kernel.org> # 4.19.x: ec665b756e6f7: backlight: gpio-backlight: Correct initial power state handling Signed-off-by: Marek Vasut <marex@denx.de> Acked-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mfd: dln2: Add cell for initializing DLN2 ADC  (Jack Andersen)
commit 313c84b5ae4104e48c661d5d706f9f4c425fd50f upstream. This patch extends the DLN2 driver; adding cell for adc_dln2 module. The original patch[1] fell through the cracks when the driver was added so ADC has never actually been usable. That patch did not have ACPI support which was added in v5.9, so the oldest supported version this current patch can be backported to is 5.10. [1] https://www.spinics.net/lists/linux-iio/msg33975.html Cc: <stable@vger.kernel.org> # 5.10+ Signed-off-by: Jack Andersen <jackoalan@gmail.com> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20211018112541.25466-1-noralf@tronnes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mm, oom: do not trigger out_of_memory from the #PF  (Michal Hocko)
commit 60e2793d440a3ec95abb5d6d4fc034a4b480472d upstream. Any allocation failure during the #PF path will return with VM_FAULT_OOM which in turn results in pagefault_out_of_memory. This can happen for 2 different reasons. a) Memcg is out of memory and we rely on mem_cgroup_oom_synchronize to perform the memcg OOM handling or b) normal allocation fails. The latter is quite problematic because allocation paths already trigger out_of_memory and the page allocator tries really hard to not fail allocations. Anyway, if the OOM killer has been already invoked there is no reason to invoke it again from the #PF path. Especially when the OOM condition might be gone by that time and we have no way to find out other than allocate. Moreover if the allocation failed and the OOM killer hasn't been invoked then we are unlikely to do the right thing from the #PF context because we have already lost the allocation context and restictions and therefore might oom kill a task from a different NUMA domain. This all suggests that there is no legitimate reason to trigger out_of_memory from pagefault_out_of_memory so drop it. Just to be sure that no #PF path returns with VM_FAULT_OOM without allocation print a warning that this is happening before we restart the #PF. [VvS: #PF allocation can hit into limit of cgroup v1 kmem controller. This is a local problem related to memcg, however, it causes unnecessary global OOM kills that are repeated over and over again and escalate into a real disaster. This has been broken since kmem accounting has been introduced for cgroup v1 (3.8). There was no kmem specific reclaim for the separate limit so the only way to handle kmem hard limit was to return with ENOMEM. In upstream the problem will be fixed by removing the outdated kmem limit, however stable and LTS kernels cannot do it and are still affected. This patch fixes the problem and should be backported into stable/LTS.] Link: https://lkml.kernel.org/r/f5fd8dd8-0ad4-c524-5f65-920b01972a42@virtuozzo.com Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks  (Vasily Averin)
commit 0b28179a6138a5edd9d82ad2687c05b3773c387b upstream. Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3. Memory cgroup charging allows killed or exiting tasks to exceed the hard limit. It can be misused and allowed to trigger global OOM from inside a memcg-limited container. On the other hand, if memcg fails an allocation called from inside the #PF handler, it triggers global OOM from inside pagefault_out_of_memory(). To prevent these problems this patchset: (a) removes execution of out_of_memory() from pagefault_out_of_memory(), because nobody can explain why it is necessary; (b) allows memcg to fail allocations of dying/killed tasks. This patch (of 3): Any allocation failure during the #PF path will return with VM_FAULT_OOM, which in turn results in pagefault_out_of_memory(), which in turn executes out_of_memory() and can kill a random task. An allocation might fail when the current task is the oom victim and there are no memory reserves left. The OOM killer is already handled at the page allocator level for the global OOM and at the charging level for the memcg one. Both have much more information about the scope of the allocation/charge request. This means that either the OOM killer has been invoked properly and didn't lead to allocation success, or it has been skipped because it couldn't have been invoked. In both cases triggering it from here is pointless and even harmful. It makes much more sense to let the killed task die rather than to wake up an eternally hungry oom-killer and send him to choose a fatter victim for breakfast. Link: https://lkml.kernel.org/r/0828a149-786e-7c06-b70a-52d086818ea3@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  io-wq: serialize hash clear with wakeup  (Jens Axboe)
commit d3e3c102d107bb84251455a298cf475f24bab995 upstream. We need to ensure that we serialize the stalled and hash bits with the wait_queue wait handler, or we could be racing with someone modifying the hashed state after we find it busy, but before we then give up and wait for it to be cleared. This can cause random delays or stalls when handling buffered writes for many files, where some of these files cause hash collisions between the worker threads. Cc: stable@vger.kernel.org Reported-by: Daniel Black <daniel@mariadb.org> Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  io-wq: fix queue stalling race  (Jens Axboe)
commit 0242f6426ea78fbe3933b44f8c55ae93ec37f6cc upstream. We need to set the stalled bit early, before we drop the lock for adding us to the stall hash queue. If not, then we can race with new work being queued between adding us to the stall hash and io_worker_handle_work() marking us stalled. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  io-wq: ensure that hash wait lock is IRQ disabling  (Jens Axboe)
commit 08bdbd39b58474d762242e1fadb7f2eb9ffcca71 upstream. A previous commit removed the IRQ safety of the worker and wqe locks, but that left one spot of the hash wait lock now being done without already having IRQs disabled. Ensure that we use the right locking variant for the hashed waitqueue lock. Fixes: a9a4aa9fbfc5 ("io-wq: wqe and worker locks no longer need to be IRQ safe") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
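As a hedged illustration of what "IRQ disabling" means for that waitqueue lock (the structure layout here is an assumption, not the actual io-wq code):

    /*
     * The hash waitqueue lock can also be taken from wake-up paths running
     * with IRQs disabled, so the manual lock/unlock around the hashed-work
     * bookkeeping must use the _irq variant rather than plain spin_lock().
     */
    spin_lock_irq(&hash->wait.lock);     /* disables local interrupts */
    /* ... inspect/update hashed work state ... */
    spin_unlock_irq(&hash->wait.lock);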
2021-11-17  memcg: prohibit unconditional exceeding the limit of dying tasks  (Vasily Averin)
commit a4ebf1b6ca1e011289677239a2a361fde4a88076 upstream. Memory cgroup charging allows killed or exiting tasks to exceed the hard limit. It is assumed that the amount of the memory charged by those tasks is bound and most of the memory will get released while the task is exiting. This is resembling a heuristic for the global OOM situation when tasks get access to memory reserves. There is no global memory shortage at the memcg level so the memcg heuristic is more relieved. The above assumption is overly optimistic though. E.g. vmalloc can scale to really large requests and the heuristic would allow that. We used to have an early break in the vmalloc allocator for killed tasks but this has been reverted by commit b8c8a338f75e ("Revert "vmalloc: back off when the current task is killed""). There are likely other similar code paths which do not check for fatal signals in an allocation&charge loop. Also there are some kernel objects charged to a memcg which are not bound to a process life time. It has been observed that it is not really hard to trigger these bypasses and cause global OOM situation. One potential way to address these runaways would be to limit the amount of excess (similar to the global OOM with limited oom reserves). This is certainly possible but it is not really clear how much of an excess is desirable and still protects from global OOMs as that would have to consider the overall memcg configuration. This patch is addressing the problem by removing the heuristic altogether. Bypass is only allowed for requests which either cannot fail or where the failure is not desirable while excess should be still limited (e.g. atomic requests). Implementation wise a killed or dying task fails to charge if it has passed the OOM killer stage. That should give all forms of reclaim chance to restore the limit before the failure (ENOMEM) and tell the caller to back off. In addition, this patch renames should_force_charge() helper to task_is_dying() because now its use is not associated witch forced charging. This patch depends on pagefault_out_of_memory() to not trigger out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM and cause a global OOM killer. Link: https://lkml.kernel.org/r/8f5cebbb-06da-4902-91f0-6566fc4b4203@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Shakeel Butt <shakeelb@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  mm/filemap.c: remove bogus VM_BUG_ON  (Matthew Wilcox (Oracle))
commit d417b49fff3e2f21043c834841e8623a6098741d upstream. It is not safe to check page->index without holding the page lock. It can be changed if the page is moved between the swap cache and the page cache for a shmem file, for example. There is a VM_BUG_ON below which checks page->index is correct after taking the page lock. Link: https://lkml.kernel.org/r/20210818144932.940640-1-willy@infradead.org Fixes: 5c211ba29deb ("mm: add and use find_lock_entries") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: <syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
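Conceptually (a sketch, not the exact filemap code; 'expected_index' is a hypothetical variable), the rule the fix enforces is that page->index may only be asserted once the page is locked:

    /*
     * Racy: without the page lock, page->index can change underneath us
     * (e.g. a shmem page moving between swap cache and page cache), so a
     * VM_BUG_ON() based on it can fire spuriously -- that check is removed.
     *
     * Safe: the later check keeps working because the lock is held.
     */
    lock_page(page);
    VM_BUG_ON_PAGE(page->index != expected_index, page);
    unlock_page(page);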
2021-11-17  9p/net: fix missing error check in p9_check_errors  (Dominique Martinet)
commit 27eb4c3144f7a5ebef3c9a261d80cb3e1fa784dc upstream. Link: https://lkml.kernel.org/r/99338965-d36c-886e-cd0e-1d8fff2b4746@gmail.com Reported-by: syzbot+06472778c97ed94af66d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt  (Daniel Borkmann)
[ Upstream commit 78cc316e9583067884eb8bd154301dc1e9ee945c ] If cgroup_sk_alloc() is called from interrupt context, then just assign the root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path, and iff indeed NULL returning the root cgroup (v ?: &cgrp_dfl_root.cgrp). Rather than re-adding the NULL-test to the fast-path we can just assign it once from cgroup_sk_alloc() given v1/v2 handling has been simplified. The migration from NULL test with returning &cgrp_dfl_root.cgrp to assigning &cgrp_dfl_root.cgrp directly does /not/ change behavior for callers of sock_cgroup_ptr(). syzkaller was able to trigger a splat in the legacy netrom code base, where the RX handler in nr_rx_frame() calls nr_make_new() which calls sk_alloc() and therefore cgroup_sk_alloc() with in_interrupt() condition. Thus the NULL skcd->cgroup, where it trips over on cgroup_sk_free() side given it expects a non-NULL object. There are a few other candidates aside from netrom which have similar pattern where in their accept-like implementation, they just call to sk_alloc() and thus cgroup_sk_alloc() instead of sk_clone_lock() with the corresponding cgroup_sk_clone() which then inherits the cgroup from the parent socket. None of them are related to core protocols where BPF cgroup programs are running from. However, in future, they should follow to implement a similar inheritance mechanism. Additionally, with a !CONFIG_CGROUP_NET_PRIO and !CONFIG_CGROUP_NET_CLASSID configuration, the same issue was exposed also prior to 8520e224f547 due to commit e876ecc67db8 ("cgroup: memcg: net: do not associate sock with unrelated cgroup") which added the early in_interrupt() return back then. Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode") Fixes: e876ecc67db8 ("cgroup: memcg: net: do not associate sock with unrelated cgroup") Reported-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com Reported-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com Tested-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20210927123921.21535-1-daniel@iogearbox.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17  bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode  (Daniel Borkmann)
[ Upstream commit 8520e224f547cd070c7c8f97b1fc6d58cff7ccaa ] Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used. Back in the days, commit bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") embedded per-socket cgroup information into sock->sk_cgrp_data and in order to save 8 bytes in struct sock made both mutually exclusive, that is, when cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2 falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp). The assumption made was "there is no reason to mix the two and this is in line with how legacy and v2 compatibility is handled" as stated in bd1060a1d671. However, with Kubernetes more widely supporting cgroups v2 as well nowadays, this assumption no longer holds, and the possibility of the v1/v2 mixed mode with the v2 root fallback being hit becomes a real security issue. Many of the cgroup v2 BPF programs are also used for policy enforcement, just to pick _one_ example, that is, to programmatically deny socket related system calls like connect(2) or bind(2). A v2 root fallback would implicitly cause a policy bypass for the affected Pods. In production environments, we have recently seen this case due to various circumstances: i) a different 3rd party agent and/or ii) a container runtime such as [0] in the user's environment configuring legacy cgroup v1 net_cls tags, which triggered implicitly mentioned root fallback. Another case is Kubernetes projects like kind [1] which create Kubernetes nodes in a container and also add cgroup namespaces to the mix, meaning programs which are attached to the cgroup v2 root of the cgroup namespace get attached to a non-root cgroup v2 path from init namespace point of view. And the latter's root is out of reach for agents on a kind Kubernetes node to configure. Meaning, any entity on the node setting cgroup v1 net_cls tag will trigger the bypass despite cgroup v2 BPF programs attached to the namespace root. Generally, this mutual exclusiveness does not hold anymore in today's user environments and makes cgroup v2 usage from BPF side fragile and unreliable. This fix adds proper struct cgroup pointer for the cgroup v2 case to struct sock_cgroup_data in order to address these issues; this implicitly also fixes the tradeoffs being made back then with regards to races and refcount leaks as stated in bd1060a1d671, and removes the fallback, so that cgroup v2 BPF programs always operate as expected. [0] https://github.com/nestybox/sysbox/ [1] https://kind.sigs.k8s.io/ Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20210913230759.2313-1-daniel@iogearbox.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17  net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE  (Daniel Borkmann)
[ Upstream commit 3dc20f4762c62d3b3f0940644881ed818aa7b2f5 ] Currently, it is not possible to migrate a neighbor entry between NUD_PERMANENT state and NTF_USE flag with a dynamic NUD state from a user space control plane. Similarly, it is not possible to add/remove NTF_EXT_LEARNED flag from an existing neighbor entry in combination with NTF_USE flag. This is due to the latter directly calling into neigh_event_send() without any meta data updates as happening in __neigh_update(). Thus, to enable this use case, extend the latter with a NEIGH_UPDATE_F_USE flag where we break the NUD_PERMANENT state in particular so that a latter neigh_event_send() is able to re-resolve a neighbor entry. Before fix, NUD_PERMANENT -> NUD_* & NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] As can be seen, despite the admin-triggered replace, the entry remains in the NUD_PERMANENT state. After fix, NUD_PERMANENT -> NUD_* & NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE [...] # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn STALE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] After the fix, the admin-triggered replace switches to a dynamic state from the NTF_USE flag which triggered a new neighbor resolution. Likewise, we can transition back from there, if needed, into NUD_PERMANENT. Similar before/after behavior can be observed for below transitions: Before fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] After fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [..] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17  dmaengine: bestcomm: fix system boot lockups  (Anatolij Gustschin)
commit adec566b05288f2787a1f88dbaf77ed8b0c644fa upstream. memset() and memcpy() on an MMIO region like here results in a lockup at startup on mpc5200 platform (since this first happens during probing of the ATA and Ethernet drivers). Use memset_io() and memcpy_toio() instead. Fixes: 2f9ea1bde0d1 ("bestcomm: core bestcomm support for Freescale MPC5200") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Anatolij Gustschin <agust@denx.de> Link: https://lore.kernel.org/r/20211014094012.21286-1-agust@denx.de Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
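A short sketch of the substitution (the pointer name is illustrative): plain memset()/memcpy() must not be used on an __iomem mapping here.

    void __iomem *sram = bcom_sram_io_base;      /* illustrative __iomem pointer */

    memset_io(sram, 0x00, size);                 /* instead of memset(sram, 0, size)  */
    memcpy_toio(sram, task_image, image_size);   /* instead of memcpy(sram, image, n) */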
2021-11-17  dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request fail  (Kishon Vijay Abraham I)
commit eb91224e47ec33a0a32c9be0ec0fcb3433e555fd upstream. udma_get_*() checks if rchan/tchan/rflow is already allocated by checking if it has a NON NULL value. For the error cases, rchan/tchan/rflow will have error value and udma_get_*() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing rchan/tchan/rflow. Reset the value of rchan/tchan/rflow to NULL if a channel request fails. CC: stable@vger.kernel.org Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20211031032411.27235-3-kishon@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
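A hedged sketch of the allocation pattern being fixed (helper names are assumed): the "already allocated" test is a NULL check, so an ERR_PTR left behind by a failed request would look like a valid channel on the next attempt.

    static int example_get_rchan(struct udma_chan *uc)
    {
            if (uc->rchan)                          /* "already allocated" is a NULL test */
                    return 0;

            uc->rchan = example_reserve_rchan(uc);  /* assumed helper, returns ERR_PTR on failure */
            if (IS_ERR(uc->rchan)) {
                    int ret = PTR_ERR(uc->rchan);

                    uc->rchan = NULL;               /* the fix: don't leave an ERR_PTR behind */
                    return ret;
            }
            return 0;
    }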
2021-11-17  dmaengine: ti: k3-udma: Set bchan to NULL if a channel request fail  (Kishon Vijay Abraham I)
commit 5c6c6d60e4b489308ae4da8424c869f7cc53cd12 upstream. bcdma_get_*() checks if bchan is already allocated by checking if it has a NON NULL value. For the error cases, bchan will have error value and bcdma_get_*() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing bchan. Reset the value of bchan to NULL if a channel request fails. CC: stable@vger.kernel.org Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20211031032411.27235-2-kishon@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  erofs: fix unsafe pagevec reuse of hooked pclusters  (Gao Xiang)
commit 86432a6dca9bed79111990851df5756d3eb5f57c upstream. There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the decompression chain can be extended if the following pcluster chain hooks such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ | tail (partial) page | head (partial) page | |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it (general protection fault) by ro_fsstressing a LZMA image for long time. I'm going to actively clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  f2fs: fix UAF in f2fs_available_free_memory  (Dongliang Mu)
commit 5429c9dbc9025f9a166f64e22e3a69c94fd5b29b upstream.

f2fs_fill_super
 -> f2fs_build_segment_manager
  -> create_discard_cmd_control
   -> f2fs_start_discard_thread

This invokes kthread_run() to create a thread and run issue_discard_thread. However, if f2fs_build_node_manager() fails, the control flow goes to free_nm and calls f2fs_destroy_node_manager(), which frees sbi->nm_info. If issue_discard_thread accesses sbi->nm_info after the deallocation, but before f2fs_stop_discard_thread(), it will cause a UAF (use-after-free).

f2fs_destroy_segment_manager
 -> destroy_discard_cmd_control
  -> f2fs_stop_discard_thread

Fix this by stopping the discard thread before f2fs_destroy_node_manager(). Note that commit d6d2b491a82e1 introduces the call of f2fs_available_free_memory() into issue_discard_thread. Cc: stable@vger.kernel.org Fixes: d6d2b491a82e ("f2fs: allow to change discard policy based on cached discard cmds") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17  f2fs: include non-compressed blocks in compr_written_block  (Daeho Jeong)
commit 09631cf3234d32156e7cae32275f5a4144c683c5 upstream. Need to include non-compressed blocks in compr_written_block to estimate average compression ratio more accurately. Fixes: 5ac443e26a09 ("f2fs: add sysfs nodes to get runtime compression stat") Cc: stable@vger.kernel.org Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17f2fs: should use GFP_NOFS for directory inodesJaegeuk Kim
commit 92d602bc7177325e7453189a22e0c8764ed3453e upstream. We use inline_dentry, which requires allocating a dentry page when adding a link. If we allow memory reclaim from the filesystem, we do down_read(&sbi->cp_rwsem) twice via f2fs_lock_op(). I think this should be okay, but how about stopping the lockdep complaint [1]?
f2fs_create()
 - f2fs_lock_op()
 - f2fs_do_add_link()
  - __f2fs_find_entry
   - f2fs_get_read_data_page()
    -> kswapd
     - shrink_node
      - f2fs_evict_inode
       - f2fs_lock_op()
[1]
(fs_reclaim){+.+.}-{0:0}:
kswapd0: lock_acquire+0x114/0x394
kswapd0: __fs_reclaim_acquire+0x40/0x50
kswapd0: prepare_alloc_pages+0x94/0x1ec
kswapd0: __alloc_pages_nodemask+0x78/0x1b0
kswapd0: pagecache_get_page+0x2e0/0x57c
kswapd0: f2fs_get_read_data_page+0xc0/0x394
kswapd0: f2fs_find_data_page+0xa4/0x23c
kswapd0: find_in_level+0x1a8/0x36c
kswapd0: __f2fs_find_entry+0x70/0x100
kswapd0: f2fs_do_add_link+0x84/0x1ec
kswapd0: f2fs_mkdir+0xe4/0x1e4
kswapd0: vfs_mkdir+0x110/0x1c0
kswapd0: do_mkdirat+0xa4/0x160
kswapd0: __arm64_sys_mkdirat+0x24/0x34
kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
kswapd0: do_el0_svc+0x28/0xa0
kswapd0: el0_svc+0x24/0x38
kswapd0: el0_sync_handler+0x88/0xec
kswapd0: el0_sync+0x1c0/0x200
kswapd0: -> #1 (&sbi->cp_rwsem){++++}-{3:3}:
kswapd0: lock_acquire+0x114/0x394
kswapd0: down_read+0x7c/0x98
kswapd0: f2fs_do_truncate_blocks+0x78/0x3dc
kswapd0: f2fs_truncate+0xc8/0x128
kswapd0: f2fs_evict_inode+0x2b8/0x8b8
kswapd0: evict+0xd4/0x2f8
kswapd0: iput+0x1c0/0x258
kswapd0: do_unlinkat+0x170/0x2a0
kswapd0: __arm64_sys_unlinkat+0x4c/0x68
kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
kswapd0: do_el0_svc+0x28/0xa0
kswapd0: el0_svc+0x24/0x38
kswapd0: el0_sync_handler+0x88/0xec
kswapd0: el0_sync+0x1c0/0x200
Cc: stable@vger.kernel.org Fixes: bdbc90fa55af ("f2fs: don't put dentry page in pagecache into highmem") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Light Hsieh <light.hsieh@mediatek.com> Tested-by: Light Hsieh <light.hsieh@mediatek.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17irqchip/sifive-plic: Fixup EOI failed when maskedGuo Ren
commit 69ea463021be0d159ab30f96195fb0dd18ee2272 upstream. When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver, only the first interrupt is handled, and following interrupts are never delivered (initially reported in [1]). That's because the RISC-V PLIC cannot EOI masked interrupts, as explained in the description of Interrupt Completion in the PLIC spec [2]: <quote> The PLIC signals it has completed executing an interrupt handler by writing the interrupt ID it received from the claim to the claim/complete register. The PLIC does not check whether the completion ID is the same as the last claim ID for that target. If the completion ID does not match an interrupt source that *is currently enabled* for the target, the completion is silently ignored. </quote> Re-enable the interrupt before completion if it has been masked during the handling, and remask it afterwards. [1] http://lists.infradead.org/pipermail/linux-riscv/2021-July/007441.html [2] https://github.com/riscv/riscv-plic-spec/blob/8bc15a35d07c9edf7b5d23fec9728302595ffc4d/riscv-plic.adoc Fixes: bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow") Reported-by: Vincent Pelletier <plr.vincent@gmail.com> Tested-by: Nikita Shubin <nikita.shubin@maquefel.me> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> [maz: amended commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211105094748.3894453-1-guoren@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
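A toy model of the claim/complete rule quoted above (plain C simulation, not the PLIC driver; all names are invented): completion is ignored unless the source is currently enabled, so a masked source must be briefly re-enabled around the completion write.
#include <stdbool.h>
#include <stdio.h>

static bool enabled[8];     /* per-source enable bit, i.e. the mask state */
static bool in_service[8];  /* claimed but not yet completed */

static void complete(int id)
{
    if (!enabled[id])        /* the PLIC silently ignores this completion */
        return;
    in_service[id] = false;  /* source can raise interrupts again */
}

static void eoi_masked_source(int id)
{
    enabled[id] = true;      /* briefly unmask ... */
    complete(id);            /* ... so the completion is accepted ... */
    enabled[id] = false;     /* ... then mask again */
}

int main(void)
{
    enabled[3] = false;      /* IRQF_ONESHOT: source masked during threaded handling */
    in_service[3] = true;    /* it has been claimed */

    complete(3);             /* naive EOI while masked: dropped */
    printf("after naive EOI, still in service: %d\n", in_service[3]);

    eoi_masked_source(3);    /* the fix: unmask, complete, re-mask */
    printf("after fixed EOI, still in service: %d\n", in_service[3]);
    return 0;
}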
2021-11-17posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()Michael Pratt
commit ca7752caeaa70bd31d1714af566c9809688544af upstream. copy_process currently copies task_struct.posix_cputimers_work as-is. If a timer interrupt arrives while handling clone and before dup_task_struct completes then the child task will have:
1. posix_cputimers_work.scheduled = true
2. posix_cputimers_work.work queued.
copy_process clears task_struct.task_works, so (2) will have no effect and posix_cpu_timers_work will never run (not to mention it doesn't make sense for two tasks to share a common linked list). Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is never cleared. Since scheduled is set, future timer interrupts will skip scheduling work, with the ultimate result that the task will never receive timer expirations. Together, the complete flow is:
1. Task 1 calls clone(), enters kernel.
2. Timer interrupt fires, schedules task work on Task 1.
2a. task_struct.posix_cputimers_work.scheduled = true
2b. task_struct.posix_cputimers_work.work added to task_struct.task_works.
3. dup_task_struct() copies Task 1 to Task 2.
4. copy_process() clears task_struct.task_works for Task 2.
5. Future timer interrupts on Task 2 see task_struct.posix_cputimers_work.scheduled = true and skip scheduling work.
Fix this by explicitly clearing contents of task_struct.posix_cputimers_work in copy_process(). This was never meant to be shared or inherited across tasks in the first place. Fixes: 1fb497dd0030 ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work") Reported-by: Rhys Hiltner <rhys@justin.tv> Signed-off-by: Michael Pratt <mpratt@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
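A user-space sketch of the inheritance bug described above (illustrative only, not kernel code; all names are invented): when a task structure is copied wholesale, a "work already scheduled" flag copied into the child must be cleared explicitly, otherwise the child never re-arms the work.
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct task {
    bool timer_work_scheduled;   /* plays the role of posix_cputimers_work.scheduled */
};

static void schedule_timer_work(struct task *t)
{
    if (t->timer_work_scheduled)  /* looks already queued: skip */
        return;
    t->timer_work_scheduled = true;
    printf("work queued for task %p\n", (void *)t);
}

static void copy_task(struct task *child, const struct task *parent)
{
    memcpy(child, parent, sizeof(*child));   /* dup_task_struct() analogue */
    child->timer_work_scheduled = false;     /* the fix: clear the flag in the child */
}

int main(void)
{
    struct task parent = { .timer_work_scheduled = true };  /* interrupt raced with clone */
    struct task child;

    copy_task(&child, &parent);
    schedule_timer_work(&child);  /* only runs because the copied flag was cleared */
    return 0;
}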
2021-11-17KVM: x86: move guest_pv_has out of user_access sectionPaolo Bonzini
commit 3e067fd8503d6205aa0c1c8f48f6b209c592d19c upstream. When UBSAN is enabled, the code emitted for the call to guest_pv_has includes a call to __ubsan_handle_load_invalid_value. objtool complains that this call happens with UACCESS enabled; to avoid the warning, pull the calls to user_access_begin into both arms of the "if" statement, after the check for guest_pv_has. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
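The shape of that restructuring, as a self-contained sketch (stub functions stand in for user_access_begin()/user_access_end(); this is not KVM's code): run the potentially instrumented check first, then open the no-instrumentation window separately in each branch.
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for user_access_begin()/user_access_end(); in the kernel, no
 * instrumented call may execute between the two. */
static void uaccess_begin(void) { puts("uaccess begin"); }
static void uaccess_end(void)   { puts("uaccess end"); }

static bool guest_has_feature(void) { return true; }  /* may be instrumented, e.g. by UBSAN */

static void write_guest_state(void)
{
    if (guest_has_feature()) {      /* instrumented check happens first ... */
        uaccess_begin();            /* ... window opened inside this arm */
        puts("write extended state");
        uaccess_end();
    } else {
        uaccess_begin();            /* ... and separately inside this arm */
        puts("write base state");
        uaccess_end();
    }
}

int main(void) { write_guest_state(); return 0; }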
2021-11-17x86/mce: Add errata workaround for Skylake SKX37Dave Jones
commit e629fc1407a63dbb748f828f9814463ffc2a0af0 upstream. Errata SKX37 is word-for-word identical to the other errata listed in this workaround. I happened to notice this after investigating a CMCI storm on a Skylake host. While I can't confirm this was the root cause, spurious corrected errors do sound like a likely suspect. Fixes: 2976908e4198 ("x86/mce: Do not log spurious corrected mce errors") Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVELMaciej W. Rozycki
commit a923a2676e60683aee46aa4b93c30aff240ac20d upstream. Fix assembly errors like:
{standard input}: Assembler messages:
{standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32'
{standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1
with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT. This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a block of code containing a DINS MIPS64r2 instruction conditionalized on MIPS_ISA_REV >= 2 within the scope of the downgrade. The assembly architecture override code pattern has been put there for LL/SC instructions, so that code compiles for configurations that select a processor to build for that does not support these instructions while still providing run-time support for processors that do, dynamically switched by non-constant `cpu_has_llsc'. It went in with linux-mips.org commit aac8aa7717a2 ("Enable a suitable ISA for the assembler around ll/sc so that code builds even for processors that don't support the instructions. Plus minor formatting fixes.") back in 2005. Fix the problem by wrapping these instructions along with the adjacent SYNC instructions only, following the practice established with commit cfd54de3b0e4 ("MIPS: Avoid move psuedo-instruction whilst using MIPS_ISA_LEVEL") and commit 378ed6f0e3c5 ("MIPS: Avoid using .set mips0 to restore ISA"). Strictly speaking the SYNC instructions do not have to be wrapped as they are only used as a Loongson3 erratum workaround, so they will be enabled in the assembler by default, but do this so as to keep code consistent with other places. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: c7e2d71dda7a ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17MIPS: fix duplicated slashes for Platform file pathMasahiro Yamada
commit cca2aac8acf470b01066f559acd7146fc4c32ae8 upstream. platform-y accumulates platform names with a slash appended. The current $(patsubst ...) ends up with doubled slashes. GNU Make still includes the Platform files, but in case of an error, a clumsy file path is displayed:
arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Jason Self <jason@bluehome.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17parisc: Flush kernel data mapping in set_pte_at() when installing pte for user pageJohn David Anglin
commit 38860b2c8bb1b92f61396eb06a63adff916fc31d upstream. For years, there have been random segmentation faults in userspace on SMP PA-RISC machines. It occurred to me that this might be a problem in set_pte_at(). MIPS and some other architectures do cache flushes when installing PTEs with the present bit set. Here I have adapted the code in update_mmu_cache() to flush the kernel mapping when the kernel flush is deferred, or when the kernel mapping may alias with the user mapping. This simplifies calls to update_mmu_cache(). I also changed the barrier in set_pte() from a compiler barrier to a full memory barrier. I know this change is not sufficient to fix the problem. It might not be needed. I have had a few days of operation with 5.14.16 to 5.15.1 and haven't seen any random segmentation faults on rp3440 or c8000 so far. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17parisc: Fix backtrace to always include init function namesHelge Deller
commit 279917e27edc293eb645a25428c6ab3f3bca3f86 upstream. I noticed that sometimes at kernel startup the backtraces did not include the function names of init functions. Their addresses were not resolved to function names and instead only the address was printed. Debugging shows that the culprit is is_ksym_addr(), which is called by the backtrace functions to check if an address belongs to a function in the kernel. The problem occurs only for CONFIG_KALLSYMS_ALL=y. When looking at is_ksym_addr() one can see that for CONFIG_KALLSYMS_ALL=y the function only tries to resolve the address via the is_kernel() function, which checks like this:
if (addr >= _stext && addr <= _end)
        return 1;
On parisc the init functions are located before _stext, so this check fails. Other platforms seem to have all functions (including init functions) behind _stext. The following patch moves the _stext symbol to the beginning of the kernel and thus includes the init section. This fixes the check and does not seem to have any negative side effects on where the kernel mapping happens in the map_pages() function in arch/parisc/mm/init.c. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.4+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
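A toy illustration of that range check (plain C, with made-up addresses standing in for _stext/_end): an init-section address that sits below the start-of-text marker fails the test even though it is kernel text, so the backtrace prints a raw address.
#include <stdio.h>

static const unsigned long stext_addr = 0xc0100000UL;  /* stands in for _stext */
static const unsigned long end_addr   = 0xc0800000UL;  /* stands in for _end */

static int looks_like_kernel_text(unsigned long addr)
{
    return addr >= stext_addr && addr <= end_addr;
}

int main(void)
{
    unsigned long init_fn = 0xc00f0000UL;  /* init text placed before _stext */
    unsigned long core_fn = 0xc0200000UL;  /* ordinary kernel text */

    printf("init_fn resolvable: %d\n", looks_like_kernel_text(init_fn));  /* 0 -> raw address printed */
    printf("core_fn resolvable: %d\n", looks_like_kernel_text(core_fn));  /* 1 -> symbol name printed */
    return 0;
}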
2021-11-17ARM: 9156/1: drop cc-option fallbacks for architecture selectionArnd Bergmann
commit 418ace9992a7647c446ed3186df40cf165b67298 upstream. Naresh and Antonio ran into a build failure with the latest Debian armhf compilers, with lots of output like
/tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode
As it turns out, $(cc-option) fails early here when the FPU is not selected before the CPU architecture is selected, as the compiler option check runs before enabling -msoft-float, which causes a problem when testing a target architecture level without an FPU:
cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU
Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this issue, but the fallback logic is already broken because all supported compilers (gcc-5 and higher) are much more recent than these options, and building with -march=armv5t as a fallback no longer works. The best way forward that I see is to just remove all the checks, which also has the nice side-effect of slightly improving the startup time for 'make'. The -mtune=marvell-f option was apparently never supported by any mainline compiler, and the custom Codesourcery gcc build that did support it is now too old to build kernels, so just use -mtune=xscale unconditionally for those. This should be safe to apply on all stable kernels, and will be required in order to keep building them with gcc-11 and higher. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996419 Reported-by: Antonio Terceiro <antonio.terceiro@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Cc: Matthias Klose <doko@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17ARM: 9155/1: fix early early_iounmap()Michał Mirosław
commit 0d08e7bf0d0d1a29aff7b16ef516f7415eb1aa05 upstream. Currently __set_fixmap() bails out with a warning when called in early boot from early_iounmap(). Fix it, and while at it, make the comment a bit easier to understand. Cc: <stable@vger.kernel.org> Fixes: b089c31c519c ("ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap") Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17smb3: do not error on fsync when readonlySteve French
commit 71e6864eacbef0b2645ca043cdfbac272cb6cea3 upstream. Linux allows doing a flush/fsync on a file open for read-only, but the protocol does not allow that. If the file passed in on the flush is read-only, try to find a writeable handle for the same inode; if that is not possible, skip sending the fsync call to the server to avoid breaking the apps. Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Suggested-by: Jeremy Allison <jra@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
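A quick user-space check of the premise above: Linux accepts fsync() on a descriptor opened read-only, so a network filesystem whose protocol cannot express that has to cope rather than return an error. The file path is only an example of any readable file.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);  /* any readable file will do */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Expected to succeed (return 0) on local filesystems. */
    printf("fsync(read-only fd) returned %d\n", fsync(fd));
    close(fd);
    return 0;
}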
2021-11-17selftests/net: udpgso_bench_rx: fix port argumentWillem de Bruijn
[ Upstream commit d336509cb9d03970911878bb77f0497f64fda061 ] The below commit added optional support for passing a bind address. It configures the sockaddr bind arguments before parsing options and reconfigures on options -b and -4. This broke support for passing port (-p) on its own. Configure sockaddr after parsing all arguments. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
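A minimal sketch of the ordering fix described above (hypothetical code, not the selftest itself): fill in the bind address only after every option has been parsed, so a lone -p still takes effect.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int port = 8000, c;
    const char *bind_str = NULL;

    while ((c = getopt(argc, argv, "b:p:")) != -1) {
        switch (c) {
        case 'b': bind_str = optarg; break;
        case 'p': port = atoi(optarg); break;
        default: return 1;
        }
    }

    /* Configure the sockaddr only now, once all options are known. */
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind_str && inet_pton(AF_INET, bind_str, &addr.sin_addr) != 1)
        return 1;

    printf("would bind to port %d\n", ntohs(addr.sin_port));
    return 0;
}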
2021-11-17cxgb4: fix eeprom len when diagnostics not implementedRahul Lakkireddy
[ Upstream commit 4ca110bf8d9b31a60f8f8ff6706ea147d38ad97c ] Ensure diagnostics monitoring support is implemented for the SFF 8472 compliant port module and set the correct length for ethtool port module eeprom read. Fixes: f56ec6766dcf ("cxgb4: Add support for ethtool i2c dump") Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>