aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2020-05-21mmc: core: Allow host controllers to require R1B for CMD6Ulf Hansson
commit 1292e3efb149ee21d8d33d725eeed4e6b1ade963 upstream. It has turned out that some host controllers can't use R1B for CMD6 and other commands that have R1B associated with them. Therefore invent a new host cap, MMC_CAP_NEED_RSP_BUSY to let them specify this. In __mmc_switch(), let's check the flag and use it to prevent R1B responses from being converted into R1. Note that, this also means that the host are on its own, when it comes to manage the busy timeout. Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Cc: <stable@vger.kernel.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Tested-by: Faiz Abbas <faiz_abbas@ti.com> Tested-By: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21futex: Fix inode life-time issuePeter Zijlstra
commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream. As reported by Jann, ihold() does not in fact guarantee inode persistence. And instead of making it so, replace the usage of inode pointers with a per boot, machine wide, unique inode identifier. This sequence number is global, but shared (file backed) futexes are rare enough that this should not become a performance issue. Reported-by: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21page-flags: fix a crash at SetPageError(THP_SWAP)Qian Cai
commit d72520ad004a8ce18a6ba6cde317f0081b27365a upstream. Commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out") supported writing THP to a swap device but forgot to upgrade an older commit df8c94d13c7e ("page-flags: define behavior of FS/IO-related flags on compound pages") which could trigger a crash during THP swapping out with DEBUG_VM_PGFLAGS=y, kernel BUG at include/linux/page-flags.h:317! page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page)) page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0 anon flags: 0x45fffe0000d8454(uptodate|lru|workingset|owner_priv_1|writeback|head|reclaim|swapbacked) end_swap_bio_write() SetPageError(page) VM_BUG_ON_PAGE(1 && PageCompound(page)) <IRQ> bio_endio+0x297/0x560 dec_pending+0x218/0x430 [dm_mod] clone_endio+0xe4/0x2c0 [dm_mod] bio_endio+0x297/0x560 blk_update_request+0x201/0x920 scsi_end_request+0x6b/0x4b0 scsi_io_completion+0x509/0x7e0 scsi_finish_command+0x1ed/0x2a0 scsi_softirq_done+0x1c9/0x1d0 __blk_mqnterrupt+0xf/0x20 </IRQ> Fix by checking PF_NO_TAIL in those places instead. Fixes: bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21iommu/vt-d: Fix RCU list debugging warningsAmol Grover
commit 02d715b4a8182f4887d82df82a7b83aced647760 upstream. dmar_drhd_units is traversed using list_for_each_entry_rcu() outside of an RCU read side critical section but under the protection of dmar_global_lock. Hence add corresponding lockdep expression to silence the following false-positive warnings: [ 1.603975] ============================= [ 1.603976] WARNING: suspicious RCU usage [ 1.603977] 5.5.4-stable #17 Not tainted [ 1.603978] ----------------------------- [ 1.603980] drivers/iommu/intel-iommu.c:4769 RCU-list traversed in non-reader section!! [ 1.603869] ============================= [ 1.603870] WARNING: suspicious RCU usage [ 1.603872] 5.5.4-stable #17 Not tainted [ 1.603874] ----------------------------- [ 1.603875] drivers/iommu/dmar.c:293 RCU-list traversed in non-reader section!! Tested-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: Amol Grover <frextrite@gmail.com> Cc: stable@vger.kernel.org Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21cgroup: Iterate tasks that did not finish do_exit()Michal Koutný
commit 9c974c77246460fa6a92c18554c3311c8c83c160 upstream. PF_EXITING is set earlier than actual removal from css_set when a task is exitting. This can confuse cgroup.procs readers who see no PF_EXITING tasks, however, rmdir is checking against css_set membership so it can transitionally fail with EBUSY. Fix this by listing tasks that weren't unlinked from css_set active lists. It may happen that other users of the task iterator (without CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This is equal to the state before commit c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") but it may be reviewed later. Reported-by: Suren Baghdasaryan <surenb@google.com> Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21net: phy: fix MDIO bus PM PHY resumingHeiner Kallweit
commit 611d779af7cad2b87487ff58e4931a90c20b113c upstream. So far we have the unfortunate situation that mdio_bus_phy_may_suspend() is called in suspend AND resume path, assuming that function result is the same. After the original change this is no longer the case, resulting in broken resume as reported by Geert. To fix this call mdio_bus_phy_may_suspend() in the suspend path only, and let the phy_device store the info whether it was suspended by MDIO bus PM. Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21inet_diag: return classid for all socket typesDmitry Yakunin
commit 83f73c5bb7b9a9135173f0ba2b1aa00c06664ff9 upstream. In commit 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and fallback to priority") croup classid reporting was fixed. But this works only for TCP sockets because for other socket types icsk parameter can be NULL and classid code path is skipped. This change moves classid handling to inet_diag_msg_attrs_fill() function. Also inet_diag_msg_attrs_size() helper was added and addends in nlmsg_new() were reordered to save order from inet_sk_diag_fill(). Fixes: 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and fallback to priority") Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21rcu: Add support for consolidated-RCU reader checkingJoel Fernandes (Google)
commit 28875945ba98d1b47a8a706812b6494d165bb0a0 upstream. This commit adds RCU-reader checks to list_for_each_entry_rcu() and hlist_for_each_entry_rcu(). These checks are optional, and are indicated by a lockdep expression passed to a new optional argument to these two macros. If this optional lockdep expression is omitted, these two macros act as before, checking for an RCU read-side critical section. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Update to eliminate return within macro and update comment. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21rcu: Fix irritating whitespace error in rcu_assign_pointer()Paul E. McKenney
commit b3119cde1d70d6df1574b9f26d8e087e8e5116b4 upstream. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21treewide: Rename rcu_dereference_raw_notrace() to _check()Joel Fernandes (Google)
commit 0a5b99f57873e233ad42ef71e23c629f6ea1fcfe upstream. The rcu_dereference_raw_notrace() API name is confusing. It is equivalent to rcu_dereference_raw() except that it also does sparse pointer checking. There are only a few users of rcu_dereference_raw_notrace(). This patches renames all of them to be rcu_dereference_raw_check() with the "_check()" indicating sparse checking. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Fix checkpatch warnings about parentheses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21signal: Extend exec_id to 64bitsEric W. Biederman
commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream. Replace the 32bit exec_id with a 64bit exec_id to make it impossible to wrap the exec_id counter. With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent. This bypasses the signal sending checks if the parent changes their credentials during exec. The severity of this problem can been seen that in my limited testing of a 32bit exec_id it can take as little as 19s to exec 65536 times. Which means that it can take as little as 14 days to wrap a 32bit exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7 days. Even my slower timing is in the uptime of a typical server. Which means self_exec_id is simply a speed bump today, and if exec gets noticably faster self_exec_id won't even be a speed bump. Extending self_exec_id to 64bits introduces a problem on 32bit architectures where reading self_exec_id is no longer atomic and can take two read instructions. Which means that is is possible to hit a window where the read value of exec_id does not match the written value. So with very lucky timing after this change this still remains expoiltable. I have updated the update of exec_id on exec to use WRITE_ONCE and the read of exec_id in do_notify_parent to use READ_ONCE to make it clear that there is no locking between these two locations. Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl Fixes: 2.3.23pre2 Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-19Merge tag 'v5.2.40' into v5.2/standard/baseBruce Ashfield
This is the 5.2.40 stable release
2020-05-15bdi: add a ->dev_name field to struct backing_dev_infoChristoph Hellwig
commit 6bd87eec23cbc9ed222bed0f5b5b02bf300e9a8d upstream. Cache a copy of the name for the life time of the backing_dev_info structure so that we can reference it even after unregistering. Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears") Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15bdi: move bdi_dev_name out of lineChristoph Hellwig
commit eb7ae5e06bb6e6ac6bb86872d27c43ebab92f6b2 upstream. bdi_dev_name is not a fast path function, move it out of line. This prepares for using it from modular callers without having to export an implementation detail like bdi_unknown_name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15blktrace: Protect q->blk_trace with RCUJan Kara
commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream. KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of RCU grace period when shutting down tracing. Luckily that is rare enough event that we can afford that. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user visible side-effects as debugfs files would be still existing for a short while block tracing has been shut down. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Tristan Madani <tristmd@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream. In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15HID: core: increase HID report buffer size to 8KiBJohan Korsnes
commit 84a4062632462c4320704fcdf8e99e89e94c0aba upstream. We have a HID touch device that reports its opens and shorts test results in HID buffers of size 8184 bytes. The maximum size of the HID buffer is currently set to 4096 bytes, causing probe of this device to fail. With this patch we increase the maximum size of the HID buffer to 8192 bytes, making device probe and acquisition of said buffers succeed. Signed-off-by: Johan Korsnes <jkorsnes@cisco.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Armando Visconti <armando.visconti@st.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15timers/nohz: Update NOHZ load in remote tickPeter Zijlstra (Intel)
commit ebc0f83c78a2d26384401ecf2d2fa48063c0ee27 upstream. The way loadavg is tracked during nohz only pays attention to the load upon entering nohz. This can be particularly noticeable if full nohz is entered while non-idle, and then the cpu goes idle and stays that way for a long time. Use the remote tick to ensure that full nohz cpus report their deltas within a reasonable time. [ swood: Added changelog and removed recheck of stopped tick. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/1578736419-14628-3-git-send-email-swood@redhat.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15timers/nohz: fix implicit dependency on "struct rq"Paul Gortmaker
Backports to older v5.x kernels revealed a recently introduced implicit dependency on struct rq that makes the nohz.h header no longer stand alone. This is most easily demonstrated as: $ echo '#include <linux/sched/nohz.h>' > init/main.c $ echo 'void foo(void) {}' >> init/main.c $ make init/main.o CC init/main.o In file included from init/main.c:1:0: ./include/linux/sched/nohz.h:18:35: warning: ‘struct rq’ declared inside parameter list [enabled by default] void calc_load_nohz_remote(struct rq *rq); ^ ./include/linux/sched/nohz.h:18:35: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] Fixes: ebc0f83c78a2 ("timers/nohz: Update NOHZ load in remote tick") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15ata: ahci: Add shutdown to freeze hardware resources of ahciPrabhakar Kushwaha
commit 10a663a1b15134a5a714aa515e11425a44d4fdf7 upstream. device_shutdown() called from reboot or power_shutdown expect all devices to be shutdown. Same is true for even ahci pci driver. As no ahci shutdown function is implemented, the ata subsystem always remains alive with DMA & interrupt support. File system related calls should not be honored after device_shutdown(). So defining ahci pci driver shutdown to freeze hardware (mask interrupt, stop DMA engine and free DMA resources). Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15iommu/vt-d: Fix compile warning from intel-svm.hJoerg Roedel
commit e7598fac323aad0e502415edeffd567315994dd6 upstream. The intel_svm_is_pasid_valid() needs to be marked inline, otherwise it causes the compile warning below: CC [M] drivers/dma/idxd/cdev.o In file included from drivers/dma/idxd/cdev.c:9:0: ./include/linux/intel-svm.h:125:12: warning: ‘intel_svm_is_pasid_valid’ defined but not used [-Wunused-function] static int intel_svm_is_pasid_valid(struct device *dev, int pasid) ^~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Borislav Petkov <bp@alien8.de> Fixes: 15060aba71711 ('iommu/vt-d: Helper function to query if a pasid has any active users') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15genirq/irqdomain: Make sure all irq domain flags are distinctZenghui Yu
commit 2546287c5fb363a0165933ae2181c92f03e701d0 upstream. This was noticed when printing debugfs for MSIs on my ARM64 server. The new dstate IRQD_MSI_NOMASK_QUIRK came out surprisingly while it should only be the x86 stuff for the time being... The new MSI quirk flag uses the same bit as IRQ_DOMAIN_NAME_ALLOCATED which is oddly defined as bit 6 for no good reason. Switch it to the non used bit 1. Fixes: 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity race") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200221020725.2038-1-yuzenghui@huawei.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15serdev: ttyport: restore client ops on deregistrationJohan Hovold
commit 0c5aae59270fb1f827acce182786094c9ccf598e upstream. The serdev tty-port controller driver should reset the tty-port client operations also on deregistration to avoid a NULL-pointer dereference in case the port is later re-registered as a normal tty device. Note that this can only happen with tty drivers such as 8250 which have statically allocated port structures that can end up being reused and where a later registration would not register a serdev controller (e.g. due to registration errors or if the devicetree has been changed in between). Specifically, this can be an issue for any statically defined ports that would be registered by 8250 core when an 8250 driver is being unbound. Fixes: bed35c6dfa6a ("serdev: add a tty port controller driver") Cc: stable <stable@vger.kernel.org> # 4.11 Reported-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20200210145730.22762-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15regulator: core: Expose some of core functions needed by couplersDmitry Osipenko
commit d22b85a1b97d12a4940ef9d778f6122546736f78 upstream. Expose some of internal functions that are required for implementation of customized regulator couplers. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15regulator: core: Introduce API for regulators coupling customizationDmitry Osipenko
commit d8ca7d184b33af7913c244900df77c6cad6a5590 upstream. Right now regulator core supports only one type of regulators coupling, the "voltage max-spread" which keeps voltages of coupled regulators in a given range from each other. A more sophisticated coupling may be required in practice, one example is the NVIDIA Tegra SoCs which besides the max-spreading have other restrictions that must be adhered. Introduce API that allow platforms to provide their own customized coupling algorithms. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nullsPaul E. McKenney
commit 860c8802ace14c646864795e057349c9fb2d60ad upstream. Eric Dumazet supplied a KCSAN report of a bug that forces use of hlist_unhashed_lockless() from sk_unhashed(): ------------------------------------------------------------------------ BUG: KCSAN: data-race in inet_unhash / inet_unhash write to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 1: __hlist_nulls_del include/linux/list_nulls.h:88 [inline] hlist_nulls_del_init_rcu include/linux/rculist_nulls.h:36 [inline] __sk_nulls_del_node_init_rcu include/net/sock.h:676 [inline] inet_unhash+0x38f/0x4a0 net/ipv4/inet_hashtables.c:612 tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249 tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854 tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56 tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619 call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 read to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 0: sk_unhashed include/net/sock.h:607 [inline] inet_unhash+0x3d/0x4a0 net/ipv4/inet_hashtables.c:592 tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249 tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854 tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56 tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619 call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 rest_init+0xec/0xf6 init/main.c:452 arch_call_rest_init+0x17/0x37 start_kernel+0x838/0x85e init/main.c:786 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ------------------------------------------------------------------------ This commit therefore replaces C-language assignments with WRITE_ONCE() in include/linux/list_nulls.h and include/linux/rculist_nulls.h. Reported-by: Eric Dumazet <edumazet@google.com> # For KCSAN Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15dmaengine: Store module owner in dma_device structLogan Gunthorpe
commit dae7a589c18a4d979d5f14b09374e871b995ceb1 upstream. dma_chan_to_owner() dereferences the driver from the struct device to obtain the owner and call module_[get|put](). However, if the backing device is unbound before the dma_device is unregistered, the driver will be cleared and this will cause a NULL pointer dereference. Instead, store a pointer to the owner module in the dma_device struct so the module reference can be properly put when the channel is put, even if the backing device was destroyed first. This change helps to support a safer unbind of DMA engines. If the dma_device is unregistered in the driver's remove function, there's no guarantee that there are no existing clients and a users action may trigger the WARN_ONCE in dma_async_device_unregister() which is unlikely to leave the system in a consistent state. Instead, a better approach is to allow the backing driver to go away and fail any subsequent requests to it. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Link: https://lore.kernel.org/r/20191216190120.21374-2-logang@deltatee.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15PCI: Add nr_devfns parameter to pci_add_dma_alias()James Sewart
commit 09298542cd891b43778db1f65aa3613aa5a562eb upstream. Add a "nr_devfns" parameter to pci_add_dma_alias() so it can be used to create DMA aliases for a range of devfns. [bhelgaas: incorporate nr_devfns fix from James, update quirk_pex_vca_alias() and setup_aliases()] Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-15raid6/test: fix a compilation errorZhengyuan Liu
commit 6b8651aac1dca6140dd7fb4c9fec2736ed3f6223 upstream. The compilation error is redeclaration showed as following: In file included from ../../../include/linux/limits.h:6, from /usr/include/x86_64-linux-gnu/bits/local_lim.h:38, from /usr/include/x86_64-linux-gnu/bits/posix1_lim.h:161, from /usr/include/limits.h:183, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:194, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h:7, from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:34, from ../../../include/linux/raid/pq.h:30, from algos.c:14: ../../../include/linux/types.h:114:15: error: conflicting types for ‘int64_t’ typedef s64 int64_t; ^~~~~~~ In file included from /usr/include/stdint.h:34, from /usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h:9, from /usr/include/inttypes.h:27, from ../../../include/linux/raid/pq.h:29, from algos.c:14: /usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous \ declaration of ‘int64_t’ was here typedef __int64_t int64_t; Fixes: 54d50897d544 ("linux/kernel.h: split *_MAX and *_MIN macros into <linux/limits.h>") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-05Merge branch 'linux-5.2.y' into v5.2/standard/baseBruce Ashfield
2020-05-04gpio: add gpiod_toggle_active_low()Michał Mirosław
commit d3a5bcb4a17f1ad072484bb92c42519ff3aba6e1 upstream. Add possibility to toggle active-low flag of a gpio descriptor. This is useful for compatibility code, where defaults are inverted vs DT gpio flags or the active-low flag is taken from elsewhere. Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Link: https://lore.kernel.org/r/7ce0338e01ad17fa5a227176813941b41a7c35c1.1576031637.git.mirq-linux@rere.qmqm.pl Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04x86/apic/msi: Plug non-maskable MSI affinity raceThomas Gleixner
commit 6f1a4891a5928a5969c87fa5a584844c983ec823 upstream. Evan tracked down a subtle race between the update of the MSI message and the device raising an interrupt internally on PCI devices which do not support MSI masking. The update of the MSI message is non-atomic and consists of either 2 or 3 sequential 32bit wide writes to the PCI config space. - Write address low 32bits - Write address high 32bits (If supported by device) - Write data When an interrupt is migrated then both address and data might change, so the kernel attempts to mask the MSI interrupt first. But for MSI masking is optional, so there exist devices which do not provide it. That means that if the device raises an interrupt internally between the writes then a MSI message is sent built from half updated state. On x86 this can lead to spurious interrupts on the wrong interrupt vector when the affinity setting changes both address and data. As a consequence the device interrupt can be lost causing the device to become stuck or malfunctioning. Evan tried to handle that by disabling MSI accross an MSI message update. That's not feasible because disabling MSI has issues on its own: If MSI is disabled the PCI device is routing an interrupt to the legacy INTx mechanism. The INTx delivery can be disabled, but the disablement is not working on all devices. Some devices lose interrupts when both MSI and INTx delivery are disabled. Another way to solve this would be to enforce the allocation of the same vector on all CPUs in the system for this kind of screwed devices. That could be done, but it would bring back the vector space exhaustion problems which got solved a few years ago. Fortunately the high address (if supported by the device) is only relevant when X2APIC is enabled which implies interrupt remapping. In the interrupt remapping case the affinity setting is happening at the interrupt remapping unit and the PCI MSI message is programmed only once when the PCI device is initialized. That makes it possible to solve it with a two step update: 1) Target the MSI msg to the new vector on the current target CPU 2) Target the MSI msg to the new vector on the new target CPU In both cases writing the MSI message is only changing a single 32bit word which prevents the issue of inconsistency. After writing the final destination it is necessary to check whether the device issued an interrupt while the intermediate state #1 (new vector, current CPU) was in effect. This is possible because the affinity change is always happening on the current target CPU. The code runs with interrupts disabled, so the interrupt can be detected by checking the IRR of the local APIC. If the vector is pending in the IRR then the interrupt is retriggered on the new target CPU by sending an IPI for the associated vector on the target CPU. This can cause spurious interrupts on both the local and the new target CPU. 1) If the new vector is not in use on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then interrupt entry code will ignore that spurious interrupt. The vector is marked so that the 'No irq handler for vector' warning is supressed once. 2) If the new vector is in use already on the local CPU then the IRR check might see an pending interrupt from the device which is using this vector. The IPI to the new target CPU will then invoke the handler of the device, which got the affinity change, even if that device did not issue an interrupt 3) If the new vector is in use already on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then the handler of the device which uses that vector on the local CPU will be invoked. expose issues in device driver interrupt handlers which are not prepared to handle a spurious interrupt correctly. This not a regression, it's just exposing something which was already broken as spurious interrupts can happen for a lot of reasons and all driver handlers need to be able to deal with them. Reported-by: Evan Green <evgreen@chromium.org> Debugged-by: Evan Green <evgreen@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04KVM: Use vcpu-specific gva->hva translation when querying host page sizeSean Christopherson
commit f9b84e19221efc5f493156ee0329df3142085f28 upstream. Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVMSean Christopherson
commit 736c291c9f36b07f8889c61764c28edce20e715d upstream. Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04percpu: Separate decrypted varaibles anytime encryption can be enabledErdem Aktas
commit 264b0d2bee148073c117e7bbbde5be7125a53be1 upstream. CONFIG_VIRTUALIZATION may not be enabled for memory encrypted guests. If disabled, decrypted per-CPU variables may end up sharing the same page with variables that should be left encrypted. Always separate per-CPU variables that should be decrypted into their own page anytime memory encryption can be enabled in the guest rather than rely on any other config option that may not be enabled. Fixes: ac26963a1175 ("percpu: Introduce DEFINE_PER_CPU_DECRYPTED") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Erdem Aktas <erdemaktas@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04eventfd: track eventfd_signal() recursion depthJens Axboe
commit b5e683d5cab8cd433b06ae178621f083cabd4f63 upstream. eventfd use cases from aio and io_uring can deadlock due to circular or resursive calling, when eventfd_signal() tries to grab the waitqueue lock. On top of that, it's also possible to construct notification chains that are deep enough that we could blow the stack. Add a percpu counter that tracks the percpu recursion depth, warn if we exceed it. The counter is also exposed so that users of eventfd_signal() can do the right thing if it's non-zero in the context where it is called. Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04ext4: optimize case-insensitive lookupsGabriel Krisman Bertazi
commit 3ae72562ad917df36a1b1247d749240e3b4865db upstream. Temporarily cache a casefolded version of the file name under lookup in ext4_filename, to avoid repeatedly casefolding it. I got up to 30% speedup on lookups of large directories (>100k entries), depending on the length of the string under lookup. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o
commit 68f23b89067fdf187763e75a56087550624fdbee upstream. Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [PG: four less instances to rename in older v5.2 writeback.h code] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-19Merge branch 'linux-5.2.y' into v5.2/standard/baseBruce Ashfield
2020-04-16nvme-fc: Revert "add module to ops template to allow module references"James Smart
commit 8c5c660529209a0e324c1c1a35ce3f83d67a2aa5 upstream. The original patch was to resolve the lldd being able to be unloaded while being used to talk to the boot device of the system. However, the end result of the original patch is that any driver unload while a nvme controller is live via the lldd is now being prohibited. Given the module reference, the module teardown routine can't be called, thus there's no way, other than manual actions to terminate the controllers. Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16rseq: Unregister rseq for clone CLONE_VMMathieu Desnoyers
commit 463f550fb47bede3a5d7d5177f363a6c3b45d50b upstream. It has been reported by Google that rseq is not behaving properly with respect to clone when CLONE_VM is used without CLONE_THREAD. It keeps the prior thread's rseq TLS registered when the TLS of the thread has moved, so the kernel can corrupt the TLS of the parent. The approach of clearing the per task-struct rseq registration on clone with CLONE_THREAD flag is incomplete. It does not cover the use-case of clone with CLONE_VM set, but without CLONE_THREAD. Here is the rationale for unregistering rseq on clone with CLONE_VM flag set: 1) CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM to be set. Therefore, just checking for CLONE_VM covers all CLONE_THREAD uses. There is no point in checking for both CLONE_THREAD and CLONE_VM, 2) There is the possibility of an unlikely scenario where CLONE_SETTLS is used without CLONE_VM. In order to be an issue, it would require that the rseq TLS is in a shared memory area. I do not plan on adding CLONE_SETTLS to the set of clone flags which unregister RSEQ, because it would require that we also unregister RSEQ on set_thread_area(2) and arch_prctl(2) ARCH_SET_FS for completeness. So rather than doing a partial solution, it appears better to let user-space explicitly perform rseq unregistration across clone if needed in scenarios where CLONE_VM is not set. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191211161713.4490-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16ARM: OMAP2+: SmartReflex: add omap_sr_pdata definitionBen Dooks
commit 2079fe6ea8cbd2fb2fbadba911f1eca6c362eb9b upstream. The omap_sr_pdata is not declared but is exported, so add a define for it to fix the following warning: arch/arm/mach-omap2/pdata-quirks.c:609:36: warning: symbol 'omap_sr_pdata' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16USB: serial: ir-usb: fix link-speed handlingJohan Hovold
commit 17a0184ca17e288decdca8b2841531e34d49285f upstream. Commit e0d795e4f36c ("usb: irda: cleanup on ir-usb module") added a USB IrDA header with common defines, but mistakingly switched to using the class-descriptor baud-rate bitmask values for the outbound header. This broke link-speed handling for rates above 9600 baud, but a device would also be able to operate at the default 9600 baud until a link-speed request was issued (e.g. using the TCGETS ioctl). Fixes: e0d795e4f36c ("usb: irda: cleanup on ir-usb module") Cc: stable <stable@vger.kernel.org> # 2.6.27 Cc: Felipe Balbi <balbi@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16netfilter: ipset: use bitmap infrastructure completelyKadlecsik József
commit 32c72165dbd0e246e69d16a3ad348a4851afd415 upstream. The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()Eric Dumazet
commit d836f5c69d87473ff65c06a6123e5b2cf5e56f5b upstream. rtnl_create_link() needs to apply dev->min_mtu and dev->max_mtu checks that we apply in do_setlink() Otherwise malicious users can crash the kernel, for example after an integer overflow : BUG: KASAN: use-after-free in memset include/linux/string.h:365 [inline] BUG: KASAN: use-after-free in __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 Write of size 32 at addr ffff88819f20b9c0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192 memset+0x24/0x40 mm/kasan/common.c:108 memset include/linux/string.h:365 [inline] __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 alloc_skb include/linux/skbuff.h:1049 [inline] alloc_skb_with_frags+0x93/0x590 net/core/skbuff.c:5664 sock_alloc_send_pskb+0x7ad/0x920 net/core/sock.c:2242 sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2259 mld_newpack+0x1d7/0x7f0 net/ipv6/mcast.c:1609 add_grhead.isra.0+0x299/0x370 net/ipv6/mcast.c:1713 add_grec+0x7db/0x10b0 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:1970 [inline] mld_ifc_timer_expire+0x3d3/0x950 net/ipv6/mcast.c:2477 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 98 6b ea f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 44 1c 60 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 34 1c 60 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 4e 5d 9a f9 e8 79 RSP: 0018:ffffffff89807ce8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff13266ae RBX: ffffffff8987a1c0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff8987aa54 RBP: ffffffff89807d18 R08: ffffffff8987a1c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8a799980 R14: 0000000000000000 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:690 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 rest_init+0x23b/0x371 init/main.c:451 arch_call_rest_init+0xe/0x1b start_kernel+0x904/0x943 init/main.c:784 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242 The buggy address belongs to the page: page:ffffea00067c82c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 raw: 057ffe0000000000 ffffea00067c82c8 ffffea00067c82c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88819f20b880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20b900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88819f20b980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88819f20ba00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20ba80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-13Merge branch 'linux-5.2.y' into v5.2/standard/baseBruce Ashfield
2020-04-13compat_ioctl: add compat_ptr_ioctl()Arnd Bergmann
commit 2952db0fd51b0890f728df94ac563c21407f4f43 upstream Many drivers have ioctl() handlers that are completely compatible between 32-bit and 64-bit architectures, except for the argument that is passed down from user space and may have to be passed through compat_ptr() in order to become a valid 64-bit pointer. Using ".compat_ptr = compat_ptr_ioctl" in file operations should let us simplify a lot of those drivers to avoid #ifdef checks, and convert additional drivers that don't have proper compat handling yet. On most architectures, the compat_ptr_ioctl() just passes all arguments to the corresponding ->ioctl handler. The exception is arch/s390, where compat_ptr() clears the top bit of a 32-bit pointer value, so user space pointers to the second 2GB alias the first 2GB, as is the case for native 32-bit s390 user space. The compat_ptr_ioctl() function must therefore be used only with ioctl functions that either ignore the argument or pass a pointer to a compatible data type. If any ioctl command handled by fops->unlocked_ioctl passes a plain integer instead of a pointer, or any of the passed data types is incompatible between 32-bit and 64-bit architectures, a proper handler is required instead of compat_ptr_ioctl. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zhang Xiao <xiao.zhang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
2020-04-12mmc: sdio: fix wl1251 vendor idH. Nikolaus Schaller
commit e5db673e7fe2f971ec82039a28dc0811c2100e87 upstream. v4.11-rc1 did introduce a patch series that rearranged the sdio quirks into a header file. Unfortunately this did forget to handle SDIO_VENDOR_ID_TI differently between wl1251 and wl1271 with the result that although the wl1251 was found on the sdio bus, the firmware did not load any more and there was no interface registration. This patch defines separate constants to be used by sdio quirks and drivers. Fixes: 884f38607897 ("mmc: core: move some sdio IDs out of quirks file") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_idStephan Gerhold
commit 458ea3ad033fc86e291712ce50cbe60c3428cf30 upstream. Those regulators are not actually supported by the AB8500 regulator driver. There is no ab8500_regulator_info for them and no entry in ab8505_regulator_match. As such, they cannot be registered successfully, and looking them up in ab8505_regulator_match causes an out-of-bounds array read. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20191106173125.14496-2-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12netfilter: nf_tables: autoload modules from the abort pathPablo Neira Ayuso
commit eb014de4fd418de1a277913cba244e47274fe392 upstream. This patch introduces a list of pending module requests. This new module list is composed of nft_module_request objects that contain the module name and one status field that tells if the module has been already loaded (the 'done' field). In the first pass, from the preparation phase, the netlink command finds that a module is missing on this list. Then, a module request is allocated and added to this list and nft_request_module() returns -EAGAIN. This triggers the abort path with the autoload parameter set on from nfnetlink, request_module() is called and the module request enters the 'done' state. Since the mutex is released when loading modules from the abort phase, the module list is zapped so this is iteration occurs over a local list. Therefore, the request_module() calls happen when object lists are in consistent state (after fulling aborting the transaction) and the commit list is empty. On the second pass, the netlink command will find that it already tried to load the module, so it does not request it again and nft_request_module() returns 0. Then, there is a look up to find the object that the command was missing. If the module was successfully loaded, the command proceeds normally since it finds the missing object in place, otherwise -ENOENT is reported to userspace. This patch also updates nfnetlink to include the reason to enter the abort phase, which is required for this new autoload module rationale. Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>