aboutsummaryrefslogtreecommitdiffstats
path: root/include/trace
AgeCommit message (Collapse)Author
2023-05-17nohz: Add TICK_DEP_BIT_RCUFrederic Weisbecker
[ Upstream commit 01b4c39901e087ceebae2733857248de81476bd8 ] If a nohz_full CPU is looping in the kernel, the scheduling-clock tick might nevertheless remain disabled. In !PREEMPT kernels, this can prevent RCU's attempts to enlist the aid of that CPU's executions of cond_resched(), which can in turn result in an arbitrarily delayed grace period and thus an OOM. RCU therefore needs a way to enable a holdout nohz_full CPU's scheduler-clock interrupt. This commit therefore provides a new TICK_DEP_BIT_RCU value which RCU can pass to tick_dep_set_cpu() and friends to force on the scheduler-clock interrupt for a specified CPU or task. In some cases, rcutorture needs to turn on the scheduler-clock tick, so this commit also exports the relevant symbols to GPL-licensed modules. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Stable-dep-of: 58d766824264 ("tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26f2fs: Fix f2fs_truncate_partial_nodes ftrace eventDouglas Raillard
[ Upstream commit 0b04d4c0542e8573a837b1d81b94209e48723b25 ] Fix the nid_t field so that its size is correctly reported in the text format embedded in trace.dat files. As it stands, it is reported as being of size 4: field:nid_t nid[3]; offset:24; size:4; signed:0; Instead of 12: field:nid_t nid[3]; offset:24; size:12; signed:0; This also fixes the reported offset of subsequent fields so that they match with the actual struct layout. Signed-off-by: Douglas Raillard <douglas.raillard@arm.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-24Revert "ext4: generalize extents status tree search functions"Greg Kroah-Hartman
This reverts commit cca8671f3a7f5775a078f2676f6d1039afb925e6 which is commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream. Eric writes: I recommend not backporting this patch or the other three patches apparently intended to support it to 4.19 stable. All these patches are related to ext4's bigalloc feature, which was experimental as of 4.19 (expressly noted by contemporary versions of e2fsprogs) and also suffered from a number of bugs. A significant number of additional patches that were applied to 5.X kernels over time would have to be backported to 4.19 for the patch below to function correctly. It's really not worth doing that given bigalloc's experimental status as of 4.19 and the very rare combination of the bigalloc and inline features. Link: https://lore.kernel.org/r/Y8mAe1SlcLD5fykg@debian-BULLSEYE-live-builder-AMD64 Cc: Eric Whitney <enwlinux@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24Revert "ext4: fix reserved cluster accounting at delayed write time"Greg Kroah-Hartman
This reverts commit d40e09f701cf7a44e595a558b067b2b4f67fbf87 which is commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream. Eric writes: I recommend not backporting this patch or the other three patches apparently intended to support it to 4.19 stable. All these patches are related to ext4's bigalloc feature, which was experimental as of 4.19 (expressly noted by contemporary versions of e2fsprogs) and also suffered from a number of bugs. A significant number of additional patches that were applied to 5.X kernels over time would have to be backported to 4.19 for the patch below to function correctly. It's really not worth doing that given bigalloc's experimental status as of 4.19 and the very rare combination of the bigalloc and inline features. Link: https://lore.kernel.org/r/Y8mAe1SlcLD5fykg@debian-BULLSEYE-live-builder-AMD64 Cc: Eric Whitney <enwlinux@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18ext4: fix reserved cluster accounting at delayed write timeEric Whitney
[ Upstream commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 ] The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting. Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 131294c35ed6 ("ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18ext4: generalize extents status tree search functionsEric Whitney
[ Upstream commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 ] Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values. Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock. Three missing static specifiers in RFC version of patch reported and fixed by Fengguang Wu <fengguang.wu@intel.com>. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 131294c35ed6 ("ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25spmi: trace: fix stack-out-of-bound access in SPMI tracing functionsDavid Collins
commit 2af28b241eea816e6f7668d1954f15894b45d7e3 upstream. trace_spmi_write_begin() and trace_spmi_read_end() both call memcpy() with a length of "len + 1". This leads to one extra byte being read beyond the end of the specified buffer. Fix this out-of-bound memory access by using a length of "len" instead. Here is a KASAN log showing the issue: BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_spmi_read_end+0x1d0/0x234 Read of size 2 at addr ffffffc0265b7540 by task thermal@2.0-ser/1314 ... Call trace: dump_backtrace+0x0/0x3e8 show_stack+0x2c/0x3c dump_stack_lvl+0xdc/0x11c print_address_description+0x74/0x384 kasan_report+0x188/0x268 kasan_check_range+0x270/0x2b0 memcpy+0x90/0xe8 trace_event_raw_event_spmi_read_end+0x1d0/0x234 spmi_read_cmd+0x294/0x3ac spmi_ext_register_readl+0x84/0x9c regmap_spmi_ext_read+0x144/0x1b0 [regmap_spmi] _regmap_raw_read+0x40c/0x754 regmap_raw_read+0x3a0/0x514 regmap_bulk_read+0x418/0x494 adc5_gen3_poll_wait_hs+0xe8/0x1e0 [qcom_spmi_adc5_gen3] ... __arm64_sys_read+0x4c/0x60 invoke_syscall+0x80/0x218 el0_svc_common+0xec/0x1c8 ... addr ffffffc0265b7540 is located in stack of task thermal@2.0-ser/1314 at offset 32 in frame: adc5_gen3_poll_wait_hs+0x0/0x1e0 [qcom_spmi_adc5_gen3] this frame has 1 object: [32, 33) 'status' Memory state around the buggy address: ffffffc0265b7400: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ffffffc0265b7480: 04 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffc0265b7500: 00 00 00 00 f1 f1 f1 f1 01 f3 f3 f3 00 00 00 00 ^ ffffffc0265b7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0265b7600: f1 f1 f1 f1 01 f2 07 f2 f2 f2 01 f3 00 00 00 00 ================================================================== Fixes: a9fce374815d ("spmi: add command tracepoints for SPMI") Cc: stable@vger.kernel.org Reviewed-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: David Collins <quic_collinsd@quicinc.com> Link: https://lore.kernel.org/r/20220627235512.2272783-1-quic_collinsd@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointerSteven Rostedt (Google)
commit 820b8963adaea34a87abbecb906d1f54c0aabfb7 upstream. The trace event sock_exceed_buf_limit saves the prot->sysctl_mem pointer and then dereferences it in the TP_printk() portion. This is unsafe as the TP_printk() portion is executed at the time the buffer is read. That is, it can be seconds, minutes, days, months, even years later. If the proto is freed, then this dereference will can also lead to a kernel crash. Instead, save the sysctl_mem array into the ring buffer and have the TP_printk() reference that instead. This is the proper and safe way to read pointers in trace events. Link: https://lore.kernel.org/all/20220706052130.16368-12-kuniyu@amazon.com/ Cc: stable@vger.kernel.org Fixes: 3847ce32aea9f ("core: add tracepoints for queueing skb to rcvbuf") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-02ata: libata: add qc->flags in ata_qc_complete_template tracepointEdward Wu
commit 540a92bfe6dab7310b9df2e488ba247d784d0163 upstream. Add flags value to check the result of ata completion Fixes: 255c03d15a29 ("libata: Add tracepoints") Cc: stable@vger.kernel.org Signed-off-by: Edward Wu <edwardwu@realtek.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-25random: remove unused tracepointsJason A. Donenfeld
commit 14c174633f349cb41ea90c2c0aaddac157012f74 upstream. These explicit tracepoints aren't really used and show sign of aging. It's work to keep these up to date, and before I attempted to keep them up to date, they weren't up to date, which indicates that they're not really used. These days there are better ways of introspecting anyway. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-25random: use hash function for crng_slow_load()Jason A. Donenfeld
commit 66e4c2b9541503d721e936cc3898c9f25f4591ff upstream. Since we have a hash function that's really fast, and the goal of crng_slow_load() is reportedly to "touch all of the crng's state", we can just hash the old state together with the new state and call it a day. This way we dont need to reason about another LFSR or worry about various attacks there. This code is only ever used at early boot and then never again. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-25random: simplify entropy debitingJason A. Donenfeld
commit 9c07f57869e90140080cfc282cc628d123e27704 upstream. Our pool is 256 bits, and we only ever use all of it or don't use it at all, which is decided by whether or not it has at least 128 bits in it. So we can drastically simplify the accounting and cmpxchg loop to do exactly this. While we're at it, we move the minimum bit size into a constant so it can be shared between the two places where it matters. The reason we want any of this is for the case in which an attacker has compromised the current state, and then bruteforces small amounts of entropy added to it. By demanding a particular minimum amount of entropy be present before reseeding, we make that bruteforcing difficult. Note that this rationale no longer includes anything about /dev/random blocking at the right moment, since /dev/random no longer blocks (except for at ~boot), but rather uses the crng. In a former life, /dev/random was different and therefore required a more nuanced account(), but this is no longer. Behaviorally, nothing changes here. This is just a simplification of the code. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-25random: rather than entropy_store abstraction, use globalJason A. Donenfeld
commit 90ed1e67e896cc8040a523f8428fc02f9b164394 upstream. Originally, the RNG used several pools, so having things abstracted out over a generic entropy_store object made sense. These days, there's only one input pool, and then an uneven mix of usage via the abstraction and usage via &input_pool. Rather than this uneasy mixture, just get rid of the abstraction entirely and have things always use the global. This simplifies the code and makes reading it a bit easier. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-25random: remove dead code left over from blocking poolEric Biggers
commit 118a4417e14348b2e46f5e467da8444ec4757a45 upstream. Remove some dead code that was left over following commit 90ea1c6436d2 ("random: remove the blocking pool"). Cc: linux-crypto@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-25random: only read from /dev/random after its pool has received 128 bitsTheodore Ts'o
commit eb9d1bf079bb438d1a066d72337092935fc770f6 upstream. Immediately after boot, we allow reads from /dev/random before its entropy pool has been fully initialized. Fix this so that we don't allow this until the blocking pool has received 128 bits. We do this by repurposing the initialized flag in the entropy pool struct, and use the initialized flag in the blocking pool to indicate whether it is safe to pull from the blocking pool. To do this, we needed to rework when we decide to push entropy from the input pool to the blocking pool, since the initialized flag for the input pool was used for this purpose. To simplify things, we no longer use the initialized flag for that purpose, nor do we use the entropy_total field any more. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolateVasily Averin
[ Upstream commit 2b132903de7124dd9a758be0c27562e91a510848 ] Fixes following sparse warnings: CHECK mm/vmscan.c mm/vmscan.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/vmscan.h): ./include/trace/events/vmscan.h:281:1: sparse: warning: cast to restricted isolate_mode_t ./include/trace/events/vmscan.h:281:1: sparse: warning: restricted isolate_mode_t degrades to integer Link: https://lkml.kernel.org/r/e85d7ff2-fd10-53f8-c24e-ba0458439c1b@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-26f2fs: fix up f2fs_lookup tracepointsGao Xiang
[ Upstream commit 70a9ac36ffd807ac506ed0b849f3e8ce3c6623f2 ] Fix up a misuse that the filename pointer isn't always valid in the ring buffer, and we should copy the content instead. Fixes: 0c5e36db17f5 ("f2fs: trace f2fs_lookup") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-08bdi: use bdi_dev_name() to get device nameYufen Yu
[ Upstream commit d51cfc53ade3189455a1b88ec7a2ff0c24597cf8 ] Use the common interface bdi_dev_name() to get device name. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Add missing <linux/backing-dev.h> include BFQ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-13memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o
[ Upstream commit 68f23b89067fdf187763e75a56087550624fdbee ] Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-13include/trace/events/writeback.h: fix -Wstringop-truncation warningsQian Cai
[ Upstream commit d1a445d3b86c9341ce7a0954c23be0edb5c9bec5 ] There are many of those warnings. In file included from ./arch/powerpc/include/asm/paca.h:15, from ./arch/powerpc/include/asm/current.h:13, from ./include/linux/thread_info.h:21, from ./include/asm-generic/preempt.h:5, from ./arch/powerpc/include/generated/asm/preempt.h:1, from ./include/linux/preempt.h:78, from ./include/linux/spinlock.h:51, from fs/fs-writeback.c:19: In function 'strncpy', inlined from 'perf_trace_writeback_page_template' at ./include/trace/events/writeback.h:56:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by using the new strscpy_pad() which was introduced in "lib/string: Add strscpy_pad() function" and will always be NUL-terminated instead of strncpy(). Also, change strlcpy() to use strscpy_pad() in this file for consistency. Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw Fixes: 455b2864686d ("writeback: Initial tracing support") Fixes: 028c2dd184c0 ("writeback: Add tracing to balance_dirty_pages") Fixes: e84d0a4f8e39 ("writeback: trace event writeback_queue_io") Fixes: b48c104d2211 ("writeback: trace event bdi_dirty_ratelimit") Fixes: cc1676d917f3 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()") Fixes: 9fb0a7da0c52 ("writeback: add more tracepoints") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nitin Gote <nitin.r.gote@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Stephen Kitt <steve@sk2.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-30writeback: Drop I_DIRTY_TIME_EXPIREJan Kara
commit 5fcd57505c002efc5823a7355e21f48dd02d5a51 upstream. The only use of I_DIRTY_TIME_EXPIRE is to detect in __writeback_single_inode() that inode got there because flush worker decided it's time to writeback the dirty inode time stamps (either because we are syncing or because of age). However we can detect this directly in __writeback_single_inode() and there's no need for the strange propagation with I_DIRTY_TIME_EXPIRE flag. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-30scsi: target: core: Add CONTROL field for trace eventsRoman Bolshakov
[ Upstream commit 7010645ba7256992818b518163f46bd4cdf8002a ] trace-cmd report doesn't show events from target subsystem because scsi_command_size() leaks through event format string: [target:target_sequencer_start] function scsi_command_size not defined [target:target_cmd_complete] function scsi_command_size not defined Addition of scsi_command_size() to plugin_scsi.c in trace-cmd doesn't help because an expression is used inside TP_printk(). trace-cmd event parser doesn't understand minus sign inside [ ]: Error: expected ']' but read '-' Rather than duplicating kernel code in plugin_scsi.c, provide a dedicated field for CONTROL byte. Link: https://lore.kernel.org/r/20200929125957.83069-1-r.bolshakov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01sctp: move trace_sctp_probe_path into sctp_outq_sackKevin Kou
[ Upstream commit f643ee295c1c63bc117fb052d4da681354d6f732 ] The original patch bringed in the "SCTP ACK tracking trace event" feature was committed at Dec.20, 2017, it replaced jprobe usage with trace events, and bringed in two trace events, one is TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path). The original patch intended to trigger the trace_sctp_probe_path in TRACE_EVENT(sctp_probe) as below code, +TRACE_EVENT(sctp_probe, + + TP_PROTO(const struct sctp_endpoint *ep, + const struct sctp_association *asoc, + struct sctp_chunk *chunk), + + TP_ARGS(ep, asoc, chunk), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, mark) + __field(__u16, bind_port) + __field(__u16, peer_port) + __field(__u32, pathmtu) + __field(__u32, rwnd) + __field(__u16, unack_data) + ), + + TP_fast_assign( + struct sk_buff *skb = chunk->skb; + + __entry->asoc = (unsigned long)asoc; + __entry->mark = skb->mark; + __entry->bind_port = ep->base.bind_addr.port; + __entry->peer_port = asoc->peer.port; + __entry->pathmtu = asoc->pathmtu; + __entry->rwnd = asoc->peer.rwnd; + __entry->unack_data = asoc->unack_data; + + if (trace_sctp_probe_path_enabled()) { + struct sctp_transport *sp; + + list_for_each_entry(sp, &asoc->peer.transport_addr_list, + transports) { + trace_sctp_probe_path(sp, asoc); + } + } + ), But I found it did not work when I did testing, and trace_sctp_probe_path had no output, I finally found that there is trace buffer lock operation(trace_event_buffer_reserve) in include/trace/trace_events.h: static notrace void \ trace_event_raw_event_##call(void *__data, proto) \ { \ struct trace_event_file *trace_file = __data; \ struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\ struct trace_event_buffer fbuffer; \ struct trace_event_raw_##call *entry; \ int __data_size; \ \ if (trace_trigger_soft_disabled(trace_file)) \ return; \ \ __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \ \ entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ \ trace_event_buffer_commit(&fbuffer); \ } The reason caused no output of trace_sctp_probe_path is that trace_sctp_probe_path written in TP_fast_assign part of TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the trace_event_buffer_reserve() when compiler expands Macro, entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ so trace_sctp_probe_path finally can not acquire trace_event_buffer and return no output, that is to say the nest of tracepoint entry function is not allowed. The function call flow is: trace_sctp_probe() -> trace_event_raw_event_sctp_probe() -> lock buffer -> trace_sctp_probe_path() -> trace_event_raw_event_sctp_probe_path() --nested -> buffer has been locked and return no output. This patch is to remove trace_sctp_probe_path from the TP_fast_assign part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function, and trigger sctp_probe_path_trace in sctp_outq_sack. After this patch, you can enable both events individually, # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03writeback: Fix sync livelock due to b_dirty_time processingJan Kara
commit f9cae926f35e8230330f28c7b743ad088611a8de upstream. When we are processing writeback for sync(2), move_expired_inodes() didn't set any inode expiry value (older_than_this). This can result in writeback never completing if there's steady stream of inodes added to b_dirty_time list as writeback rechecks dirty lists after each writeback round whether there's more work to be done. Fix the problem by using sync(2) start time is inode expiry value when processing b_dirty_time list similarly as for ordinarily dirtied inodes. This requires some refactoring of older_than_this handling which simplifies the code noticeably as a bonus. Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22rxrpc: Fix trace stringDavid Howells
commit aadf9dcef9d4cd68c73a4ab934f93319c4becc47 upstream. The trace symbol printer (__print_symbolic()) ignores symbols that map to an empty string and prints the hex value instead. Fix the symbol for rxrpc_cong_no_change to " -" instead of "" to avoid this. Fixes: b54a134a7de4 ("rxrpc: Fix handling of enums-to-string translation in tracing") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27rxrpc: Trace discarded ACKsDavid Howells
[ Upstream commit d1f129470e6cb79b8b97fecd12689f6eb49e27fe ] Add a tracepoint to track received ACKs that are discarded due to being outside of the Tx window. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02svcrdma: Fix trace point use-after-free raceChuck Lever
commit e28b4fc652c1830796a4d3e09565f30c20f9a2cf upstream. I hit this while testing nfsd-5.7 with kernel memory debugging enabled on my server: Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8 Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060 Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591 Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma] Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 <8b> 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00 Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286 Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030 Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246 Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004 Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80 Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000 Mar 30 13:21:45 klimt kernel: FS: 0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000 Mar 30 13:21:45 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0 Mar 30 13:21:45 klimt kernel: Call Trace: Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma] Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma] Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma] Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd] Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc] Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd] Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74 Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50 Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]--- It's absolutely not safe to use resources pointed to by the @send_wr argument of ib_post_send() _after_ that function returns. Those resources are typically freed by the Send completion handler, which can run before ib_post_send() returns. Thus the trace points currently around ib_post_send() in the server's RPC/RDMA transport are a hazard, even when they are disabled. Rearrange them so that they touch the Work Request only _before_ ib_post_send() is invoked. Fixes: bd2abef33394 ("svcrdma: Trace key RDMA API events") Fixes: 4201c7464753 ("svcrdma: Introduce svc_rdma_send_ctxt") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02afs: Fix some tracing detailsDavid Howells
commit 4636cf184d6d9a92a56c2554681ea520dd4fe49a upstream. Fix a couple of tracelines to indicate the usage count after the atomic op, not the usage count before it to be consistent with other afs and rxrpc trace lines. Change the wording of the afs_call_trace_work trace ID label from "WORK" to "QUEUE" to reflect the fact that it's queueing work, not doing work. Fixes: 341f741f04be ("afs: Refcount the afs_call struct") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29tracing: xen: Ordered comparison of function pointersChangbin Du
commit d0695e2351102affd8efae83989056bc4b275917 upstream. Just as commit 0566e40ce7 ("tracing: initcall: Ordered comparison of function pointers"), this patch fixes another remaining one in xen.h found by clang-9. In file included from arch/x86/xen/trace.c:21: In file included from ./include/trace/events/xen.h:475: In file included from ./include/trace/define_trace.h:102: In file included from ./include/trace/trace_events.h:473: ./include/trace/events/xen.h:69:7: warning: ordered comparison of function \ pointers ('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 'xen_mc_callback_fn_t') [-Wordered-compare-function-pointers] __field(xen_mc_callback_fn_t, fn) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/trace/trace_events.h:421:29: note: expanded from macro '__field' ^ ./include/trace/trace_events.h:407:6: note: expanded from macro '__field_ext' is_signed_type(type), filter_type); \ ^ ./include/linux/trace_events.h:554:44: note: expanded from macro 'is_signed_type' ^ Fixes: c796f213a6934 ("xen/trace: add multicall tracing") Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-27rxrpc: Fix trace-after-put looking at the put connection recordDavid Howells
[ Upstream commit 4c1295dccc0afe0905b6ca4c62ade7f2406f2cfb ] rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the decrement of the refcount - which looks at the debug_id in the connection record. But unless the refcount was reduced to zero, we no longer have the right to look in the record and, indeed, it may be deleted by some other thread. Fix this by getting the debug_id out before decrementing the refcount and then passing that into the tracepoint. Fixes: 363deeab6d0f ("rxrpc: Add connection tracepoint and client conn state tracepoint") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-14tracing: Change offset type to s32 in preempt/irq tracepointsJoel Fernandes (Google)
commit bf44f488e168368cae4139b4b33c3d0aaa11679c upstream. Discussion in the below link reported that symbols in modules can appear to be before _stext on ARM architecture, causing wrapping with the offsets of this tracepoint. Change the offset type to s32 to fix this. Link: http://lore.kernel.org/r/20191127154428.191095-1-antonio.borneo@st.com Link: http://lkml.kernel.org/r/20200102194625.226436-1-joel@joelfernandes.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David Sterba <dsterba@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Antonio Borneo <antonio.borneo@st.com> Cc: stable@vger.kernel.org Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable events") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31block: Fix writeback throttling W=1 compiler warningsBart Van Assche
[ Upstream commit 1d200e9d6f635ae894993a7d0f1b9e0b6e522e3b ] Fix the following compiler warnings: In file included from ./include/linux/bitmap.h:9, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/cpumask.h:5, from ./arch/x86/include/asm/msr.h:11, from ./arch/x86/include/asm/processor.h:21, from ./arch/x86/include/asm/cpufeature.h:5, from ./arch/x86/include/asm/thread_info.h:53, from ./include/linux/thread_info.h:38, from ./arch/x86/include/asm/preempt.h:7, from ./include/linux/preempt.h:78, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/mm.h:10, from ./include/linux/bvec.h:13, from ./include/linux/blk_types.h:10, from block/blk-wbt.c:23: In function 'strncpy', inlined from 'perf_trace_wbt_stat' at ./include/trace/events/wbt.h:15:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'perf_trace_wbt_lat' at ./include/trace/events/wbt.h:58:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'perf_trace_wbt_step' at ./include/trace/events/wbt.h:87:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'perf_trace_wbt_timer' at ./include/trace/events/wbt.h:126:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'trace_event_raw_event_wbt_stat' at ./include/trace/events/wbt.h:15:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'trace_event_raw_event_wbt_lat' at ./include/trace/events/wbt.h:58:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'trace_event_raw_event_wbt_timer' at ./include/trace/events/wbt.h:126:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'strncpy', inlined from 'trace_event_raw_event_wbt_step' at ./include/trace/events/wbt.h:87:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism"; v4.10). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05xprtrdma: Prevent leak of rpcrdma_rep objectsChuck Lever
[ Upstream commit 07e10308ee5da8e6132e0b737ece1c99dd651fb6 ] If a reply has been processed but the RPC is later retransmitted anyway, the req->rl_reply field still contains the only pointer to the old rpcrdma rep. When the next reply comes in, the reply handler will stomp on the rl_reply field, leaking the old rep. A trace event is added to capture such leaks. This problem seems to be worsened by the restructuring of the RPC Call path in v4.20. Fully addressing this issue will require at least a re-architecture of the disconnect logic, which is not appropriate during -rc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20sched/debug: Use symbolic names for task state constantsUwe Kleine-König
[ Upstream commit ff28915fd31ccafc0d38e6f84b66df280ed9e86a ] include/trace/events/sched.h includes <linux/sched.h> (via <linux/sched/numa_balancing.h>) and so knows about the TASK_* constants used to interpret .prev_state. So instead of duplicating the magic numbers make use of the defined macros to ease understanding the mapping from state bits to letters which isn't completely intuitive for an outsider. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel@pengutronix.de Link: http://lkml.kernel.org/r/20180905093636.24068-1-u.kleine-koenig@pengutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06rxrpc: Fix trace-after-put looking at the put peer recordDavid Howells
commit 55f6c98e3674ce16038a1949c3f9ca5a9a99f289 upstream. rxrpc_put_peer() calls trace_rxrpc_peer() after it has done the decrement of the refcount - which looks at the debug_id in the peer record. But unless the refcount was reduced to zero, we no longer have the right to look in the record and, indeed, it may be deleted by some other thread. Fix this by getting the debug_id out before decrementing the refcount and then passing that into the tracepoint. This can cause the following symptoms: BUG: KASAN: use-after-free in __rxrpc_put_peer net/rxrpc/peer_object.c:411 [inline] BUG: KASAN: use-after-free in rxrpc_put_peer+0x685/0x6a0 net/rxrpc/peer_object.c:435 Read of size 8 at addr ffff888097ec0058 by task syz-executor823/24216 Fixes: 1159d4b496f5 ("rxrpc: Add a tracepoint to track rxrpc_peer refcounting") Reported-by: syzbot+b9be979c55f2bea8ed30@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29btrfs: tracepoints: Fix bad entry members of qgroup eventsQu Wenruo
commit 1b2442b4ae0f234daeadd90e153b466332c466d8 upstream. [BUG] For btrfs:qgroup_meta_reserve event, the trace event can output garbage: qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2 qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2 The @type can be completely garbage, as DATA type is not possible for trace_qgroup_meta_reserve() trace event. [CAUSE] Ther are several problems related to qgroup trace events: - Unassigned entry member Member entry::type of trace_qgroup_update_reserve() and trace_qgourp_meta_reserve() is not assigned - Redundant entry member Member entry::type is completely useless in trace_qgroup_meta_convert() Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation") CC: stable@vger.kernel.org # 4.10+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07rxrpc: Fix rxrpc_recvmsg tracepointDavid Howells
[ Upstream commit db9b2e0af605e7c994784527abfd9276cabd718a ] Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call parameter. Fixes: a25e21f0bcd2 ("rxrpc, afs: Use debug_ids rather than pointers in traces") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29rxrpc: Fix read-after-free in rxrpc_queue_local()David Howells
commit 06d9532fa6b34f12a6d75711162d47c17c1add72 upstream. rxrpc_queue_local() attempts to queue the local endpoint it is given and then, if successful, prints a trace line. The trace line includes the current usage count - but we're not allowed to look at the local endpoint at this point as we passed our ref on it to the workqueue. Fix this by reading the usage count before queuing the work item. Also fix the reading of local->debug_id for trace lines, which must be done with the same consideration as reading the usage count. Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting") Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26rxrpc: Fix oops in tracepointDavid Howells
[ Upstream commit 99f0eae653b2db64917d0b58099eb51e300b311d ] If the rxrpc_eproto tracepoint is enabled, an oops will be cause by the trace line that rxrpc_extract_header() tries to emit when a protocol error occurs (typically because the packet is short) because the call argument is NULL. Fix this by using ?: to assume 0 as the debug_id if call is NULL. This can then be induced by: echo -e '\0\0\0\0\0\0\0\0' | ncat -4u --send-only <addr> 20001 where addr has the following program running on it: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/socket.h> #include <arpa/inet.h> #include <linux/rxrpc.h> int main(void) { struct sockaddr_rxrpc srx; int fd; memset(&srx, 0, sizeof(srx)); srx.srx_family = AF_RXRPC; srx.srx_service = 0; srx.transport_type = AF_INET; srx.transport_len = sizeof(srx.transport.sin); srx.transport.sin.sin_family = AF_INET; srx.transport.sin.sin_port = htons(0x4e21); fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET6); bind(fd, (struct sockaddr *)&srx, sizeof(srx)); sleep(20); return 0; } It results in the following oops. BUG: kernel NULL pointer dereference, address: 0000000000000340 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page ... RIP: 0010:trace_event_raw_event_rxrpc_rx_eproto+0x47/0xac ... Call Trace: <IRQ> rxrpc_extract_header+0x86/0x171 ? rcu_read_lock_sched_held+0x5d/0x63 ? rxrpc_new_skb+0xd4/0x109 rxrpc_input_packet+0xef/0x14fc ? rxrpc_input_data+0x986/0x986 udp_queue_rcv_one_skb+0xbf/0x3d0 udp_unicast_rcv_skb.isra.8+0x64/0x71 ip_protocol_deliver_rcu+0xe4/0x1b4 ip_local_deliver+0xf0/0x154 __netif_receive_skb_one_core+0x50/0x6c netif_receive_skb_internal+0x26b/0x2e9 napi_gro_receive+0xf8/0x1da rtl8169_poll+0x303/0x4c4 net_rx_action+0x10e/0x333 __do_softirq+0x1a5/0x38f irq_exit+0x54/0xc4 do_IRQ+0xda/0xf8 common_interrupt+0xf/0xf </IRQ> ... ? cpuidle_enter_state+0x23c/0x34d cpuidle_enter+0x2a/0x36 do_idle+0x163/0x1ea cpu_startup_entry+0x1d/0x1f start_secondary+0x157/0x172 secondary_startup_64+0xa4/0xb0 Fixes: a25e21f0bcd2 ("rxrpc, afs: Use debug_ids rather than pointers in traces") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20rxrpc: Fix client call connect/disconnect raceDavid Howells
[ Upstream commit 930c9f9125c85b5134b3e711bc252ecc094708e3 ] rxrpc_disconnect_client_call() reads the call's connection ID protocol value (call->cid) as part of that function's variable declarations. This is bad because it's not inside the locked section and so may race with someone granting use of the channel to the call. This manifests as an assertion failure (see below) where the call in the presumed channel (0 because call->cid wasn't set when we read it) doesn't match the call attached to the channel we were actually granted (if 1, 2 or 3). Fix this by moving the read and dependent calculations inside of the channel_lock section. Also, only set the channel number and pointer variables if cid is not zero (ie. unset). This problem can be induced by injecting an occasional error in rxrpc_wait_for_channel() before the call to schedule(). Make two further changes also: (1) Add a trace for wait failure in rxrpc_connect_call(). (2) Drop channel_lock before BUG'ing in the case of the assertion failure. The failure causes a trace akin to the following: rxrpc: Assertion failed - 18446612685268945920(0xffff8880beab8c00) == 18446612685268621312(0xffff8880bea69800) is false ------------[ cut here ]------------ kernel BUG at net/rxrpc/conn_client.c:824! ... RIP: 0010:rxrpc_disconnect_client_call+0x2bf/0x99d ... Call Trace: rxrpc_connect_call+0x902/0x9b3 ? wake_up_q+0x54/0x54 rxrpc_new_client_call+0x3a0/0x751 ? rxrpc_kernel_begin_call+0x141/0x1bc ? afs_alloc_call+0x1b5/0x1b5 rxrpc_kernel_begin_call+0x141/0x1bc afs_make_call+0x20c/0x525 ? afs_alloc_call+0x1b5/0x1b5 ? __lock_is_held+0x40/0x71 ? lockdep_init_map+0xaf/0x193 ? lockdep_init_map+0xaf/0x193 ? __lock_is_held+0x40/0x71 ? yfs_fs_fetch_data+0x33b/0x34a yfs_fs_fetch_data+0x33b/0x34a afs_fetch_data+0xdc/0x3b7 afs_read_dir+0x52d/0x97f afs_dir_iterate+0xa0/0x661 ? iterate_dir+0x63/0x141 iterate_dir+0xa2/0x141 ksys_getdents64+0x9f/0x11b ? filldir+0x111/0x111 ? do_syscall_64+0x3e/0x1a0 __x64_sys_getdents64+0x16/0x19 do_syscall_64+0x7d/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 45025bceef17 ("rxrpc: Improve management and caching of client connection objects") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-16sunrpc: use-after-free in svc_process_common()Vasily Averin
commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream. if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common() svc_process_common() /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE svc_process_common() can use incorrect rqstp->rq_xprt, its caller function bc_svc_process() takes it from serv->sv_bc_xprt. The problem is that serv is global structure but sv_bc_xprt is assigned per-netnamespace. According to Trond, the whole "let's set up rqstp->rq_xprt for the back channel" is nothing but a giant hack in order to work around the fact that svc_process_common() uses it to find the xpt_ops, and perform a couple of (meaningless for the back channel) tests of xpt_flags. All we really need in svc_process_common() is to be able to run rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr() Bruce J Fields points that this xpo_prep_reply_hdr() call is an awfully roundabout way just to do "svc_putnl(resv, 0);" in the tcp case. This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(), now it calls svc_process_common() with rqstp->rq_xprt = NULL. To adjust reply header svc_process_common() just check rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case. To handle rqstp->rq_xprt = NULL case in functions called from svc_process_common() patch intruduces net namespace pointer svc_rqst->rq_bc_net and adjust SVC_NET() definition. Some other function was also adopted to properly handle described case. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields <bfields@redhat.com> v2: added lost extern svc_tcp_prep_reply_hdr() Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09ext4: force inode writes when nfsd calls commit_metadata()Theodore Ts'o
commit fde872682e175743e0c3ef939c89e3c6008a1529 upstream. Some time back, nfsd switched from calling vfs_fsync() to using a new commit_metadata() hook in export_operations(). If the file system did not provide a commit_metadata() hook, it fell back to using sync_inode_metadata(). Unfortunately doesn't work on all file systems. In particular, it doesn't work on ext4 due to how the inode gets journalled --- the VFS writeback code will not always call ext4_write_inode(). So we need to provide our own ext4_nfs_commit_metdata() method which calls ext4_write_inode() directly. Google-Bug-Id: 121195940 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08sched, trace: Fix prev_state output in sched_switch tracepointPavankumar Kondeti
commit 3054426dc68e5d63aa6a6e9b91ac4ec78e3f3805 upstream. commit 3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout") tried to fix the problem introduced by a previous commit efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing"). However the prev_state output in sched_switch is still broken. task_state_index() uses fls() which considers the LSB as 1. Left shifting 1 by this value gives an incorrect mapping to the task state. Fix this by decrementing the value returned by __get_task_state() before shifting. Link: http://lkml.kernel.org/r/1540882473-1103-1-git-send-email-pkondeti@codeaurora.org Cc: stable@vger.kernel.org Fixes: 3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout") Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10Merge tag 'rxrpc-fixes-20181008' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix packet reception code Here are a set of patches that prepares for and fix problems in rxrpc's package reception code. There serious problems are: (A) There's a window between binding the socket and setting the data_ready hook in which packets can find their way into the UDP socket's receive queues. (B) The skb_recv_udp() will return an error (and clear the error state) if there was an error on the Tx side. rxrpc doesn't handle this. (C) The rxrpc data_ready handler doesn't fully drain the UDP receive queue. (D) The rxrpc data_ready handler assumes it is called in a non-reentrant state. The second patch fixes (A) - (C); the third patch renders (B) and (C) non-issues by using the recap_rcv hook instead of data_ready - and the final patch fixes (D). That last is the most complex. The preparatory patches are: (1) Fix some places that are doing things in the wrong net namespace. (2) Stop taking the rcu read lock as it's held by the IP input routine in the call chain. (3) Only end the Tx phase if *we* rotated the final packet out of the Tx buffer. (4) Don't assume that the call state won't change after dropping the call_state lock. (5) Only take receive window and MTU suze parameters from an ACK packet if it's the latest ACK packet. (6) Record connection-level abort information correctly. (7) Fix a trace line. And then there are three main patches - note that these are mixed in with the preparatory patches somewhat: (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet drainage (C). (2) Switch to using the encap_rcv instead of data_ready to cut out the effects of the UDP read queues and get the packets delivered directly. (3) Add more locking into the various packet input paths to defend against re-entrance (D). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rxrpc: Fix the rxrpc_tx_packet trace lineDavid Howells
Fix the rxrpc_tx_packet trace line by storing the where parameter. Fixes: 4764c0da69dc ("rxrpc: Trace packet transmission") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-05Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fixes: These fixes address a rather involved performance regression between v4.17->v4.19 in the sched/numa auto-balancing code. Since distros really need this fix we accelerated it to sched/urgent for a faster upstream merge. NUMA scheduling and balancing performance is now largely back to v4.17 levels, without reintroducing the NUMA placement bugs that v4.18 and v4.19 fixed. Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for reporting, testing, re-testing and solving this rather complex set of bugs." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration sched/numa: Avoid task migration for small NUMA improvement mm/migrate: Use spin_trylock() while resetting rate limit sched/numa: Limit the conditions where scan period is reset sched/numa: Reset scan rate whenever task moves across nodes sched/numa: Pass destination CPU as a parameter to migrate_task_rq sched/numa: Stop multiple tasks from moving to the CPU at the same time
2018-10-02mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migrationMel Gorman
Rate limiting of page migrations due to automatic NUMA balancing was introduced to mitigate the worst-case scenario of migrating at high frequency due to false sharing or slowly ping-ponging between nodes. Since then, a lot of effort was spent on correctly identifying these pages and avoiding unnecessary migrations and the safety net may no longer be required. Jirka Hladky reported a regression in 4.17 due to a scheduler patch that avoids spreading STREAM tasks wide prematurely. However, once the task was properly placed, it delayed migrating the memory due to rate limiting. Increasing the limit fixed the problem for him. Currently, the limit is hard-coded and does not account for the real capabilities of the hardware. Even if an estimate was attempted, it would not properly account for the number of memory controllers and it could not account for the amount of bandwidth used for normal accesses. Rather than fudging, this patch simply eliminates the rate limiting. However, Jirka reports that a STREAM configuration using multiple processes achieved similar performance to 4.16. In local tests, this patch improved performance of STREAM relative to the baseline but it is somewhat machine-dependent. Most workloads show little or not performance difference implying that there is not a heavily reliance on the throttling mechanism and it is safe to remove. STREAM on 2-socket machine 4.19.0-rc5 4.19.0-rc5 numab-v1r1 noratelimit-v1r1 MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%) MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%) MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%) MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24% Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jirka Hladky <jhladky@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-28rxrpc: Fix error distributionDavid Howells
Fix error distribution by immediately delivering the errors to all the affected calls rather than deferring them to a worker thread. The problem with the latter is that retries and things can happen in the meantime when we want to stop that sooner. To this end: (1) Stop the error distributor from removing calls from the error_targets list so that peer->lock isn't needed to synchronise against other adds and removals. (2) Require the peer's error_targets list to be accessed with RCU, thereby avoiding the need to take peer->lock over distribution. (3) Don't attempt to affect a call's state if it is already marked complete. Signed-off-by: David Howells <dhowells@redhat.com>
2018-08-24Merge branch 'for-4.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Just one commit from Steven to take out spin lock from trace event handlers" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/tracing: Move taking of spin lock out of trace event handlers
2018-08-20Merge tag 'trace-v4.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockdep and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) tracing: Fix SPDX format headers to use C++ style comments tracing: Add SPDX License format tags to tracing files tracing: Add SPDX License format to bpf_trace.c blktrace: Add SPDX License format header s390/ftrace: Add -mfentry and -mnop-mcount support tracing: Add -mcount-nop option support tracing: Avoid calling cc-option -mrecord-mcount for every Makefile tracing: Handle CC_FLAGS_FTRACE more accurately Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Simplify uprobe_register() body tracepoints: Free early tracepoints after RCU is initialized uprobes: Use synchronize_rcu() not synchronize_sched() tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister() ftrace: Remove unused pointer ftrace_swapper_pid tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing/irqsoff: Handle preempt_count for different configs tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing: irqsoff: Account for additional preempt_disable trace: Use rcu_dereference_raw for hooks from trace-event subsystem tracing/kprobes: Fix within_notrace_func() to check only notrace functions ...