path: root/kernel/rcu
Age | Commit message | Author
2019-05-31rcuperf: Fix cleanup path for invalid perf_type stringsPaul E. McKenney
[ Upstream commit ad092c027713a68a34168942a5ef422e42e039f4 ] If the specified rcuperf.perf_type is not in the rcu_perf_init() function's perf_ops[] array, rcuperf prints some console messages and then invokes rcu_perf_cleanup() to set state so that a future torture test can run. However, rcu_perf_cleanup() also attempts to end the test that didn't actually start, and in doing so relies on the value of cur_ops, a value that is not particularly relevant in this case. This can result in confusing output or even follow-on failures due to attempts to use facilities that have not been properly initialized. This commit therefore sets the value of cur_ops to NULL in this case and inserts a check near the beginning of rcu_perf_cleanup(), thus avoiding relying on an irrelevant cur_ops value. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
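A sketch of the shape of the fix (context per kernel/rcu/rcuperf.c; the analogous rcutorture fix in the next entry follows the same pattern):

    /* In rcu_perf_init(), when perf_type matches nothing in perf_ops[]: */
    pr_alert("rcu-perf: invalid perf type: \"%s\"\n", perf_type);
    cur_ops = NULL;         /* Record that no test actually started. */
    firsterr = -EINVAL;
    goto unwind;

    /* In rcu_perf_cleanup(), bail out before touching cur_ops state: */
    if (!cur_ops) {
            torture_cleanup_end();
            return;
    }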
2019-05-31rcutorture: Fix cleanup path for invalid torture_type stringsPaul E. McKenney
[ Upstream commit b813afae7ab6a5e91b4e16cc567331d9c2ae1f04 ] If the specified rcutorture.torture_type is not in the rcu_torture_init() function's torture_ops[] array, rcutorture prints some console messages and then invokes rcu_torture_cleanup() to set state so that a future torture test can run. However, rcu_torture_cleanup() also attempts to end the test that didn't actually start, and in doing so relies on the value of cur_ops, a value that is not particularly relevant in this case. This can result in confusing output or even follow-on failures due to attempts to use facilities that have not been properly initialized. This commit therefore sets the value of cur_ops to NULL in this case and inserts a check near the beginning of rcu_torture_cleanup(), thus avoiding relying on an irrelevant cur_ops value. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05kprobes: Prohibit probing on RCU debug routineMasami Hiramatsu
[ Upstream commit a39f15b9644fac3f950f522c39e667c3af25c588 ] Since kprobes itself depends on RCU, probing on an RCU debug routine can cause recursive breakpoint bugs. Prohibit probing on RCU debug routines.

 int3
  ->do_int3()
   ->ist_enter()
    ->RCU_LOCKDEP_WARN()
     ->debug_lockdep_rcu_enabled()
      -> int3

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998807741.31052.11229157537816341591.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
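The standard mechanism for such a fix is the NOKPROBE_SYMBOL() annotation, which adds a function to the kprobes blacklist; a minimal sketch (the patch annotates the RCU debug helpers, debug_lockdep_rcu_enabled() among them):

    #include <linux/kprobes.h>

    /* Keep kprobes from instrumenting this RCU-dependent helper. */
    NOKPROBE_SYMBOL(debug_lockdep_rcu_enabled);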
2019-03-23rcu: Do RCU GP kthread self-wakeup from softirq and interruptZhang, Jun
commit 1d1f898df6586c5ea9aeaf349f13089c6fa37903 upstream. The rcu_gp_kthread_wake() function is invoked when it might be necessary to wake the RCU grace-period kthread. Because self-wakeups are normally a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from this kthread, it naturally refuses to do the wakeup. Unfortunately, natural though it might be, this heuristic fails when rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler that interrupted the grace-period kthread just after the final check of the wait-event condition but just before the schedule() call. In this case, a wakeup is required, even though the call to rcu_gp_kthread_wake() is within the RCU grace-period kthread's context. Failing to provide this wakeup can result in grace periods failing to start, which in turn results in out-of-memory conditions. This race window is quite narrow, but it actually did happen during real testing. It would of course need to be fixed even if it was strictly theoretical in nature. This patch does not Cc stable because it does not apply cleanly to earlier kernel versions. Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread") Reported-by: "He, Bo" <bo.he@intel.com> Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com> Co-developed-by: "He, Bo" <bo.he@intel.com> Co-developed-by: "xiao, jin" <jin.xiao@intel.com> Co-developed-by: Bai, Jie A <jie.a.bai@intel.com> Signed-off-by: "He, Bo" <bo.he@intel.com> Signed-off-by: "xiao, jin" <jin.xiao@intel.com> Signed-off-by: Bai, Jie A <jie.a.bai@intel.com> Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com> [ paulmck: Switch from !in_softirq() to "!in_interrupt() && !in_serving_softirq()" to avoid redundant wakeups and to also handle the interrupt-handler scenario as well as the softirq-handler scenario that actually occurred in testing. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
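A sketch of the resulting wakeup logic, modeled on the upstream fix (names per kernel/rcu/tree.c of this era):

    static void rcu_gp_kthread_wake(struct rcu_state *rsp)
    {
            if ((current == rsp->gp_kthread &&
                 !in_interrupt() && !in_serving_softirq()) ||
                !READ_ONCE(rsp->gp_flags) || !rsp->gp_kthread)
                    return; /* Genuine self-wakeup, or nothing to wake. */
            swake_up_one(&rsp->gp_wq);
    }

The kthread-context check now declines the wakeup only when running in the kthread proper, not when running in an interrupt or softirq handler that happens to be executing on top of it.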
2019-01-13srcu: Lock srcu_data structure in srcu_gp_start()Dennis Krein
commit eb4c2382272ae7ae5d81fdfa5b7a6c86146eaaa4 upstream. The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption. Reported-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: Sebastian Kuzminsky <seb.kuzminsky@gmail.com> Signed-off-by: Dennis Krein <Dennis.Krein@netapp.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: Dennis Krein <Dennis.Krein@netapp.com> Cc: <stable@vger.kernel.org> # 4.16.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
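In outline, the fix brackets the callback-list operations with the per-CPU srcu_data lock (a sketch; upstream uses the rcu_node locking wrappers):

    struct srcu_data *sdp = this_cpu_ptr(sp->sda);

    spin_lock_rcu_node(sdp);        /* Protects sdp->srcu_cblist. */
    rcu_segcblist_advance(&sdp->srcu_cblist,
                          rcu_seq_current(&sp->srcu_gp_seq));
    (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
                                   rcu_seq_snap(&sp->srcu_gp_seq));
    spin_unlock_rcu_node(sdp);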
2018-12-01rcu: Make need_resched() respond to urgent RCU-QS needsPaul E. McKenney
commit 92aa39e9dc77481b90cbef25e547d66cab901496 upstream. The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent need for an RCU quiescent state from the force-quiescent-state processing within the grace-period kthread to context switches and to cond_resched(). Unfortunately, such urgent needs are not communicated to need_resched(), which is sometimes used to decide when to invoke cond_resched(), for but one example, within the KVM vcpu_run() function. As of v4.15, this can result in synchronize_sched() being delayed by up to ten seconds, which can be problematic, to say nothing of annoying. This commit therefore checks rcu_dynticks.rcu_urgent_qs from within rcu_check_callbacks(), which is invoked from the scheduling-clock interrupt handler. If the current task is not an idle task and is not executing in usermode, a context switch is forced, and either way, the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current task is an idle task, then RCU's dyntick-idle code will detect the quiescent state, so no further action is required. Similarly, if the task is executing in usermode, other code in rcu_check_callbacks() and its called functions will report the corresponding quiescent state. Reported-by: Marius Hillenbrand <mhillenb@amazon.de> Reported-by: David Woodhouse <dwmw2@infradead.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Backported to make patch apply cleanly on older versions. ] Tested-by: Marius Hillenbrand <mhillenb@amazon.de> Cc: <stable@vger.kernel.org> # 4.12.x - 4.19.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
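The check added to rcu_check_callbacks() looks roughly like this (field names per the v4.19-era rcu_dynticks structure):

    if (smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs))) {
            /* Idle and usermode execution are already quiescent states. */
            if (!rcu_is_cpu_rrupt_from_idle() && !user) {
                    set_tsk_need_resched(current);
                    set_preempt_need_resched();
            }
            __this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
    }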
2018-08-13Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull scheduler updates from Thomas Gleixner:

 - Cleanup and improvement of NUMA balancing
 - Refactoring and improvements to the PELT (Per Entity Load Tracking) code
 - Watchdog simplification and related cleanups
 - The usual pile of small incremental fixes and improvements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  watchdog: Reduce message verbosity
  stop_machine: Reflow cpu_stop_queue_two_works()
  sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
  sched/numa: Use group_weights to identify if migration degrades locality
  sched/numa: Update the scan period without holding the numa_group lock
  sched/numa: Remove numa_has_capacity()
  sched/numa: Modify migrate_swap() to accept additional parameters
  sched/numa: Remove unused task_capacity from 'struct numa_stats'
  sched/numa: Skip nodes that are at 'hoplimit'
  sched/debug: Reverse the order of printing faults
  sched/numa: Use task faults only if numa_group is not yet set up
  sched/numa: Set preferred_node based on best_cpu
  sched/numa: Simplify load_too_imbalanced()
  sched/numa: Evaluate move once per node
  sched/numa: Remove redundant field
  sched/debug: Show the sum wait time of a task group
  sched/fair: Remove #ifdefs from scale_rt_capacity()
  sched/core: Remove get_cpu() from sched_fork()
  sched/cpufreq: Clarify sugov_get_util()
  sched/sysctl: Remove unused sched_time_avg_ms sysctl
  ...
2018-07-12Merge branches 'fixes1.2018.07.12b' and 'torture1.2018.07.12b' into HEADPaul E. McKenney
fixes1.2018.07.12b: Post-gp_seq miscellaneous fixes torture1.2018.07.12b: Post-gp_seq torture-test updates
2018-07-12rcutorture: Fix rcu_barrier successes counterJoel Fernandes (Google)
The rcutorture test module currently increments both the success and error counters for the barrier test upon error, which results in misleading statistics being printed. This commit therefore changes the code to increment the success counter only when the test actually passes. This change was tested by returning from the barrier callback without incrementing the callback counter, thus introducing what appeared to rcutorture to be rcu_barrier() failures. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
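The corrected accounting looks roughly like this (per rcu_torture_barrier() in rcutorture.c):

    if (atomic_read(&barrier_cbs_invoked) != n_barrier_cbs) {
            n_rcu_torture_barrier_error++;
            WARN_ON_ONCE(1);
    } else {
            n_barrier_successes++; /* Count success only on success. */
    }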
2018-07-12rcutorture: Add support to detect if boost kthread prio is too lowJoel Fernandes (Google)
When rcutorture is built in to the kernel, an earlier patch detects that and raises the priority of RCU's kthreads to allow rcutorture's RCU priority boosting tests to succeed. However, if rcutorture is built as a module, those priorities must be raised manually via the rcutree.kthread_prio kernel boot parameter. If this manual step is not taken, rcutorture's RCU priority boosting tests will fail due to kthread starvation. One approach would be to raise the default priority, but that risks breaking existing users. Another approach would be to allow runtime adjustment of RCU's kthread priorities, but that introduces numerous "interesting" race conditions. This patch therefore instead detects too-low priorities, and prints a message and disables the RCU priority boosting tests in that case. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Use monotonic timestamp for stall detectionArnd Bergmann
The get_seconds() call is deprecated because it overflows on 32-bit architectures. The algorithm in rcu_torture_stall() can deal with the overflow, but another problem here is that using a CLOCK_REALTIME stamp can lead to a false-positive stall warning when a settimeofday() happens concurrently. Using ktime_get_seconds() instead avoids those issues and will never overflow. The added cast to 'unsigned long' however is necessary to make ULONG_CMP_LT() work correctly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
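The change is essentially a clock-source swap plus the wraparound-safe comparison (a sketch of the relevant lines in rcu_torture_stall()):

    /* Before: CLOCK_REALTIME, 32-bit overflow, jumps on settimeofday(). */
    stop_at = get_seconds() + stall_cpu;

    /* After: monotonic, 64-bit, never overflows or jumps. */
    stop_at = ktime_get_seconds() + stall_cpu;
    while (ULONG_CMP_LT((unsigned long)ktime_get_seconds(), stop_at))
            continue; /* Induce the stall. */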
2018-07-12rcutorture: Make boost test more robustJoel Fernandes (Google)
Currently, with RCU_BOOST disabled, I get no failures when forcing rcutorture to test RCU boost priority inversion. The reason seems to be that we don't check for failures if the callback never ran at all for the duration of the boost-test loop. Further, the 'rtb' and 'rtbf' counters are used inconsistently: 'rtb' is incremented at the start of each test, while 'rtbf' is incremented per-CPU on each failure of call_rcu(), so it's possible for 'rtbf' to exceed 'rtb'. To test boosting with rcutorture, I did the following on a 4-CPU x86 machine:

 modprobe rcutorture test_boost=2
 sleep 20
 rmmod rcutorture

With patch:    rtbf: 8 rtb: 12
Without patch: rtbf: 0 rtb: 2

In summary this patch:
 - Increments the failed and total test counters once per boost test.
 - Checks for failure cases correctly.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Disable RT throttling for boost testsJoel Fernandes (Google)
Currently rcutorture is not able to torture RCU boosting properly. This is because rcutorture's boost threads, which are doing the torturing, may be throttled due to RT throttling. This patch makes rcutorture use the right torture technique (unthrottled rcutorture boost tasks) for torturing RCU, so that the test fails correctly when no boost is available. Currently this requires accessing sysctl_sched_rt_runtime directly, but that should be OK since rcutorture is test code. Such direct access is also only possible when rcutorture is built in, so the code is made conditional on that. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
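A sketch of the throttling toggle, compiled only when rcutorture is built in because it pokes sysctl_sched_rt_runtime directly:

    static int old_rt_runtime = -1;

    static void rcu_torture_disable_rt_throttle(void)
    {
            /*
             * Disable RT throttling so that rcutorture's boost kthreads
             * can run flat out.  old_rt_runtime is restored on cleanup.
             */
            if (!IS_BUILTIN(CONFIG_RCU_TORTURE_TEST) || old_rt_runtime != -1)
                    return;
            old_rt_runtime = sysctl_sched_rt_runtime;
            sysctl_sched_rt_runtime = -1; /* -1 means "no throttling". */
    }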
2018-07-12rcutorture: Emphasize testing of single reader protection typePaul E. McKenney
For RCU implementations supporting multiple types of reader protection, rcutorture currently randomly selects the combination of protection types for each phase of each reader. The problem is that, given the four kinds of protection for RCU-sched (local_irq_disable(), local_bh_disable(), preempt_disable(), and rcu_read_lock_sched()), the reader will be protected by a single mechanism only 25% of the time. We really need heavier testing of single read-side mechanisms. This commit therefore uses only a single mechanism about 60% of the time, half of the time explicitly and one-eighth of the time by chance. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Handle extended read-side critical sectionsPaul E. McKenney
This commit enables rcutorture to test whether RCU properly aggregates different types of read-side critical sections into a larger section covering the set. It does this by extending an initial read-side critical section randomly for a random number of extensions. There is a new rcu_torture_ops field ->extendable that specifies what extensions are permitted for a given flavor of RCU (for example, SRCU does not permit any extensions, while RCU-sched permits all types). Note that if a given operation (for example, local_bh_disable()) extends an RCU read-side critical section, then rcutorture feels free to also start and end the critical section with that operation's type of disabling. Disabling operations include local_bh_disable(), local_irq_disable(), and preempt_disable(). This commit also adds a new "busted_srcud" torture type, which verifies rcutorture's ability to detect extensions of RCU read-side critical sections that are not handled. Gotta test the test, after all! Note that it is not legal to invoke local_bh_disable() with interrupts disabled, and this transition is avoided by overriding the random-number generator when it wants to call local_bh_disable() while interrupts are disabled. The code instead leaves both interrupts and bh/softirq disabled in this case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
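A sketch of how the extension bitmask might read (flag names follow the scheme described above and should be treated as illustrative):

    /* Protection types that may extend a read-side critical section. */
    #define RCUTORTURE_RDR_BH      0x01 /* local_bh_disable() */
    #define RCUTORTURE_RDR_IRQ     0x02 /* local_irq_disable() */
    #define RCUTORTURE_RDR_PREEMPT 0x04 /* preempt_disable() */

    /* SRCU permits no extensions; RCU-sched permits them all: */
    .extendable = 0,                                      /* srcud_ops */
    .extendable = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ |
                  RCUTORTURE_RDR_PREEMPT,                 /* sched_ops */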
2018-07-12rcutorture: Make rcu_torture_timer() use rcu_torture_one_read()Paul E. McKenney
This commit saves a few lines of code by making rcu_torture_timer() invoke rcu_torture_one_read(), thus completing the consolidation of code between rcu_torture_timer() and rcu_torture_reader(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Use per-CPU random state for rcu_torture_timer()Paul E. McKenney
Currently, the rcu_torture_timer() function uses a single global torture_random_state structure protected by a single global lock. This conflicts to some extent with performance and scalability, but even more with the goal of consolidating read-side testing with rcu_torture_reader(). This commit therefore creates a per-CPU torture_random_state structure for use by rcu_torture_timer() and eliminates the lock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Make rcu_torture_timer_rand static, per 0day Test Robot report. ]
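The conversion is mechanical (a sketch):

    /* One random-number state per CPU replaces the global state + lock. */
    static DEFINE_PER_CPU(struct torture_random_state, rcu_torture_timer_rand);

    static void rcu_torture_timer(struct timer_list *unused)
    {
            struct torture_random_state *trsp =
                    this_cpu_ptr(&rcu_torture_timer_rand);

            /* ... use trsp wherever the old global state was used ... */
    }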
2018-07-12rcutorture: Use atomic increment for n_rcu_torture_timersPaul E. McKenney
Currently, rcu_torture_timer() relies on a lock to guard updates to n_rcu_torture_timers. Unfortunately, consolidating code with rcu_torture_reader() will dispense with this lock. This commit therefore makes n_rcu_torture_timers be an atomic_long_t and uses atomic_long_inc() to carry out the update. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
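The counter change in miniature:

    static atomic_long_t n_rcu_torture_timers;

    atomic_long_inc(&n_rcu_torture_timers);  /* lock-free update */
    atomic_long_read(&n_rcu_torture_timers); /* statistics printout */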
2018-07-12rcutorture: Extract common code from rcu_torture_reader()Paul E. McKenney
This commit extracts the code executed on each pass through the loop in rcu_torture_reader() into a new rcu_torture_one_read() function. This new function will also be used by rcu_torture_timer(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcuperf: Remove unused torturing_tasks() functionPaul E. McKenney
The torturing_tasks() function in rcuperf.c is not used, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove rcutorture test version and sequence numberPaul E. McKenney
Back when RCU had a debugfs interface, there was a test version and sequence number that allowed associating debugfs data with a particular test run, where the test run started with modprobe and ended with rmmod, which was how tests were run back on the old ABAT system within IBM. But rcutorture testing no longer runs on ABAT, and there is no longer an RCU debugfs interface, so there is no longer any need for test versions and sequence numbers. This commit therefore removes the rcutorture_record_test_transition() and rcutorture_record_progress() functions, and along with them the rcutorture_testseq and rcutorture_vernum variables that they update. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcutorture: Change units of onoff_interval to jiffiesPaul E. McKenney
Some RCU bugs have been sensitive to the frequency of CPU-hotplug operations, which have been gradually increased over time. But this frequency is now at the one-second lower limit that can be specified using the rcutorture.onoff_interval kernel parameter. This commit therefore changes the units of rcutorture.onoff_interval from seconds to jiffies, and also sets the value specified for this kernel parameter in the TREE03 rcutorture scenario to 200, which is 200 milliseconds for HZ=1000. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Assign higher prio to RCU threads if rcutorture is built-inJoel Fernandes (Google)
The rcutorture RCU priority boosting tests fail even with CONFIG_RCU_BOOST set because rcutorture's threads run at the same priority as the default RCU kthreads (RT class with priority of 1). This patch checks if RCU torture is built into the kernel and if so, assigns RT priority 1 to the RCU threads, allowing the rcutorture boost tests to pass. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
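One way to express the check (a sketch of the intent; the actual patch may compute the default differently):

    /* Default RCU's kthreads to RT priority 1 when rcutorture is built in: */
    static int kthread_prio = IS_ENABLED(CONFIG_RCU_TORTURE_TEST)
                              ? 1
                              : IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;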
2018-07-12srcu: Add grace-period number to rcutorture statistics printoutPaul E. McKenney
This commit adds the SRCU grace-period number to the rcutorture statistics printout, which allows it to be compared to the rcutorture "Writer stall state" message. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Print stall-warning NMI dyntick state in hexadecimalPaul E. McKenney
The ->dynticks_nmi_nesting field records the nesting depth of both interrupt and NMI handlers. Because the kernel can enter interrupts and never leave them (and vice versa) and because NMIs can interrupt manipulation of the ->dynticks_nmi_nesting field, the values in this field must be both chosen and manipulated very carefully. As a result, although the value is zero when the corresponding CPU is executing neither an interrupt nor an NMI handler, it is 4,611,686,018,427,387,906 on 64-bit systems when there is a single level of interrupt/NMI handling in progress. This number is difficult to remember and interpret, so this commit switches the output to hexadecimal, resulting in the much nicer 0x4000000000000002. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
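For reference, the example value decodes as DYNTICK_IRQ_NONIDLE plus one interrupt-entry increment of 2: ((LONG_MAX / 2) + 1) + 2 = 2^62 + 2 = 4,611,686,018,427,387,906 = 0x4000000000000002. The printing change itself is a format-specifier swap, roughly:

    pr_info("... %ld ...", rdp->dynticks_nmi_nesting);  /* before: decimal */
    pr_info("... %#lx ...", rdp->dynticks_nmi_nesting); /* after: hex */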
2018-07-12rcu: Make rcu_seq_diff() more exactPaul E. McKenney
The current implementation of rcu_seq_diff() follows tradition in providing a rough-and-ready approximation of the number of elapsed grace periods between the two rcu_seq values. However, this difference is used to flag RCU-failure "near misses", which can be a valuable debugging aid, so more exactitude would be an improvement. This commit therefore improves the accuracy of rcu_seq_diff(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Check the range of jiffies_till_{first,next}_fqs when setting themByungchul Park
Currently, the ranges of jiffies_till_{first,next}_fqs are checked and adjusted again and again in the rcu_gp_kthread() loop at runtime. However, it suffices to check them once, when they are set, rather than on every pass through the loop. This commit therefore moves the range checking to the time the values are set via sysfs. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
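The upstream approach wires custom kernel_param_ops into the module parameters so the clamp happens at set time (a sketch for the "first" parameter; "next" is analogous):

    static int param_set_first_fqs_jiffies(const char *val,
                                           const struct kernel_param *kp)
    {
            ulong j;
            int ret = kstrtoul(val, 0, &j);

            if (!ret)
                    WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : j);
            return ret;
    }

    static struct kernel_param_ops first_fqs_jiffies_ops = {
            .set = param_set_first_fqs_jiffies,
            .get = param_get_ulong,
    };

    module_param_cb(jiffies_till_first_fqs, &first_fqs_jiffies_ops,
                    &jiffies_till_first_fqs, 0644);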
2018-07-12rcu: Add diagnostics for rcutorture writer stall warningPaul E. McKenney
This commit adds any in-the-future ->gp_seq_needed fields to the diagnostics for an rcutorture writer stall warning message. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Add comment to the last sleep in the rcu tasks loopSteven Rostedt (VMware)
At the end of rcu_tasks_kthread() there's a lonely schedule_timeout_uninterruptible() call with no apparent rationale for its existence. But there is. It is to keep the thread from going into a tight loop if there's some anomaly. That really needs a comment. Link: http://lkml.kernel.org/r/20180524223839.GU3803@linux.vnet.ibm.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
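The added comment, roughly as it reads at the bottom of rcu_tasks_kthread():

    /* Paranoid sleep to keep this from entering a tight loop */
    schedule_timeout_uninterruptible(HZ / 10);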
2018-07-12rcu: Speed up calling of RCU tasks callbacksSteven Rostedt (VMware)
Joel Fernandes found that synchronize_rcu_tasks() was taking a significant amount of time. He demonstrated it with the following test:

 # cd /sys/kernel/tracing
 # while [ 1 ]; do x=1; done &
 # echo '__schedule_bug:traceon' > set_ftrace_filter
 # time echo '!__schedule_bug:traceon' > set_ftrace_filter

 real 0m1.064s
 user 0m0.000s
 sys  0m0.004s

It takes a little over a second to perform the synchronization because there's a loop that waits one second at a time for tasks to get through their quiescent points when there's a task that must be waited for. After discussion, we came up with a simple way to wait for holdouts: lengthen the sleep on each iteration of the loop, but never beyond a full second. With the new patch we have:

 # time echo '!__schedule_bug:traceon' > set_ftrace_filter

 real 0m0.131s
 user 0m0.000s
 sys  0m0.004s

Which drops it down to 13% of the original wait time. Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
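The waiting loop's back-off looks roughly like this:

    /* Start with a HZ/10 wait and slowly back off to a 1-second wait. */
    fract = 10;
    while (!list_empty(&rcu_tasks_holdouts)) {
            schedule_timeout_interruptible(HZ / fract);
            if (fract > 1)
                    fract--; /* Lengthen the sleep, capping at one second. */
            /* ... check for stalls and scan the holdout list ... */
    }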
2018-07-12rcu: Add comment documenting how rcu_seq_snap worksJoel Fernandes (Google)
rcu_seq_snap() may be tricky to decipher. Let's document how it works with an example to make it easier. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Shrink comment as suggested by Peter Zijlstra. ]
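The short version of that example, with RCU_SEQ_CTR_SHIFT = 2 so the low two bits carry grace-period state (RCU_SEQ_STATE_MASK = 3): rcu_seq_snap() computes (s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK. If no grace period is in progress (s = 0), this yields (0 + 7) & ~3 = 4, meaning one full grace period from now. If a grace period is in progress (say s = 1), it yields (1 + 7) & ~3 = 8, meaning the current grace period must complete and then a full grace period must elapse after it.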
2018-07-12rcu: Use RCU CPU stall timeout for rcu_check_gp_start_stall()Paul E. McKenney
Currently, rcu_check_gp_start_stall() waits for one second after the first request before complaining that a grace period has not yet started. This was desirable while testing the conversion from ->future_gp_needed[] to ->gp_seq_needed, but it is a bit on the hair-trigger side for production use under heavy load. This commit therefore makes this wait time be exactly that of the RCU CPU stall warning, allowing easy adjustment of both timeouts to suit the distribution or installation at hand. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove __maybe_unused from rcu_cpu_has_callbacks()Paul E. McKenney
The rcu_cpu_has_callbacks() function is now used in all configurations, so this commit removes the __maybe_unused. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove "inline" from rcu_perf_print_module_parms()Paul E. McKenney
This function is in rcuperf.c, which is not an include file, so there is no problem dropping the "inline", especially given that this function is invoked only twice per rcuperf run. This commit therefore delegates the inlining decision to the compiler by dropping the "inline". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove "inline" from rcu_torture_print_module_parms()Paul E. McKenney
This function is in rcutorture.c, which is not an include file, so there is no problem dropping the "inline", especially given that this function is invoked only twice per rcutorture run. This commit therefore delegates the inlining decision to the compiler by dropping the "inline". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove "inline" from panic_on_rcu_stall() and rcu_blocking_is_gp()Paul E. McKenney
These functions are in kernel/rcu/tree.c, which is not an include file, so there is no problem dropping the "inline", especially given that these functions are nowhere near a fastpath. This commit therefore delegates the inlining decision to the compiler by dropping the "inline". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove unused local variable "cpu"Paul E. McKenney
One danger of using __maybe_unused is that the compiler doesn't yell at you when you remove the last reference, witness rcu_bind_gp_kthread() and its local variable "cpu". This commit removes this local variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove unused rcu_kick_nohz_cpu() functionPaul E. McKenney
The rcu_kick_nohz_cpu() function is no longer used, and the functionality it used to provide is now provided by a call to resched_cpu() in the force-quiescent-state function rcu_implicit_dynticks_qs(). This commit therefore removes rcu_kick_nohz_cpu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Clarify and correct the rcu_preempt_qs() header commentPaul E. McKenney
The rcu_preempt_qs() function only applies to the CPU, not the task. A task really is allowed to invoke this function while in an RCU-preempt read-side critical section, but only if it has first added itself to some leaf rcu_node structure's ->blkd_tasks list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Inline rcu_dynticks_momentary_idle() into its sole callerPaul E. McKenney
The rcu_dynticks_momentary_idle() function is invoked only from rcu_momentary_dyntick_idle(), and neither function is particularly large. This commit therefore saves a few lines by inlining rcu_dynticks_momentary_idle() into rcu_momentary_dyntick_idle(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Mark task as .need_qs less aggressivelyPaul E. McKenney
If any scheduling-clock interrupt interrupts an RCU-preempt read-side critical section, the interrupted task's ->rcu_read_unlock_special.b.need_qs field is set. This causes the outermost rcu_read_unlock() to incur the extra overhead of calling into rcu_read_unlock_special(). This commit reduces that overhead by setting ->rcu_read_unlock_special.b.need_qs only if the grace period has been in effect for more than one second. Why one second? Because this is comfortably smaller than the minimum RCU CPU stall-warning timeout of three seconds, but long enough that the .need_qs marking should happen quite rarely. And if your RCU read-side critical section has run on-CPU for a full second, it is not unreasonable to invest some CPU time in ending the grace period quickly. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
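The gist of the change (a simplified sketch; the full condition also checks the per-CPU quiescent-state bookkeeping):

    /* Mark .need_qs only if the grace period is at least one second old. */
    if (!t->rcu_read_unlock_special.b.need_qs &&
        time_after(jiffies, rsp->gp_start + HZ))
            t->rcu_read_unlock_special.b.need_qs = true;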
2018-07-12rcu: Improve RCU-tasks naming and commentsPaul E. McKenney
The naming and comments associated with some RCU-tasks code make the faulty assumption that context switches due to cond_resched() are voluntary. As several people pointed out, this is not the case. This commit therefore updates function names and comments to better reflect current reality. Reported-by: Byungchul Park <byungchul.park@lge.com> Reported-by: Joel Fernandes <joel@joelfernandes.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Use pr_fmt to prefix "rcu: " to logging outputJoe Perches
This commit also adjusts some whitespace while in the area. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
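The mechanism is the standard pr_fmt() hook, defined at the top of each file before any #include so that every pr_*() call in that file picks it up:

    #define pr_fmt(fmt) "rcu: " fmt

After this, pr_info("Start of grace period.\n") prints "rcu: Start of grace period.\n" with no per-callsite changes.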
2018-07-12rcu: Improve rcu_note_voluntary_context_switch() reportingByungchul Park
We expect a TASKS_RCU quiescent state whenever cond_resched_tasks_rcu_qs() is called, regardless of whether the task is actually rescheduled. However, the quiescent state is currently not reported when the task enters __schedule(), because __schedule() is invoked with preempt = true. So make cond_resched_tasks_rcu_qs() report the quiescent state unconditionally. Also, in TINY_RCU, the rcu_bh quiescent state should be reported when the tick interrupt arrives from usermode, but it is not; make it so. Lastly, in TREE_RCU, rcu_note_voluntary_context_switch() should be invoked when the tick interrupt arrives not only from usermode but also from idle, as an extended quiescent state. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Simplify rcutiny portion given no RCU-tasks for !PREEMPT. ]
2018-07-12rcu: Make rcu_read_unlock_special() staticPaul E. McKenney
Because rcu_read_unlock_special() is no longer used outside of kernel/rcu/tree_plugin.h, this commit makes it static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Add diagnostics for offline CPUs failing to report QSPaul E. McKenney
CPUs are expected to report quiescent states when coming online and when going offline, and grace-period initialization is supposed to handle any race conditions where a CPU's ->qsmask bit is set just after it goes offline. This commit adds diagnostics for the case where an offline CPU nevertheless has a grace period waiting on it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Record ->gp_state for both phases of grace-period initializationPaul E. McKenney
Grace-period initialization first processes any recent CPU-hotplug operations, and then initializes state for the new grace period. These two phases of initialization are currently not distinguished in debug prints, but the distinction is valuable in a number of debug situations. This commit therefore introduces two new values for ->gp_state, RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
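Illustratively, the ->gp_state sequence gains two entries between "waiting to start" and the quiescent-state-forcing states (names per the commit; numeric values here are a sketch):

    #define RCU_GP_IDLE      0 /* Initial state, no GP in progress. */
    #define RCU_GP_WAIT_GPS  1 /* Wait for grace-period start. */
    #define RCU_GP_DONE_GPS  2 /* Wait done for grace-period start. */
    #define RCU_GP_ONOFF     3 /* New: grace-period CPU-hotplug processing. */
    #define RCU_GP_INIT      4 /* New: grace-period initialization. */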
2018-07-12rcu: Add CPU online/offline state to dump_blkd_tasks()Paul E. McKenney
Interactions between CPU-hotplug operations and grace-period initialization can result in dump_blkd_tasks() being invoked. One of the first debugging actions in this case is to search back in dmesg to work out which of the affected rcu_node structure's CPUs are online and to determine the last CPU-hotplug operation affecting any of those CPUs. This can be laborious and error-prone, especially when console output is lost. This commit therefore causes dump_blkd_tasks() to dump the state of the affected rcu_node structure's CPUs and the last grace period during which the last offline and online operation affected each of these CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Add up-tree information to dump_blkd_tasks() diagnosticsPaul E. McKenney
This commit updates dump_blkd_tasks() to print out quiescent-state bitmasks for the rcu_node structures further up the tree. This information helps debugging of interactions between CPU-hotplug operations and RCU grace-period initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12rcu: Remove CPU-hotplug failsafe from force-quiescent-state code pathPaul E. McKenney
Now that quiescent states for newly offlined CPUs are reported either when that CPU goes offline or at the end of grace-period initialization, the CPU-hotplug failsafe in the force-quiescent-state code path is no longer needed. This commit therefore removes this failsafe. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>