Diffstat (limited to 'features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch')
-rw-r--r-- features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch 88
1 files changed, 88 insertions, 0 deletions
diff --git a/features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch b/features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch
new file mode 100644
index 00000000..35145584
--- /dev/null
+++ b/features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch
@@ -0,0 +1,88 @@
+Date: Mon, 21 Feb 2022 13:20:08 -0500
+Subject: [PATCH 1/2] sched/isolation: really align nohz_full with rcu_nocbs
+From: Paul Gortmaker <paul.gortmaker@windriver.com>
+To: linux-kernel@vger.kernel.org, Frederic Weisbecker <fweisbec@gmail.com>,
+ Peter Zijlstra <peterz@infradead.org>
+CC: Paul Gortmaker <paul.gortmaker@windriver.com>, Thomas Gleixner
+ <tglx@linutronix.de>, "Paul E . McKenney" <paulmck@kernel.org>, Ingo Molnar
+ <mingo@kernel.org>
+X-Mailer: git-send-email 2.17.1
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
+Content-Type: text/plain; charset=utf-8
+
+It is currently possible to sneak a core into nohz_full that lies
+between nr_possible and NR_CPUS - but you won't "see" it because
+cpumask_pr_args() implicitly hides anything above nr_cpu_ids.
+
+This becomes a problem when the nohz_full CPU set doesn't contain at
+least one other valid nohz CPU - in which case we end up with the
+tick_nohz_full_running set and no tick core specified, which trips an
+endless sequence of WARN() and renders the machine unusable.
+
+I inadvertently opened the door for this when fixing an overly
+restrictive nohz_full conditional in the below Fixes: commit - and then
+courtesy of my optimistic ACPI reporting nr_possible of 64 (the default
+Kconfig for NR_CPUS) and the not-so-helpful implicit filtering done by
+cpumask_pr_args(), I unfortunately did not spot it during my testing.
+
+So here, I don't rely on what was printed anymore, but code exactly what
+our restrictions should be in order to be aligned with rcu_nocbs - which
+was the original goal. Since the checks lie in "__init" code it is largely
+free for us to do this anyway.
+
+Building with NOHZ_FULL and NR_CPUS=128 on an otherwise defconfig, and
+booting with "rcu_nocbs=8-127 nohz_full=96-127" on the same 16 core T5500
+Dell machine now results in the following (only relevant lines shown):
+
+ smpboot: Allowing 64 CPUs, 48 hotplug CPUs
+ setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:64 nr_node_ids:2
+ housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' contains nonexistent CPUs.
+ housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' has no valid CPUs.
+ rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=64.
+ rcu: Note: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.
+ rcu: Offload RCU callbacks from CPUs: 8-63.
+
+One can see both new housekeeping checks are triggered in the above.
+The same invalid boot arg combination would have previously resulted in
+an infinitely scrolling mix of WARN from all cores per tick on this box.
+
+We may need to revisit these sanity checks when nohz_full is accepted as
+a standalone "enable" keyword without a cpuset (see rcu/nohz d2cf0854d728).
+
+Fixes: 915a2bc3c6b7 ("sched/isolation: Reconcile rcu_nocbs= and nohz_full=")
+Cc: Frederic Weisbecker <fweisbec@gmail.com>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Thomas Gleixner <tglx@linutronix.de>
+Cc: Paul E. McKenney <paulmck@kernel.org>
+Cc: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
+---
+ kernel/sched/isolation.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
+index b4d10815c45a..f7d2406c1f1d 100644
+--- a/kernel/sched/isolation.c
++++ b/kernel/sched/isolation.c
+@@ -126,6 +126,17 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
+ 		goto free_non_housekeeping_mask;
+ 	}
+
++	if (!cpumask_subset(non_housekeeping_mask, cpu_possible_mask)) {
++		pr_info("housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' contains nonexistent CPUs.\n");
++		cpumask_and(non_housekeeping_mask, cpu_possible_mask,
++			    non_housekeeping_mask);
++	}
++
++	if (cpumask_empty(non_housekeeping_mask)) {
++		pr_info("housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' has no valid CPUs.\n");
++		goto free_non_housekeeping_mask;
++	}
++
+ 	alloc_bootmem_cpumask_var(&housekeeping_staging);
+ 	cpumask_andnot(housekeeping_staging,
+ 		       cpu_possible_mask, non_housekeeping_mask);
+--
+2.17.1
+
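
The two checks added to housekeeping_setup() can be tried outside the kernel with a minimal userspace sketch. Everything named sketch_* below is hypothetical and only mimics cpumask_subset(), cpumask_and() and cpumask_empty() on a fixed 128-bit mask; it is not the kernel's cpumask implementation, and the messages are shortened stand-ins for the pr_info() lines in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a cpumask, sized like the NR_CPUS=128 example. */
#define SKETCH_NR_CPUS 128
#define SKETCH_WORDS   (SKETCH_NR_CPUS / 64)

struct sketch_mask {
	uint64_t bits[SKETCH_WORDS];
};

/* Build a mask with CPUs lo..hi (inclusive) set. */
static struct sketch_mask sketch_range(unsigned int lo, unsigned int hi)
{
	struct sketch_mask m = { { 0 } };
	unsigned int cpu;

	assert(lo <= hi && hi < SKETCH_NR_CPUS);
	for (cpu = lo; cpu <= hi; cpu++)
		m.bits[cpu / 64] |= (uint64_t)1 << (cpu % 64);
	return m;
}

/* True if every CPU in a is also in b (mimics cpumask_subset()). */
static bool sketch_subset(const struct sketch_mask *a,
			  const struct sketch_mask *b)
{
	for (int i = 0; i < SKETCH_WORDS; i++)
		if (a->bits[i] & ~b->bits[i])
			return false;
	return true;
}

/* dst = a & b (mimics cpumask_and()); dst may alias a or b. */
static void sketch_and(struct sketch_mask *dst, const struct sketch_mask *a,
		       const struct sketch_mask *b)
{
	for (int i = 0; i < SKETCH_WORDS; i++)
		dst->bits[i] = a->bits[i] & b->bits[i];
}

/* True if no CPU is set (mimics cpumask_empty()). */
static bool sketch_empty(const struct sketch_mask *m)
{
	for (int i = 0; i < SKETCH_WORDS; i++)
		if (m->bits[i])
			return false;
	return true;
}

/*
 * Same order as the patch: first trim the requested set to the possible
 * CPUs, then reject it if nothing is left. Returns false where the patch
 * would take the "goto free_non_housekeeping_mask" path.
 */
static bool sketch_filter_isolated(struct sketch_mask *isolated,
				   const struct sketch_mask *possible)
{
	if (!sketch_subset(isolated, possible)) {
		puts("sketch: request contains nonexistent CPUs.");
		sketch_and(isolated, possible, isolated);
	}

	if (sketch_empty(isolated)) {
		puts("sketch: request has no valid CPUs.");
		return false;
	}
	return true;
}
```

With possible CPUs 0-63, a request of 96-127 (the boot line from the commit message) trips both checks and is rejected, while a request such as 32-95 is only trimmed to 32-63 and accepted.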