diff options
Diffstat (limited to 'features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch')
-rw-r--r-- | features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch | 88 |
1 files changed, 88 insertions, 0 deletions
diff --git a/features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch b/features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch new file mode 100644 index 00000000..35145584 --- /dev/null +++ b/features/clear_warn_once/sched-isolation-really-align-nohz_full-with-rcu_nocb.patch @@ -0,0 +1,88 @@ +Date: Mon, 21 Feb 2022 13:20:08 -0500 +Subject: [PATCH 1/2] sched/isolation: really align nohz_full with rcu_nocbs +From: Paul Gortmaker <paul.gortmaker@windriver.com> +To: linux-kernel@vger.kernel.org, Frederic Weisbecker <fweisbec@gmail.com>, + Peter Zijlstra <peterz@infradead.org> +CC: Paul Gortmaker <paul.gortmaker@windriver.com>, Thomas Gleixner + <tglx@linutronix.de>, "Paul E . McKenney" <paulmck@kernel.org>, Ingo Molnar + <mingo@kernel.org> +X-Mailer: git-send-email 2.17.1 +MIME-Version: 1.0 +Content-Transfer-Encoding: 8bit +Content-Type: text/plain; charset=utf-8 + +At the moment it is currently possible to sneak a core into nohz_full +that lies between nr_possible and NR_CPUS - but you won't "see" it +because cpumask_pr_args() implicitly hides anything above nr_cpu_ids. + +This becomes a problem when the nohz_full CPU set doesn't contain at +least one other valid nohz CPU - in which case we end up with the +tick_nohz_full_running set and no tick core specified, which trips an +endless sequence of WARN() and renders the machine unusable. + +I inadvertently opened the door for this when fixing an overly +restrictive nohz_full conditional in the below Fixes: commit - and then +courtesy of my optimistic ACPI reporting nr_possible of 64 (the default +Kconfig for NR_CPUS) and the not-so helpful implict filtering done by +cpumask_pr_args, I unfortunately did not spot it during my testing. + +So here, I don't rely on what was printed anymore, but code exactly what +our restrictions should be in order to be aligned with rcu_nocbs - which +was the original goal. Since the checks lie in "__init" code it is largely +free for us to do this anyway. + +Building with NOHZ_FULL and NR_CPUS=128 on an otherwise defconfig, and +booting with "rcu_nocbs=8-127 nohz_full=96-127" on the same 16 core T5500 +Dell machine now results in the following (only relevant lines shown): + + smpboot: Allowing 64 CPUs, 48 hotplug CPUs + setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:64 nr_node_ids:2 + housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' contains nonexistent CPUs. + housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' has no valid CPUs. + rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=64. + rcu: Note: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs. + rcu: Offload RCU callbacks from CPUs: 8-63. + +One can see both new housekeeping checks are triggered in the above. +The same invalid boot arg combination would have previously resulted in +an infinitely scrolling mix of WARN from all cores per tick on this box. + +We may need to revisit these sanity checks when nohz_full is accepted as +a stand alone keyword "enable" w/o a cpuset (see rcu/nohz d2cf0854d728). + +Fixes: 915a2bc3c6b7 ("sched/isolation: Reconcile rcu_nocbs= and nohz_full=") +Cc: Frederic Weisbecker <fweisbec@gmail.com> +Cc: Peter Zijlstra <peterz@infradead.org> +Cc: Thomas Gleixner <tglx@linutronix.de> +Cc: Paul E. McKenney <paulmck@kernel.org> +Cc: Ingo Molnar <mingo@kernel.org> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + kernel/sched/isolation.c | 11 +++++++++++ + 1 file changed, 11 insertions(+) + +diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c +index b4d10815c45a..f7d2406c1f1d 100644 +--- a/kernel/sched/isolation.c ++++ b/kernel/sched/isolation.c +@@ -126,6 +126,17 @@ static int __init housekeeping_setup(char *str, unsigned long flags) + goto free_non_housekeeping_mask; + } + ++ if (!cpumask_subset(non_housekeeping_mask, cpu_possible_mask)) { ++ pr_info("housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' contains nonexistent CPUs.\n"); ++ cpumask_and(non_housekeeping_mask, cpu_possible_mask, ++ non_housekeeping_mask); ++ } ++ ++ if (cpumask_empty(non_housekeeping_mask)) { ++ pr_info("housekeeping: kernel parameter 'nohz_full=' or 'isolcpus=' has no valid CPUs.\n"); ++ goto free_non_housekeeping_mask; ++ } ++ + alloc_bootmem_cpumask_var(&housekeeping_staging); + cpumask_andnot(housekeeping_staging, + cpu_possible_mask, non_housekeeping_mask); +-- +2.17.1 + |