Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
1/83 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Constrain locks in sched_submit_work()
Date: Fri, 8 Sep 2023 18:22:48 +0200
Even though sched_submit_work() is run from preemptible context,
it is discouraged to have it use blocking locks due to the recursion
potential.
Enforce this.
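A minimal sketch of how the enforcement can look, using lockdep's
wait-type override machinery (placement of the markers may differ):
  static void sched_submit_work(struct task_struct *tsk)
  {
      static DEFINE_WAIT_OVERRIDE_MAP(sched_map, LD_WAIT_CONFIG);

      /*
       * Establish LD_WAIT_CONFIG context to ensure none of the code
       * called further below acquires a blocking lock -- which would
       * recurse back into sched_submit_work().
       */
      lock_map_acquire_try(&sched_map);

      /* flush plugged block requests, worker hooks, ... */

      lock_map_release(&sched_map);
  }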
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-2-bigeasy@linutronix.de
]
2/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXES
Date: Fri, 8 Sep 2023 18:22:49 +0200
With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire()
always fails and all lock operations take the slow path.
Provide a new inline helper rt_mutex_try_acquire() which maps to
rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case
it invokes rt_mutex_slowtrylock() which can acquire a non-contended
rtmutex under full debug coverage.
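A sketch of the helper, assuming the rt_mutex_base type used by the
rtmutex core:
  #ifdef CONFIG_DEBUG_RT_MUTEXES
  static bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
  {
      /* The cmpxchg fast path always fails with debug enabled. */
      return rt_mutex_slowtrylock(lock);
  }
  #else
  static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
  {
      return rt_mutex_cmpxchg_acquire(lock, NULL, current);
  }
  #endif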
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-3-bigeasy@linutronix.de
]
3/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Extract __schedule_loop()
Date: Fri, 8 Sep 2023 18:22:50 +0200
There are currently two implementations of this basic __schedule()
loop, and there is soon to be a third.
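The extracted helper is essentially the familiar preempt-safe schedule
loop; a sketch:
  static __always_inline void __schedule_loop(unsigned int sched_mode)
  {
      do {
          preempt_disable();
          __schedule(sched_mode);
          sched_preempt_enable_no_resched();
      } while (need_resched());
  }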
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-4-bigeasy@linutronix.de
]
4/83 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Provide rt_mutex specific scheduler helpers
Date: Fri, 8 Sep 2023 18:22:51 +0200
With PREEMPT_RT there is a rt_mutex recursion problem where
sched_submit_work() can use an rtlock (aka spinlock_t). More
specifically what happens is:
  mutex_lock() /* really rt_mutex */
    ...
      __rt_mutex_slowlock_locked()
        task_blocks_on_rt_mutex()
          // enqueue current task as waiter
          // do PI chain walk
        rt_mutex_slowlock_block()
          schedule()
            sched_submit_work()
              ...
              spin_lock() /* really rtlock */
              ...
                __rt_mutex_slowlock_locked()
                  task_blocks_on_rt_mutex()
                    // enqueue current task as waiter *AGAIN*
                    // *CONFUSION*
Fix this by making rt_mutex do the sched_submit_work() early, before
it enqueues itself as a waiter -- before it even knows *if* it will
wait.
[[ basically Thomas' patch but with different naming and a few asserts
added ]]
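A sketch of the resulting helpers, assuming a new task_struct flag
(sched_rt_mutex) is added by the patch; names and asserts may differ:
  void rt_mutex_pre_schedule(void)
  {
      lockdep_assert(!current->sched_rt_mutex);
      current->sched_rt_mutex = 1;
      sched_submit_work(current);    /* done before enqueueing as waiter */
  }

  void rt_mutex_schedule(void)
  {
      lockdep_assert(current->sched_rt_mutex);
      __schedule_loop(SM_NONE);      /* without sched_submit_work() */
  }

  void rt_mutex_post_schedule(void)
  {
      sched_update_worker(current);
      current->sched_rt_mutex = 0;
  }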
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-5-bigeasy@linutronix.de
]
5/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Use rt_mutex specific scheduler helpers
Date: Fri, 8 Sep 2023 18:22:52 +0200
Have rt_mutex use the rt_mutex specific scheduler helpers to avoid
recursion vs rtlock on the PI state.
[[ peterz: adapted to new names ]]
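A sketch of the usage in the slow path (simplified):
  rt_mutex_pre_schedule();

  raw_spin_lock_irqsave(&lock->wait_lock, flags);
  ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
  raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

  rt_mutex_post_schedule();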
Reported-by: Crystal Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-6-bigeasy@linutronix.de
]
6/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Add a lockdep assert to catch potential nested blocking
Date: Fri, 8 Sep 2023 18:22:53 +0200
There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition
functions, but that vanished in one of the rtmutex overhauls.
Bring it back in form of a lockdep assert to catch code paths which take
rtmutex based locks with current::pi_blocked_on != NULL.
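The assert boils down to one line at the head of the lock operations;
a sketch:
  static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
                                             unsigned int state)
  {
      lockdep_assert(!current->pi_blocked_on);

      if (likely(rt_mutex_try_acquire(lock)))
          return 0;

      return rt_mutex_slowlock(lock, NULL, state);
  }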
Reported-by: Crystal Wood <swood@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-7-bigeasy@linutronix.de
]
7/83 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: futex/pi: Fix recursive rt_mutex waiter state
Date: Fri, 15 Sep 2023 17:19:44 +0200
Some new assertions pointed out that the existing code has nested rt_mutex wait
state in the futex code.
Specifically, the futex_lock_pi() cancel case uses spin_lock() while there
still is a rt_waiter enqueued for this task, resulting in a state where there
are two waiters for the same task (and task_struct::pi_blocked_on gets
scrambled).
The reason to take hb->lock at this point is to avoid the wake_futex_pi()
EAGAIN case.
This happens when futex_top_waiter() and rt_mutex_top_waiter() state becomes
inconsistent. The current rules are such that this inconsistency will not be
observed.
Notably the case that needs to be avoided is where futex_lock_pi() and
futex_unlock_pi() interleave such that unlock will fail to observe a new
waiter.
*However*, the case at hand is where a waiter is leaving; in this case the race
means a waiter that is going away is not observed -- which is harmless,
provided this race is explicitly handled.
This is a somewhat dangerous proposition because the converse race is not
observing a new waiter, which must absolutely not happen. But since the race is
valid this cannot be asserted.
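A sketch of the fixed ordering in the cancel path, simplified from the
futex code; the key point is that the waiter is fully removed before
hb->lock (an rtlock on PREEMPT_RT) is taken:
  /* rt_mutex_wait_proxy_lock() failed or was interrupted */
  if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
      ret = 0;

  /* Only now, with no rt_waiter enqueued, take the hash bucket lock. */
  spin_lock(q.lock_ptr);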
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20230915151943.GD6743@noisy.programming.kicks-ass.net
]
8/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: signal: Add proper comment about the preempt-disable in ptrace_stop().
Date: Thu, 3 Aug 2023 12:09:31 +0200
Commit 53da1d9456fe7 ("fix ptrace slowness") added a preempt-disable section
between read_unlock() and the following schedule() invocation without
explaining why it is needed.
Replace the comment with an explanation of why this is needed. Clarify
that it is not needed for correctness but for performance reasons.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20230803100932.325870-2-bigeasy@linutronix.de
]
9/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT.
Date: Thu, 3 Aug 2023 12:09:32 +0200
On PREEMPT_RT keeping preemption disabled during the invocation of
cgroup_enter_frozen() is a problem because the function acquires css_set_lock
which is a sleeping lock on PREEMPT_RT and must not be acquired with disabled
preemption.
The preempt-disabled section is only for performance optimisation
reasons and can be avoided.
Extend the comment and don't disable preemption before scheduling on
PREEMPT_RT.
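A sketch of the resulting ptrace_stop() fragment:
  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
      preempt_disable();

  read_unlock(&tasklist_lock);
  cgroup_enter_frozen();    /* acquires css_set_lock, sleeps on RT */

  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
      preempt_enable_no_resched();

  schedule();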
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20230803100932.325870-3-bigeasy@linutronix.de
]
10/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().
Date: Thu, 21 Sep 2023 16:15:12 +0200
This is a revert of the commit mentioned below. While the commit is not
wrong (the kernel will not explode because of it), having
migrate_disable() here is a complete waste of resources.
Additionally, the commit message is plain wrong and the review tag does
not make it any better. The migrate_disable() interface has a fat comment
describing it and it includes the word "undesired" in the headline which
should tickle people to read it before using it.
Initially I assumed it is worded too harshly, but now I beg to differ.
The reviewer of the original commit, even without understanding what
migrate_disable() does, should have asked the following:
- migrate_disable() is added only to the CONFIG_X86 block and it claims
  to protect fpu_recursion_depth. Why are the other architectures
  excluded?
- migrate_disable() is added after fpu_recursion_depth was modified.
  Shouldn't it be added before the modification or referencing takes
  place?
Moving on.
Disabling preemption DOES prevent CPU migration. A task that can not be
pushed away from the CPU by the scheduler (due to disabled preemption)
can not be migrated to another CPU.
Disabling migration DOES NOT ensure consistency of per-CPU variables. It
only ensures that the task acts always on the same per-CPU variable. The
task remains preemptible meaning multiple tasks can access the same
per-CPU variable. This in turn leads to inconsistency for the statement
*pcpu -= 1;
with two tasks on one CPU and a preemption point during the RMW
operation:
  Task A                      Task B
  read pcpu to reg   # 0
  inc reg            # 0 -> 1
                              read pcpu to reg   # 0
                              inc reg            # 0 -> 1
                              write reg to pcpu  # 1
  write reg to pcpu  # 1
At the end pcpu reads 1 but should read 2 instead. Boom.
get_cpu_ptr() already contains a preempt_disable() statement. That means
that the per-CPU variable can only be referenced by a single task which
is currently running. The only inconsistency that can occur is if the
variable is additionally accessed from an interrupt.
Remove migrate_disable/enable() from dc_fpu_begin/end().
Cc: Tianci Yin <tianci.yin@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")
Link: https://lore.kernel.org/r/20230921141516.520471-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/amd/display: Simplify the per-CPU usage.
Date: Thu, 21 Sep 2023 16:15:13 +0200
The fpu_recursion_depth counter is used to ensure that dc_fpu_begin()
can be invoked multiple times while the FPU-disable function itself is
only invoked once. The counterpart (dc_fpu_end()) is balanced properly.
Instead of using the get_cpu_ptr() dance around the increment it is
simpler to increment the per-CPU variable directly. The per-CPU variable
also has to be incremented and decremented on the same CPU, which is
ensured by the inner part which disables preemption. This is not
obvious, and while it works, the preempt counter is touched a few times
for no reason.
Disable preemption before incrementing fpu_recursion_depth for the first
time. Keep preemption disabled until dc_fpu_end() where the counter is
decremented making it obvious that the preemption has to stay disabled
while the counter is non-zero.
Use simple inc/dec functions.
Remove the nested preempt_disable/enable functions which are now not
needed.
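A sketch of the simplified pair, using the x86 kernel_fpu_*() flavour
for illustration:
  void dc_fpu_begin(const char *function_name, const int line)
  {
      int depth;

      preempt_disable();
      depth = __this_cpu_inc_return(fpu_recursion_depth);
      if (depth == 1)
          kernel_fpu_begin();
  }

  void dc_fpu_end(const char *function_name, const int line)
  {
      int depth;

      depth = __this_cpu_dec_return(fpu_recursion_depth);
      if (depth == 0)
          kernel_fpu_end();
      preempt_enable();
  }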
Link: https://lore.kernel.org/r/20230921141516.520471-3-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/amd/display: Add a warning if the FPU is used outside from task context.
Date: Thu, 21 Sep 2023 16:15:14 +0200
Add a warning if the FPU is used from any context other than task
context. This is only a precaution; the code is not able to cope with
use from softirq even though the API would allow it on x86, for instance.
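The check itself is a one-liner at the start of dc_fpu_begin()/dc_fpu_end():
  WARN_ON_ONCE(!in_task());    /* no hardirq/softirq users expected */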
Link: https://lore.kernel.org/r/20230921141516.520471-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/amd/display: Move the memory allocation out of dcn20_validate_bandwidth_fp().
Date: Thu, 21 Sep 2023 16:15:16 +0200
dcn20_validate_bandwidth_fp() is invoked while FPU access has been
enabled. FPU access requires disabling preemption even on PREEMPT_RT.
It is not possible to allocate memory with disabled preemption even with
GFP_ATOMIC on PREEMPT_RT.
Move the memory allocation before FPU access is enabled.
To preserve the previous "clean" state of "pipes", add a memset() before the
second invocation of dcn20_validate_bandwidth_internal() where the
variable is used.
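A sketch of the resulting shape of the caller (the pipes parameter to
the _fp() function is an assumption of this sketch):
  /* allocate with preemption enabled, then enter the FPU section */
  pipes = kcalloc(dc->res_pool->pipe_count, sizeof(*pipes), GFP_KERNEL);
  if (!pipes)
      return false;

  DC_FP_START();
  voltage_supported = dcn20_validate_bandwidth_fp(dc, context, fast, pipes);
  DC_FP_END();

  kfree(pipes);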
Link: https://lore.kernel.org/r/20230921141516.520471-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Avoid the IPI to free the skb
Date: Mon, 15 Aug 2022 17:29:50 +0200
skb_attempt_defer_free() collects skbs, which were allocated on a
remote CPU, on a per-CPU list. These skbs are either freed on that
remote CPU once the CPU enters NET_RX, or a remote IPI function is
invoked to raise the NET_RX softirq if a threshold of pending skbs has
been exceeded.
This remote IPI can cause a wakeup of ksoftirqd on PREEMPT_RT if the
remote CPU was idle. This is undesired because once ksoftirqd
is running it will collect all pending softirqs and they will not be
executed as part of the threaded interrupt until ksoftirqd goes idle
again.
To avoid all this, schedule the deferred clean up from a worker.
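A sketch of the dispatch, assuming a per-CPU work item replaces the
smp_call_function IPI on PREEMPT_RT (the defer_work member is
hypothetical; defer_csd is the existing IPI descriptor):
  if (IS_ENABLED(CONFIG_PREEMPT_RT))
      schedule_work_on(cpu, &sd->defer_work);
  else
      smp_call_function_single_async(cpu, &sd->defer_csd);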
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
17/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
18/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/rt: Don't try push tasks if there are none.
Date: Tue, 1 Aug 2023 17:26:48 +0200
I have an RT task X at a high priority and cyclictest on each CPU with
lower priority than X's. If X is active and each CPU wakes its own
cyclictest thread then it ends in a long rto_push storm.
A random CPU determines via balance_rt() that the CPU on which X is
running needs to push tasks. X has the highest priority, cyclictest is
next in line, so there is nothing that can be done since the task with
the higher priority is not touched.
tell_cpu_to_push() increments rto_loop_next and schedules
rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
loop counter and do the same. Once rto_push_irq_work_func() is active it
does nothing because it has _no_ pushable tasks on its runqueue. It then
checks rto_next_cpu() and decides to queue irq_work on the local CPU
because another CPU requested a push by incrementing the counter.
I have traces where ~30 CPUs request this ~3 times each before it
finally ends. This greatly increases X's runtime while X isn't making
much progress.
Teach rto_next_cpu() to only return CPUs which also have tasks on their
runqueue which can be pushed away. This does not reduce the
tell_cpu_to_push() invocations (rto_loop_next counter increments) but
reduces the amount of issued rto_push_irq_work_func() if nothing can be
done. As a result the overloaded CPU is blocked less often.
There are still cases where the "same job" is repeated several times
(for instance the current CPU needs to resched but didn't yet because
the irq-work is repeated a few times and so the old task remains on the
CPU) but the majority of requests end in tell_cpu_to_push() before an
IPI is issued.
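The core of the change is a pushable-task check in the rto_next_cpu()
loop; a simplified sketch (loop-count handling omitted):
  for (;;) {
      cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
      rd->rto_cpu = cpu;

      if (cpu < nr_cpu_ids) {
          /* nothing to push away? Don't bother with an IPI. */
          if (!has_pushable_tasks(cpu_rq(cpu)))
              continue;
          return cpu;
      }

      rd->rto_cpu = -1;
      break;
  }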
Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230801152648._y603AS_@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Use a dedicated thread for timer wakeups.
Date: Wed, 1 Dec 2021 17:41:09 +0100
A timer/hrtimer softirq is raised in-IRQ context. With threaded
interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
for the processing of the softirq.
Once the ksoftirqd is marked as pending (or is running) it will collect
all raised softirqs. This in turn means that a softirq which would have
been processed at the end of the threaded interrupt, which runs at an
elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
priority and competes with every regular task for CPU resources.
This introduces long delays on heavily loaded systems and is not
desired, especially if the system is not overloaded by the softirqs.
Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
timers thread and let it run at the lowest SCHED_FIFO priority.
RT tasks are woken up from hardirq context so only timer_list timers
and hrtimers for "regular" tasks are processed here. The higher priority
ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
Using a dedicated variable to store the pending softirq bits ensures
that the timers are not accidentally picked up by ksoftirqd and
other threaded interrupts.
It shouldn't be picked up by ksoftirqd since it runs at lower priority.
However if the timer bits are ORed while a threaded interrupt is
running, then the timer softirq would be performed at higher priority.
The new timer thread will block on the softirq lock before it starts
softirq work. This "race window" isn't closed because while the timer thread
is performing the softirq it can get PI-boosted via the softirq lock by
a random force-threaded thread.
The timer thread can pick up pending softirqs from ksoftirqd but only
if the softirq load is high. It is not desired that the picked-up
softirqs are processed at SCHED_FIFO priority under high softirq load,
but this can already happen via a PI-boost by a force-threaded interrupt.
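A sketch of the plumbing, with hypothetical names; the pending bits for
timers live in their own per-CPU variable so ksoftirqd never sees them:
  static DEFINE_PER_CPU(struct task_struct *, timersd);
  static DEFINE_PER_CPU(unsigned long, pending_timer_softirq);

  static void wake_timersd(void)
  {
      struct task_struct *tsk = __this_cpu_read(timersd);

      if (tsk)
          wake_up_process(tsk);
  }

  /* at thread setup: boost to the lowest SCHED_FIFO priority */
  static void timersd_setup(struct task_struct *tsk)
  {
      struct sched_param param = { .sched_priority = 1 };

      sched_setscheduler_nocheck(tsk, SCHED_FIFO, &param);
  }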
Reported-by: kernel test robot <lkp@intel.com> [ static timer_threads ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/83 [
Author: Frederic Weisbecker
Email: frederic@kernel.org
Subject: rcutorture: Also force sched priority to timersd on boosting test.
Date: Tue, 5 Apr 2022 03:07:51 +0200
ksoftirqd is statically boosted to the priority level right above the
one of rcu_torture_boost() so that timers, which torture readers rely on,
get a chance to run while rcu_torture_boost() is polling.
However timers processing got split from ksoftirqd into their own kthread
(timersd) that isn't boosted. It has the same SCHED_FIFO low prio as
rcu_torture_boost() and therefore timers can't preempt it and may
starve.
The issue can be triggered in practice on v5.17.1-rt17 using:
./kvm.sh --allcpus --configs TREE04 --duration 10m --kconfig "CONFIG_EXPERT=y CONFIG_PREEMPT_RT=y"
Fix this by statically boosting timersd, just like it is done for
ksoftirqd in commit
ea6d962e80b61 ("rcutorture: Judge RCU priority boosting on grace periods, not callbacks")
Suggested-by: Mel Gorman <mgorman@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20220405010752.1347437-1-frederic@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/83 [
Author: Frederic Weisbecker
Email: frederic@kernel.org
Subject: tick: Fix timer storm since introduction of timersd
Date: Tue, 5 Apr 2022 03:07:52 +0200
If timers are pending while the tick is reprogrammed in nohz mode, the
next expiry is not armed to fire now; it is delayed one jiffy forward
instead, so as not to raise an inextinguishable timer storm with a
scenario such as:
1) IRQ triggers and queues a timer
2) ksoftirqd() is woken up
3) IRQ tail: timer is reprogrammed to fire now
4) IRQ exit
5) TIMER interrupt
6) goto 3)
...all that until we finally reach ksoftirqd.
Unfortunately we are checking the wrong softirq vector bitmask since
the timersd kthread was split from ksoftirqd. Timers now have their own
vector state field that must be checked separately. As a result, the
old timer storm is back. This shows up early on boot with extremely long
initcalls:
[ 333.004807] initcall dquot_init+0x0/0x111 returned 0 after 323822879 usecs
and the cause is uncovered with the right trace events showing just
10 microseconds between ticks (~100 000 Hz):
|swapper/-1 1dn.h111 60818582us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415486608
|swapper/-1 1dn.h111 60818592us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415496082
|swapper/-1 1dn.h111 60818601us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415505550
Fix this by checking the right timer vector state from the nohz code.
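A sketch of the missing piece, assuming the separate per-CPU timer
vector state from the timersd split (helper name hypothetical):
  static inline unsigned int local_timers_pending(void)
  {
      return __this_cpu_read(pending_timer_softirq);
  }
The nohz code then has to test local_timers_pending() in addition to
local_softirq_pending() before treating the softirq state as idle.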
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20220405010752.1347437-2-frederic@kernel.org
]
22/83 [
Author: Junxiao Chang
Email: junxiao.chang@intel.com
Subject: softirq: Wake ktimers thread also in softirq.
Date: Mon, 20 Feb 2023 09:12:20 +0100
If the hrtimer is raised while a softirq is processed then it does not
wake the corresponding ktimers thread. This is due to the optimisation in the
irq-exit path which is also used to wake the ktimers thread. For the other
softirqs, this is okay because the additional softirq bits will be handled by
the currently running softirq handler.
The timer related softirq bits are added to a different variable and rely on
the ktimers thread.
As a consequence the wake up of ktimersd is delayed until the next timer tick.
Always wake the ktimers thread if a timer related softirq is pending.
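A sketch of the fix in the softirq handling tail, reusing the
hypothetical helpers from the earlier timersd sketches:
  if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_timers_pending())
      wake_timersd();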
Reported-by: Peh, Hock Zhang <hock.zhang.peh@intel.com>
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/83 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: zram: Replace bit spinlocks with spinlock_t for PREEMPT_RT.
Date: Thu, 31 Mar 2016 04:08:28 +0200
The bit spinlock disables preemption. The spinlock_t lock becomes a sleeping
lock on PREEMPT_RT and it can not be acquired in this context. In this locked
section, zs_free() acquires a zs_pool::lock, and there is access to
zram::wb_limit_lock.
Use a spinlock_t on PREEMPT_RT for locking and set/clear the ZRAM_LOCK bit
after the lock has been acquired/dropped.
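A sketch of the RT variant, assuming a spinlock_t added per table entry:
  #ifdef CONFIG_PREEMPT_RT
  static void zram_slot_lock(struct zram *zram, u32 index)
  {
      spin_lock(&zram->table[index].lock);
      __set_bit(ZRAM_LOCK, &zram->table[index].flags);
  }

  static void zram_slot_unlock(struct zram *zram, u32 index)
  {
      __clear_bit(ZRAM_LOCK, &zram->table[index].flags);
      spin_unlock(&zram->table[index].lock);
  }
  #endif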
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/YqIbMuHCPiQk+Ac2@linutronix.de
Link: https://lore.kernel.org/20230323161830.jFbWCosd@linutronix.de
]
24/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: preempt: Put preempt_enable() within an instrumentation*() section.
Date: Wed, 8 Mar 2023 16:29:38 +0100
Callers of preempt_enable() can be within a noinstr section leading to:
| vmlinux.o: warning: objtool: native_sched_clock+0x97: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
| vmlinux.o: warning: objtool: kvm_clock_read+0x22: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
| vmlinux.o: warning: objtool: local_clock+0xb4: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
| vmlinux.o: warning: objtool: enter_from_user_mode+0xea: call to preempt_schedule_thunk() leaves .noinstr.text section
| vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x140: call to preempt_schedule_thunk() leaves .noinstr.text section
| vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0xf2: call to preempt_schedule_thunk() leaves .noinstr.text section
| vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0xea: call to preempt_schedule_thunk() leaves .noinstr.text section
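The fix wraps the schedule call in an instrumentable section; a sketch:
  #define preempt_enable() \
  do { \
      barrier(); \
      if (unlikely(preempt_count_dec_and_test())) { \
          instrumentation_begin(); \
          __preempt_schedule(); \
          instrumentation_end(); \
      } \
  } while (0)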
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20230309072724.3F6zRkvw@linutronix.de
]
25/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/core: Provide a method to check if a task is PI-boosted.
Date: Fri, 4 Aug 2023 13:30:37 +0200
Provide a method to check if a task inherited the priority from another
task. This happens if a task owns a lock which is requested by a task
with higher priority. This can be used as a hint to add a preemption
point to the critical section.
Provide a function which reports true if the task is PI-boosted.
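A sketch, assuming boosting is visible as an effective priority that
differs from the task's normal priority:
  bool task_is_pi_boosted(const struct task_struct *p)
  {
      int prio = p->prio;

      if (!rt_prio(prio))
          return false;
      return prio != p->normal_prio;
  }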
Link: https://lore.kernel.org/r/20230804113039.419794-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Add function to preempt serving softirqs.
Date: Fri, 4 Aug 2023 13:30:38 +0200
Add functionality for the softirq handler to preempt its current work
if needed. The softirq core has no particular state. It reads and resets
the pending softirq bits and then processes one after the other.
It can already be preempted while it invokes a certain softirq handler.
By enabling the BH the softirq core releases the per-CPU bh lock which
serializes all softirq handlers. It is safe to do so as long as the code
does not expect any serialisation in between. A typical scenario would
be after the invocation of a callback where no state needs to be
preserved before the next callback is invoked.
Add functionality to preempt the serving softirqs.
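A sketch of the preemption point, assuming the per-CPU softirq_ctrl
lock used by the PREEMPT_RT softirq implementation:
  void softirq_preempt(void)
  {
      if (WARN_ON_ONCE(!preemptible()))
          return;

      if (WARN_ON_ONCE(__this_cpu_read(softirq_ctrl.cnt) != SOFTIRQ_OFFSET))
          return;

      __local_bh_enable(SOFTIRQ_OFFSET, true);
      /* preemption point */
      cond_resched();
      __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
  }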
Link: https://lore.kernel.org/r/20230804113039.419794-3-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: time: Allow to preempt after a callback.
Date: Fri, 4 Aug 2023 13:30:39 +0200
The TIMER_SOFTIRQ handler invokes timer callbacks of the expired timers.
Before each invocation the timer_base::lock is dropped. The only lock
that is still held is the timer_base::expiry_lock and the per-CPU
bh-lock as part of local_bh_disable(). The former is released as part
of lock up prevention if the timer is preempted by the caller which is
waiting for its completion.
Both locks are already released as part of timer_sync_wait_running().
This can be extended by also releasing the bh-lock. The timer core does
not rely on any state that is serialized by the bh-lock. The timer
callback expects the bh-state to be serialized by the lock but there is
no need to keep state synchronized while invoking multiple callbacks.
Preempt the handling of softirqs and release all locks after a timer
invocation if the current task has inherited priority.
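A sketch of the expiry loop epilogue, reusing the helpers introduced in
the two preceding patches:
  /* after a timer callback returned and all locks were dropped */
  if (IS_ENABLED(CONFIG_PREEMPT_RT) && task_is_pi_boosted(current))
      softirq_preempt();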
Link: https://lore.kernel.org/r/20230804113039.419794-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: Add non-BKL console basic infrastructure
Date: Sun, 11 Sep 2022 00:28:01 +0200
The current console/printk subsystem is protected by a Big Kernel Lock
(aka console_lock) which has ill-defined semantics and is more or less
stateless. This puts severe limitations on the console subsystem and
makes forced takeover and output in emergency and panic situations a
fragile endeavour which is based on try and pray.
The goal of non-BKL consoles is to break out of the console lock jail
and to provide a new infrastructure that avoids the pitfalls and
allows console drivers to be gradually converted over.
The proposed infrastructure aims for the following properties:
- Per console locking instead of global locking
- Per console state which allows making informed decisions
- Stateful handover and takeover
As a first step state is added to struct console. The per console state
is an atomic_long_t with a 32bit bit field and on 64bit also a 32bit
sequence for tracking the last printed ringbuffer sequence number. On
32bit the sequence is separate from state for obvious reasons which
requires handling a few extra race conditions.
Reserve state bits, which will be populated later in the series. Wire
it up into the console register/unregister functionality and exclude
such consoles from being handled in the console BKL mechanisms. Since
the non-BKL consoles will not depend on the console lock/unlock dance
for printing, only perform said dance if a BKL console is registered.
The decision to use a bitfield was made as using a plain u32 with
mask/shift operations turned out to result in incomprehensible code.
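A sketch of the state layout on 64bit (field names and widths here are
illustrative; the real bit assignments are populated later in the series):
  struct cons_state {
      union {
          unsigned long  atom;
          struct {
              u32  seq;            /* last printed sequence (64bit only) */
              u32  locked   :  1;
              u32  unsafe   :  1;
              u32  cur_prio :  2;
              u32  req_prio :  2;
              u32  cpu      : 18;
          };
      };
  };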
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add acquire/release logic
Date: Sun, 11 Sep 2022 00:28:02 +0200
Add per console acquire/release functionality. The console 'locked'
state is a combination of several state fields:
- The 'locked' bit
- The 'cpu' field that denotes on which CPU the console is locked
- The 'cur_prio' field that contains the severity of the printk
context that owns the console. This field is used for decisions
whether to attempt friendly handovers and also prevents takeovers
from a less severe context, e.g. to protect the panic CPU.
The acquire mechanism comes with several flavours:
- Straightforward acquire when the console is not contended
- Friendly handover mechanism based on a request/grant handshake
The requesting context:
1) Puts the desired handover state (CPU nr, prio) into a
separate handover state
2) Sets the 'req_prio' field in the real console state
3) Waits (with a timeout) for the owning context to handover
The owning context:
1) Observes the 'req_prio' field set
2) Hands the console over to the requesting context by
switching the console state to the handover state that was
provided by the requester
- Hostile takeover
The new owner takes the console over without handshake
This is required when friendly handovers are not possible,
i.e. the higher priority context interrupted the owning context
on the same CPU or the owning context is not able to make
progress on a remote CPU.
The release is the counterpart which either releases the console
directly or hands it gracefully over to a requester.
All operations on console::atomic_state[CUR|REQ] are atomic
cmpxchg based to handle concurrency.
The acquire/release functions implement only minimal policies:
- Preference for higher priority contexts
- Protection of the panic CPU
All other policy decisions have to be made at the call sites.
The design allows to implement the well known:
acquire()
output_one_line()
release()
algorithm, but also allows to avoid the per line acquire/release for
e.g. panic situations by doing the acquire once and then relying on
the panic CPU protection for the rest.
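The uncontended acquire is a single cmpxchg on the current state; a
sketch (accessor and index names hypothetical):
  static bool cons_try_acquire_direct(struct console *con,
                                      struct cons_state *new)
  {
      struct cons_state old = { .atom = 0 };    /* free: unlocked, no req */

      return atomic_long_try_cmpxchg(&con->atomic_state[CON_STATE_CUR],
                                     &old.atom, new->atom);
  }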
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add buffer management
Date: Sun, 11 Sep 2022 00:28:04 +0200
In case of hostile takeovers it must be ensured that the previous
owner cannot scribble over the output buffer of the emergency/panic
context. This is achieved by:
- Adding a global output buffer instance for early boot (pre per CPU
data being available).
- Allocating an output buffer per console for threaded printers once
printer threads become available.
- Allocating per CPU output buffers per console for printing from
all contexts not covered by the other buffers.
- Choosing the appropriate buffer is handled in the acquire/release
functions.
The output buffer is wrapped into a separate data structure so other
context related fields can be added in later steps.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add sequence handling
Date: Sun, 11 Sep 2022 00:28:06 +0200
On 64bit systems the sequence tracking is embedded into the atomic
console state, on 32bit it has to be stored in a separate atomic
member. The latter needs to handle the non-atomicity in hostile
takeover cases, while 64bit can completely rely on the state
atomicity.
The ringbuffer sequence number is 64bit, but having a 32bit
representation in the console is sufficient. If a console ever gets
more than 2^31 records behind the ringbuffer then this is the least
of the problems.
On acquire() the atomic 32bit sequence number is expanded to 64 bit
by folding the ringbuffer's sequence into it carefully.
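A sketch of the folding on acquire (a console can only be behind the
ringbuffer, so a "future" result means the 32bit value wrapped):
  static u64 cons_expand_seq(u32 cseq, u64 rb_seq)
  {
      u64 seq = (rb_seq & ~0xffffffffULL) | cseq;

      if (seq > rb_seq)
          seq -= 1ULL << 32;
      return seq;
  }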
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add print state functions
Date: Sun, 11 Sep 2022 00:28:07 +0200
Provide three functions which are related to the safe handover
mechanism and allow console drivers to denote takeover unsafe
sections:
- console_can_proceed()
Invoked by a console driver to check whether a handover request
is pending or whether the console was taken over in a hostile
fashion.
- console_enter/exit_unsafe()
Invoked by a console driver to denote that the driver output
function is about to enter or to leave a critical region where a
hostile takeover is unsafe. These functions are also
cancellation points.
The unsafe state is stored in the console state and allows a
takeover attempt to make informed decisions whether to take over
and/or output on such a console at all. The unsafe state is also
available to the driver in the write context for the
atomic_write() output function so the driver can make informed
decisions about the required actions or take a special emergency
path.
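A sketch of a driver output path using these safety points (return
value semantics assumed):
  if (!console_can_proceed(wctxt))
      return false;    /* ownership was lost, abort */

  if (!console_enter_unsafe(wctxt))
      return false;
  /* clobber device registers; a hostile takeover is now unsafe */
  if (!console_exit_unsafe(wctxt))
      return false;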
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add emit function and callback functions for atomic printing
Date: Sun, 11 Sep 2022 00:28:10 +0200
Implement an emit function for non-BKL consoles to output printk
messages. It utilizes the lockless printk_get_next_message() and
console_prepend_dropped() functions to retrieve/build the output
message. The emit function includes the required safety points to
check for handover/takeover and calls a new write_atomic callback
of the console driver to output the message. It also includes proper
handling for updating the non-BKL console sequence number.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Introduce printer threads
Date: Sun, 11 Sep 2022 00:28:12 +0200
Add the infrastructure to create a printer thread per console along
with the required thread function, which is takeover/handover aware.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add printer thread wakeups
Date: Tue, 28 Feb 2023 10:01:59 +0000
Add a function to wakeup the printer threads. Use the new function
when:
- records are added to the printk ringbuffer
- consoles are started
- consoles are resumed
The actual waking is performed via irq_work so that the wakeup can
be triggered from any context.
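A sketch of the wakeup path; irq_work makes it callable from any
context (names hypothetical):
  static void cons_irq_work_fn(struct irq_work *irq_work)
  {
      /* wake the printer thread(s) for newly available records */
  }

  static DEFINE_PER_CPU(struct irq_work, cons_wake_work) =
      IRQ_WORK_INIT_LAZY(cons_irq_work_fn);

  void cons_wake_threads(void)
  {
      irq_work_queue(this_cpu_ptr(&cons_wake_work));
  }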
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Add write context storage for atomic writes
Date: Sun, 11 Sep 2022 00:28:13 +0200
The number of consoles is unknown at compile time and allocating
write contexts on stack in emergency/panic situations is not desired
either.
Allocate a write context array (one for each priority level) along
with the per CPU output buffers, thus allowing atomic contexts on
multiple CPUs and priority levels to execute simultaneously without
clobbering each other's write context.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: nobkl: Provide functions for atomic write enforcement
Date: Sun, 11 Sep 2022 00:28:15 +0200
Threaded printk is the preferred mechanism to tame the noisyness of
printk, but WARN/OOPS/PANIC require printing out immediately since
the printer threads might not be able to run.
Add per CPU state to denote the priority/urgency of the output and
provide functions to flush the printk backlog for priority elevated
contexts and when the printing threads are not available (such as
early boot).
Note that when a CPU is in a priority elevated state, flushing only
occurs when dropping back to a lower priority. This allows the full
set of printk records (WARN/OOPS/PANIC output) to be stored in the
ringbuffer before beginning to flush the backlog.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: nobkl: Stop threads on shutdown/reboot
Date: Mon, 27 Feb 2023 17:59:01 +0100
Register a syscore_ops shutdown function to stop all threaded
printers on shutdown/reboot. This allows printk to transition back
to atomic printing in order to provide a robust mechanism for
outputting the final messages.
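A sketch of the hook (handler name hypothetical):
  static struct syscore_ops printk_syscore_ops = {
      .shutdown = printk_kthreads_shutdown,
  };

  static int __init printk_init_ops(void)
  {
      register_syscore_ops(&printk_syscore_ops);
      return 0;
  }
  device_initcall(printk_init_ops);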
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: tty: tty_io: Show non-BKL consoles as active
Date: Fri, 10 Mar 2023 13:08:01 +0000
/sys/class/tty/console/active shows the consoles that are currently
registered and enabled and are able to print (i.e. have implemented
a write() callback). This is used by userspace programs such as
systemd's systemd-getty-generator to determine where consoles are
in order to automatically start a getty instance.
The non-BKL consoles do not implement write() but also should be
shown as an active console. Expand the conditions to also check if
the callbacks write_thread() or write_atomic() are implemented for
non-BKL consoles.
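A sketch of the expanded condition (helper name hypothetical; the
callback fields are the ones named above):
  static bool console_is_active(const struct console *c)
  {
      return (c->flags & CON_ENABLED) &&
             (c->write || c->write_thread || c->write_atomic);
  }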
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
40/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: proc: consoles: Add support for non-BKL consoles
Date: Fri, 10 Mar 2023 13:36:01 +0000
Show 'W' if a non-BKL console implements write_thread() or
write_atomic().
Add a new flag 'N' to show if it is a non-BKL console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kernel/panic: Add atomic write enforcement to warn/panic
Date: Sun, 11 Sep 2022 00:28:17 +0200
Invoke the atomic write enforcement functions for warn/panic to
ensure that the information gets out to the consoles.
For the panic case, add explicit intermediate atomic flush calls to
ensure immediate flushing at important points. Otherwise the atomic
flushing only occurs when dropping out of the elevated priority,
which for panic may never happen.
It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of the
atomic consoles. Optimally there should be no legacy consoles
registered.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: rcu: Add atomic write enforcement for rcu stalls
Date: Wed, 14 Sep 2022 08:17:13 +0206
Invoke the atomic write enforcement functions for rcu stalls to
ensure that the information gets out to the consoles.
It is important to note that if there are any legacy consoles
registered, they will be attempting to directly print from the
printk-caller context, which may jeopardize the reliability of the
atomic consoles. Optimally there should be no legacy consoles
registered.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: Perform atomic flush in console_flush_on_panic()
Date: Tue, 28 Feb 2023 13:55:00 +0000
Typically the panic() function will take care of atomically flushing the
non-BKL consoles on panic. However, there are several users of
console_flush_on_panic() outside of panic().
Also perform atomic flushing in console_flush_on_panic(). A new
function cons_force_seq() is implemented to support the
mode=CONSOLE_REPLAY_ALL feature.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: only disable if actually unregistered
Date: Tue, 7 Mar 2023 18:22:26 +0000
Currently in unregister_console() a printk message is generated and
the console is disabled, even if it was never registered. There are
code paths (such as uart_remove_one_port()) that call
unregister_console() even if the console is not registered.
It is confusing to see messages about consoles being disabled that
were never disabled. Move the printk and disabling later, when it
is known that the console is actually registered.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: Add threaded printing support for BKL consoles.
Date: Sun, 11 Sep 2022 00:28:12 +0200
Add threaded printing support for BKL consoles on PREEMPT_RT.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/83 [
Author: Sebastian Siewior
Email: bigeasy@linutronix.de
Subject: printk: replace local_irq_save with local_lock for safe mode
Date: Tue, 14 Mar 2023 16:51:44 +0100
Safe mode disables interrupts in order to minimize the window where
printk calls use deferred printing. Currently local_irq_save() is
used for this, however on PREEMPT_RT this can lead to large latencies
because safe mode is enabled for the duration of printing a record.
Use a local_lock instead of local_irq_save(). For !PREEMPT_RT it
has the same effect of disabling interrupts for that CPU. For
PREEMPT_RT it will disable preemption, which is enough to prevent
interruption from the irq threads.
Note that disabling preemption for PREEMPT_RT is also very bad
since it is still blocking RT tasks. The atomic/threaded (NOBKL)
consoles were developed such that safe mode is not needed. So it
is expected that a PREEMPT_RT machine does not run with any legacy
consoles registered.
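A sketch of the replacement (lock name hypothetical):
  static DEFINE_PER_CPU(local_lock_t, safe_lock) =
      INIT_LOCAL_LOCK(safe_lock);

  unsigned long flags;

  /* was: local_irq_save(flags); */
  local_lock_irqsave(&safe_lock, flags);
  /* emit one record in safe mode */
  local_unlock_irqrestore(&safe_lock, flags);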
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/83 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement non-BKL console
Date: Fri, 3 Mar 2023 12:27:31 +0000
Implement the necessary callbacks to allow the 8250 console driver
to perform as a non-BKL console. Remove the implementation for the
legacy console callback (write) and add implementations for the
non-BKL consoles (write_atomic, write_thread, port_lock) and add
CON_NO_BKL to the initial flags.
Although non-BKL consoles can coexist with legacy consoles, you
will only receive all the benefits of the non-BKL consoles if
this console driver is the only console. That means no netconsole,
no tty1, no earlyprintk, no earlycon. Just the uart8250.
For example: console=ttyS0,115200
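A sketch of the console definition with the non-BKL hooks named above
(hook implementations as introduced by the patch):
  static struct console univ8250_console = {
      .name          = "ttyS",
      .write_atomic  = serial8250_console_write_atomic,
      .write_thread  = serial8250_console_write_thread,
      .port_lock     = serial8250_console_port_lock,
      .device        = uart_console_device,
      .flags         = CON_PRINTBUFFER | CON_ANYTIME | CON_NO_BKL,
      .index         = -1,
  };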
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: Check only for migration in printk_deferred_*().
Date: Fri, 23 Jun 2023 15:58:51 +0200
Atomic context is not required by the implementation. The only
requirement is that the caller does not migrate to another CPU between
the _enter() and _exit() invocation. The reason is to increment and
decrement the per-CPU variable on the same CPU.
Checking only for disabled migration allows deferred printk to be used
on PREEMPT_RT while only sleeping locks are acquired.
Check for disabled migration instead of atomic context in
printk_deferred_*().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/83 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
Mario Kleiner suggested in commit
ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms driver.")
spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so the region remains preemptible.
The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
found.
- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
may take in the worst case.
It was in the RT queue for a while and nobody complained.
Disable preemption on PREEMPT_RT during timestamping.
[bigeasy: patch description.]
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/83 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Date: Sat, 27 Feb 2016 09:01:42 +0100
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.
According to the comment the interrupts are disabled to avoid random delays and
not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access then all sleeping locks
need to be acquired before interrupts are disabled and some functions
may be moved after enabling interrupts again.
This includes:
- prepare_to_wait() + finish_wait() due its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
drm_device::vblank_time_lock.
Don't disable interrupts on PREEMPT_RT during atomic updates.
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't check for atomic context on PREEMPT_RT
Date: Mon, 25 Oct 2021 15:05:18 +0200
The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.
Changing the uncore::lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.
Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Disable tracing points on PREEMPT_RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others,
use functions which acquire spinlock_t locks; these are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within a trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.
Based on this I don't see any other way than to disable the trace points
on PREEMPT_RT.
Reported-by: Luca Abeni <lucabe72@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
54/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Queue and wait for the irq_work item.
Date: Wed, 8 Sep 2021 17:18:00 +0200
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which can not be
acquired with disabled interrupts.
Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).
Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.
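A sketch of the replacement pattern, where work is the pre-existing
struct irq_work of the GT code:
  /* instead of calling the callback with interrupts disabled: */
  irq_work_queue(work);    /* raises the irq_work IPI right away on x86 */
  irq_work_sync(work);     /* wait for completion in task context */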
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
]
55/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
Date: Wed, 8 Sep 2021 19:03:41 +0200
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().
execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with the
_irq suffix then it behaves almost as if execlists_dequeue() would be
invoked with disabled interrupts. The difference is the last part of the
function which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock at the end of the function as I didn't find
anything that would acquire the lock again.
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
]
56/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Drop the irqs_disabled() check
Date: Fri, 1 Oct 2021 20:01:03 +0200
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.
There is no need to check for disabled interrupts. The lockdep
annotation below already checks if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.
Remove the !irqs_disabled() check.
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/83 [
Author: Tvrtko Ursulin
Email: tvrtko.ursulin@intel.com
Subject: drm/i915: Do not disable preemption for resets
Date: Wed, 5 Jul 2023 10:30:25 +0100
Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
preempt disable section over the hardware reset callback to prepare the
driver for being able to reset from atomic contexts.
In retrospect I can see that the work item at the time was about removing
the struct mutex from the reset path. Code base also briefly entertained
the idea of doing the reset under stop_machine in order to serialize
userspace mmap and temporary glitch in the fence registers (see
eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex"),
but that never materialized and was soon removed in 2caffbf11762
("drm/i915: Revoke mmaps and prevent access to fence registers across
reset") and replaced with a SRCU based solution.
As such, as far as I can see, today we still have a requirement that
resets must not sleep (invoked from submission tasklets), but no need to
support invoking them from a truly atomic context.
Given that the preemption section is problematic on RT kernels, since the
uncore lock becomes a sleeping lock and so is invalid in such section,
let's try and remove it. The potential downside is that our short waits on GPU
to complete the reset may get extended if CPU scheduling interferes, but
in practice that probably isn't a deal breaker.
In terms of mechanics, since the preemption disabled block is being
removed we just need to replace a few of the wait_for_atomic macros into
busy looping versions which will work (and not complain) when called from
non-atomic sections.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20230705093025.3689748-1-tvrtko.ursulin@linux.intel.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Revert "drm/i915: Depend on !PREEMPT_RT."
Date: Mon, 21 Feb 2022 17:59:14 +0100
Once the known issues are addressed, it should be safe to enable the
driver.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR task preemption and not about the purely fairness-driven
SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and re-enabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non-RT workload
performance.
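A minimal sketch of the wakeup-side decision described above (names
mirror the description; the real implementation covers more cases):
static void resched_curr_sketch(struct rq *rq)
{
	/* RT class: immediate preemption, exactly as before. */
	if (task_has_rt_policy(rq->curr)) {
		resched_curr(rq);		/* sets NEED_RESCHED */
	} else {
		/* SCHED_OTHER vs. SCHED_OTHER: defer via the lazy bit so
		 * the waking task can leave its lock-held region first. */
		set_tsk_thread_flag(rq->curr, TIF_NEED_RESCHED_LAZY);
	}
}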
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
60/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/entry: Use should_resched() in idtentry_exit_cond_resched()
Date: Tue, 30 Jun 2020 11:45:14 +0200
The TIF_NEED_RESCHED bit is folded on x86 into the preemption counter.
By using should_resched(0) instead of need_resched() the same check can
be performed using the same variable as the preempt_count() check that
was issued before.
Use should_resched(0) instead of need_resched().
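In other words (a sketch; should_resched(0) performs both checks with a
single read of the folded per-CPU preempt count):
if (!preempt_count() && need_resched())	/* two reads: counter + thread flags */
	preempt_schedule_irq();
/* becomes */
if (should_resched(0))			/* one test of the folded word */
	preempt_schedule_irq();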
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
61/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
62/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
63/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
64/83 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
65/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Disable jump-label on PREEMPT_RT.
Date: Wed, 8 Jul 2015 17:14:48 +0200
jump-labels are used to efficiently switch between two possible code
paths. To achieve this, stop_machine() is used to keep the CPU in a
known state while the opcode is modified. The usage of stop_machine()
here leads to large latency spikes which can be observed on PREEMPT_RT.
Jump labels may change their target at runtime and are not restricted
to the debug or "configuration/setup" part of a PREEMPT_RT system,
where high latencies could be considered acceptable.
Disable jump-label support on a PREEMPT_RT system.
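Mechanically this boils down to a Kconfig condition along these lines
(a sketch; the exact select/depends line differs per tree):
# arch/arm/Kconfig
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !PREEMPT_RT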
[bigeasy: Patch description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20220613182447.112191-2-bigeasy@linutronix.de
]
66/83 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main(void)
{
	*((char *)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space, so a user task cannot
be allowed to access it. Under the above conditions, the correct result
is that the test case receives a "segmentation fault" and exits instead
of producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq-enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode, so if the kernel doesn't enable them,
they remain disabled during a translation/section permission fault.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've not seen any need
for other architectures to have the siglock converted to a raw lock, we
can conclude that we should enable irqs for the ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
67/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
68/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
69/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: vfp: Provide vfp_lock() for VFP locking.
Date: Fri, 19 May 2023 16:57:29 +0200
kernel_neon_begin() uses local_bh_disable() to ensure exclusive access
to the VFP unit. This is broken on PREEMPT_RT because a BH disabled
section remains preemptible on PREEMPT_RT.
Introduce vfp_lock() which uses local_bh_disable() and preempt_disable()
on PREEMPT_RT. Since softirqs are always processed in thread context,
disabling preemption is enough to ensure that the current context won't
get interrupted by something that is using the VFP. Use it in
kernel_neon_begin().
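A minimal sketch of such a helper, following the description (the
in-tree version may differ in detail):
static void vfp_lock(void)
{
	/* On !RT, disabling BHs provides the required exclusion. On RT,
	 * softirqs run in thread context, so disabling preemption is
	 * sufficient. */
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		preempt_disable();
	else
		local_bh_disable();
}
static void vfp_unlock(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		preempt_enable();
	else
		local_bh_enable();
}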
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: vfp: Use vfp_lock() in vfp_sync_hwstate().
Date: Fri, 19 May 2023 16:57:30 +0200
vfp_sync_hwstate() uses preempt_disable() followed by local_bh_disable()
to ensure that it won't get interrupted while checking the VFP state.
This harms PREEMPT_RT because softirq handling can get preempted and
local_bh_disable() synchronizes the related section with a sleeping lock
which does not work with disabled preemption.
Use the vfp_lock() to synchronize the access.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: vfp: Use vfp_lock() in vfp_support_entry().
Date: Wed, 28 Jun 2023 09:36:10 +0200
vfp_entry() is invoked from the exception handler and is fully
preemptible. It uses local_bh_disable() to remain uninterrupted while
checking the VFP state.
This does not work on PREEMPT_RT because local_bh_disable()
synchronizes the relevant section but the context remains fully
preemptible.
Use vfp_lock() for uninterrupted access.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: vfp: Move sending signals outside of vfp_lock()ed section.
Date: Wed, 28 Jun 2023 09:39:33 +0200
VFP_bounce() is invoked from within vfp_support_entry() and may send a
signal. Sending a signal uses spinlock_t which becomes a sleeping lock
on PREEMPT_RT and must not be acquired within a preempt-disabled
section.
Move the vfp_raise_sigfpe() block outside of the vfp_lock() section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
74/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
75/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc: traps: Use PREEMPT_RT
Date: Fri, 26 Jul 2019 11:30:49 +0200
Add PREEMPT_RT to the backtrace if enabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts), which does not work on PREEMPT_RT.
Use a locallock instead of local_irq_save()/local_irq_disable().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
77/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/imc-pmu: Use the correct spinlock initializer.
Date: Thu, 9 Mar 2023 09:09:01 +0100
The macro __SPIN_LOCK_INITIALIZER() is implementation specific. Users
that desire to initialize a spinlock in a struct must use
__SPIN_LOCK_UNLOCKED().
Use __SPIN_LOCK_UNLOCKED() for the spinlock_t in imc_global_refc.
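For illustration, with a hypothetical structure the rule looks like
this:
struct foo {
	spinlock_t lock;
	int refc;
};
/* __SPIN_LOCK_INITIALIZER() is an internal detail; the public
 * initializer for a spinlock embedded in a struct is: */
static struct foo foo = {
	.lock = __SPIN_LOCK_UNLOCKED(foo.lock),
};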
Fixes: 76d588dddc459 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/20230309134831.Nz12nqsU@linutronix.de
]
78/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries: Select the generic memory allocator.
Date: Thu, 9 Mar 2023 09:13:52 +0100
The RTAS work area allocator is using the generic memory allocator and
as such it must select it.
Select the generic memory allocator on pseries.
Fixes: 43033bc62d349 ("powerpc/pseries: add RTAS work area allocator")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/20230309135110.uAxhqRFk@linutronix.de
]
79/83 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.
This temporary fix is sent for two reasons. First, so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies and can
make sure that their hardware MPIC and usage scenario do not involve
interrupts sent in directed delivery mode, and that the number of possible
pending interrupts is kept small. Secondly, this should incentivize the
development of a proper openpic emulation better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
80/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the
stack address as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
81/83 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
82/83 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
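The entry itself boils down to a tiny read-only kobject attribute,
roughly (a sketch following the kernel/ksysfs.c conventions; the
attribute still has to be added to the kernel_attrs[] array):
#ifdef CONFIG_PREEMPT_RT
static ssize_t realtime_show(struct kobject *kobj,
			     struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", 1);
}
KERNEL_ATTR_RO(realtime);
#endif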
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
83/83 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: vduse: Remove include of rwlock.h
Date: Tue, 16 Aug 2022 09:45:22 +0200
rwlock.h should not be included directly. Instead, linux/spinlock.h
should be included. Including rwlock.h directly will break the RT build.
Remove the rwlock.h include.
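The fix is a one-line include change:
-#include <linux/rwlock.h>
+#include <linux/spinlock.h>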
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lkml.kernel.org/r/20221026134407.711768-1-bigeasy@linutronix.de
]
2/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT.
Date: Wed, 22 Jun 2022 11:36:17 +0200
Commit
53da1d9456fe7 ("fix ptrace slowness")
is just a band aid around the problem.
The invocation of do_notify_parent_cldstop() wakes the parent and makes
it runnable. The scheduler then wants to replace this still running task
with the parent. With the read_lock() acquired this is not possible
because preemption is disabled and so this is deferred until read_unlock().
This scheduling point is undesired and is avoided by disabling preemption
around the unlock operation; preemption is enabled again before the
schedule() invocation, without a preemption point in between.
This is only undesired because the parent sleeps a cycle in
wait_task_inactive() until the traced task leaves the run-queue in
schedule(). It is not a correctness issue, it is just a band aid to avoid
the visible delay which sums up over multiple invocations.
The task can still be preempted if an interrupt occurs between
preempt_enable_no_resched() and freezable_schedule() because scheduling
_will_ happen on the IRQ-exit path of the interrupt. This is ignored since
it does not happen very often.
On PREEMPT_RT keeping preemption disabled during the invocation of
cgroup_enter_frozen() becomes a problem because the function acquires
css_set_lock which is a sleeping lock on PREEMPT_RT and must not be
acquired with disabled preemption.
Don't disable preemption on PREEMPT_RT. Remove the TODO regarding adding
read_unlock_no_resched() as there is no need for it and it would cause
harm.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20220720154435.232749-2-bigeasy@linutronix.de
]
3/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched: Consider task_struct::saved_state in wait_task_inactive().
Date: Wed, 22 Jun 2022 12:27:05 +0200
Ptrace is using wait_task_inactive() to wait for the tracee to reach a
certain task state. On PREEMPT_RT that state may be stored in
task_struct::saved_state while the tracee blocks on a sleeping lock and
task_struct::__state is set to TASK_RTLOCK_WAIT.
It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the task
is blocked on a sleeping lock, because during wake-up (after the sleeping lock
has been acquired) the task state is set to TASK_RUNNING. Once the task is on
the CPU and has acquired the pi_lock it will reset the state accordingly, but
until then TASK_RUNNING will be observed (with the desired state saved in
saved_state).
Also check task_struct::saved_state if the desired match was not found in
task_struct::__state on PREEMPT_RT. If the state was found in saved_state,
wait until the task is idle and the state is visible in task_struct::__state.
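The core of the described check, sketched (simplified to an equality
match; the real code also loops until the state settles):
unsigned long flags;
bool mismatch;

raw_spin_lock_irqsave(&p->pi_lock, flags);
mismatch = READ_ONCE(p->__state) != match_state;
if (IS_ENABLED(CONFIG_PREEMPT_RT) && mismatch)
	mismatch = READ_ONCE(p->saved_state) != match_state;
raw_spin_unlock_irqrestore(&p->pi_lock, flags);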
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lkml.kernel.org/r/Yt%2FpQAFQ1xKNK0RY@linutronix.de
]
4/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spi: Remove the obsolete u64_stats_fetch_*_irq() users.
Date: Thu, 25 Aug 2022 16:15:32 +0200
Now that the 32bit UP oddity is gone and 32bit always uses a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
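A typical reader-side conversion then looks like this (a sketch with an
assumed stats structure):
unsigned int start;
u64 packets;

do {
	start = u64_stats_fetch_begin(&stats->syncp);	/* was: u64_stats_fetch_begin_irq() */
	packets = stats->packets;
} while (u64_stats_fetch_retry(&stats->syncp, start));	/* was: u64_stats_fetch_retry_irq() */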
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-spi@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
5/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Remove the obsolete u64_stats_fetch_*_irq() users (drivers).
Date: Thu, 25 Aug 2022 16:15:44 +0200
Now that the 32bit UP oddity is gone and 32bit always uses a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
6/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Remove the obsolete u64_stats_fetch_*_irq() users (net).
Date: Thu, 25 Aug 2022 16:17:37 +0200
Now that the 32bit UP oddity is gone and 32bit always uses a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
7/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: bpf: Remove the obsolete u64_stats_fetch_*_irq() users.
Date: Thu, 25 Aug 2022 16:17:57 +0200
Now that the 32bit UP oddity is gone and 32bit always uses a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
8/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: u64_stat: Remove the obsolete fetch_irq() variants.
Date: Thu, 25 Aug 2022 16:43:46 +0200
Now that the 32bit UP oddity is gone and 32bit always uses a sequence
count, there is no need for the fetch_irq() variants anymore.
Delete the obsolete interfaces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
9/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Avoid the IPI to free the
Date: Mon, 15 Aug 2022 17:29:50 +0200
skb_attempt_defer_free() collects skbs, which were allocated on a
remote CPU, on a per-CPU list. These skbs are either freed on that
remote CPU once the CPU enters NET_RX, or a remote IPI function is
invoked to raise the NET_RX softirq if a threshold of pending skbs has
been exceeded.
This remote IPI can cause the wakeup of ksoftirqd on PREEMPT_RT if the
remote CPU was idle. This is undesired because once ksoftirqd is
running it will acquire all pending softirqs and they will not be
executed as part of the threaded interrupt until ksoftirqd goes idle
again.
To avoid all this, schedule the deferred clean-up from a worker.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
11/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
12/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Use a dedicated thread for timer wakeups.
Date: Wed, 1 Dec 2021 17:41:09 +0100
A timer/hrtimer softirq is raised in-IRQ context. With threaded
interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
for the processing of the softirq.
Once the ksoftirqd is marked as pending (or is running) it will collect
all raised softirqs. This in turn means that a softirq which would have
been processed at the end of the threaded interrupt, which runs at an
elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
priority and competes with every regular task for CPU resources.
This introduces long delays on heavy loaded systems and is not desired
especially if the system is not overloaded by the softirqs.
Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
timers thread and let it run at the lowest SCHED_FIFO priority.
RT tasks are woken up from hardirq context, so only timer_list timers
and hrtimers for "regular" tasks are processed here. The higher priority
ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
Using a dedicated variable to store the pending softirq bits ensures
that the timers are not accidentally picked up by ksoftirqd or other
threaded interrupts.
The timer softirq shouldn't be picked up by ksoftirqd since ksoftirqd
runs at a lower priority. However, if the timer bits are ORed while a
threaded interrupt is running, then the timer softirq would be performed
at higher priority.
The new timer thread will block on the softirq lock before it starts
softirq work. This "race window" isn't closed because while the timer
thread is performing the softirq work it can get PI-boosted via the
softirq lock by a random force-threaded thread.
The timer thread can pick up pending softirqs from ksoftirqd, but only
if the softirq load is high. It is not desired that the picked-up
softirqs are processed at SCHED_FIFO priority under high softirq load,
but this can already happen via a PI boost by a force-threaded
interrupt.
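Conceptually the raise path then splits like this (a sketch; the
per-CPU pending word and the timersd thread pointer are assumed names
based on the description):
static DEFINE_PER_CPU(unsigned long, pending_timer_softirq);	/* assumed */
static DEFINE_PER_CPU(struct task_struct *, timersd);		/* assumed */

static void raise_timer_softirq_sketch(unsigned int nr)
{
	if (nr == TIMER_SOFTIRQ || nr == HRTIMER_SOFTIRQ) {
		/* dedicated pending word, invisible to ksoftirqd */
		__this_cpu_or(pending_timer_softirq, 1UL << nr);
		wake_up_process(__this_cpu_read(timersd));
	} else {
		raise_softirq_irqoff(nr);
	}
}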
Reported-by: kernel test robot <lkp@intel.com> [ static timer_threads ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/50 [
Author: Frederic Weisbecker
Email: frederic@kernel.org
Subject: rcutorture: Also force sched priority to timersd on boosting test.
Date: Tue, 5 Apr 2022 03:07:51 +0200
ksoftirqd is statically boosted to the priority level right above the
one of rcu_torture_boost() so that timers, which torture readers rely on,
get a chance to run while rcu_torture_boost() is polling.
However timer processing got split from ksoftirqd into its own kthread
(timersd) that isn't boosted. It has the same low SCHED_FIFO priority as
rcu_torture_boost() and therefore timers can't preempt it and may
starve.
The issue can be triggered in practice on v5.17.1-rt17 using:
./kvm.sh --allcpus --configs TREE04 --duration 10m --kconfig "CONFIG_EXPERT=y CONFIG_PREEMPT_RT=y"
Fix this by statically boosting timersd, just as is done for
ksoftirqd in commit
ea6d962e80b61 ("rcutorture: Judge RCU priority boosting on grace periods, not callbacks")
Suggested-by: Mel Gorman <mgorman@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20220405010752.1347437-1-frederic@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/50 [
Author: Frederic Weisbecker
Email: frederic@kernel.org
Subject: tick: Fix timer storm since introduction of timersd
Date: Tue, 5 Apr 2022 03:07:52 +0200
If timers are pending while the tick is reprogrammed in nohz mode, the
next expiry is not armed to fire now; it is delayed one jiffy forward
instead so as not to raise an inextinguishable timer storm with a
scenario like this:
1) IRQ triggers and queue a timer
2) ksoftirqd() is woken up
3) IRQ tail: timer is reprogrammed to fire now
4) IRQ exit
5) TIMER interrupt
6) goto 3)
...all that until we finally reach ksoftirqd.
Unfortunately we are checking the wrong softirq vector bitmask since
the timersd kthread was split from ksoftirqd. Timers now have their own
vector state field that must be checked separately. As a result, the
old timer storm is back. This shows up early on boot with extremely long
initcalls:
[ 333.004807] initcall dquot_init+0x0/0x111 returned 0 after 323822879 usecs
and the cause is uncovered with the right trace events showing just
10 microseconds between ticks (~100 000 Hz):
|swapper/-1 1dn.h111 60818582us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415486608
|swapper/-1 1dn.h111 60818592us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415496082
|swapper/-1 1dn.h111 60818601us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415505550
Fix this by checking the right timer vector state from the nohz code.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20220405010752.1347437-2-frederic@kernel.org
]
15/50 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the CPU when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the CPU or somewhere along the bus), and gets flushed on
the first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
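A sketch of the described idiom (register offsets follow the tpm_tis
naming; the actual patch wraps the driver's write accessors):
/* Read back a harmless register after each write so the posted writes
 * are flushed one at a time instead of stalling a later ioread8() for
 * the whole burst. */
static inline void tpm_tis_flush(void __iomem *iobase)
{
	ioread8(iobase + TPM_ACCESS(0));
}

static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
{
	iowrite8(b, iobase + addr);
	tpm_tis_flush(iobase);
}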
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
16/50 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: zram: Replace bit spinlocks with spinlock_t for PREEMPT_RT.
Date: Thu, 31 Mar 2016 04:08:28 +0200
The bit spinlock disables preemption on PREEMPT_RT. With preemption
disabled it is not allowed to acquire other sleeping locks, which
includes invoking zs_free().
Use a spinlock_t on PREEMPT_RT for locking, and set/clear ZRAM_LOCK
after the lock has been acquired/dropped.
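A sketch of the resulting RT locking (the per-entry 'lock' member is an
assumed addition to the table entry):
#ifdef CONFIG_PREEMPT_RT
static void zram_slot_lock(struct zram *zram, u32 index)
{
	spin_lock(&zram->table[index].lock);	/* sleeping lock on RT */
	__set_bit(ZRAM_LOCK, &zram->table[index].flags);
}

static void zram_slot_unlock(struct zram *zram, u32 index)
{
	__clear_bit(ZRAM_LOCK, &zram->table[index].flags);
	spin_unlock(&zram->table[index].lock);
}
#endif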
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/YqIbMuHCPiQk+Ac2@linutronix.de
]
17/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/lockdep: Remove lockdep_init_map_crosslock.
Date: Fri, 11 Mar 2022 17:44:57 +0100
The cross-release bits have been removed, lockdep_init_map_crosslock() is
a leftover.
Remove lockdep_init_map_crosslock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20220311164457.46461-1-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/YqITgY+2aPITu96z@linutronix.de
]
18/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: Bring back the RT bits.
Date: Tue, 19 Jul 2022 20:08:01 +0200
This is a revert of the commits:
| 07a22b61946f0 Revert "printk: add functions to prefer direct printing"
| 5831788afb17b Revert "printk: add kthread console printers"
| 2d9ef940f89e0 Revert "printk: extend console_lock for per-console locking"
| 007eeab7e9f03 Revert "printk: remove @console_locked"
| 05c96b3713aa2 Revert "printk: Block console kthreads when direct printing will be required"
| 20fb0c8272bbb Revert "printk: Wait for the global console lock when the system is going down"
which is needed for the atomic consoles which are used on PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/50 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add infrastucture for atomic consoles
Date: Fri, 4 Feb 2022 16:01:17 +0106
Many times it is not possible to see the console output on
panic because printing threads cannot be scheduled and/or the
console is already taken, and forcibly overtaking/busting the
locks does not provide the hoped-for results.
Introduce a new infrastructure to support "atomic consoles".
A new optional callback in struct console, write_atomic(), is
available for consoles to provide an implementation for writing
console messages. The implementation must be NMI safe if it
can run on an architecture where NMIs exist.
Console drivers implementing the write_atomic() callback must
also select CONFIG_HAVE_ATOMIC_CONSOLE in order to enable the
atomic console code within the printk subsystem.
If atomic consoles are available, panic() will flush the kernel
log only to the atomic consoles (before busting spinlocks).
Afterwards, panic() will continue as before, which includes
attempting to flush the other (non-atomic) consoles.
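A sketch of a driver opting in (the write_atomic member and
CONFIG_HAVE_ATOMIC_CONSOLE exist only with these patches applied;
my_poll_out() is a hypothetical polled output routine):
static void my_write_atomic(struct console *con, const char *s,
			    unsigned int count)
{
	/* must be NMI safe: no sleeping locks, no scheduling */
	my_poll_out(s, count);
}

static struct console my_console = {
	.name		= "mycon",
	.write_atomic	= my_write_atomic,
	.flags		= CON_PRINTBUFFER,
};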
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/50 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Fri, 4 Feb 2022 16:01:17 +0106
Implement a non-sleeping NMI-safe write_atomic() console function in
order to support atomic console printing during a panic.
Transmitting data requires disabling interrupts. Since write_atomic()
can be called from any context, it may be called while another CPU
is executing in console code. In order to maintain the correct state
of the IER register, use the global cpu_sync to synchronize all
access to the IER register. This synchronization is only necessary
for serial ports that are being used as consoles.
The global cpu_sync is also used to synchronize between the write()
and write_atomic() callbacks. write() synchronizes per character,
write_atomic() synchronizes per line.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/50 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: avoid preempt_disable() for PREEMPT_RT
Date: Fri, 4 Feb 2022 16:01:17 +0106
During non-normal operation, printk() calls will attempt to
write the messages directly to the consoles. This involves
using console_trylock() to acquire @console_sem.
Preemption is disabled while directly printing to the consoles
in order to ensure that the printing task is not scheduled away
while holding @console_sem, thus blocking all other printers
and causing delays in printing.
Commit fd5f7cde1b85 ("printk: Never set console_may_schedule in
console_trylock()") specifically reverted a previous attempt at
allowing preemption while printing.
However, on PREEMPT_RT systems, disabling preemption while
printing is not allowed because console drivers typically
acquire a spin lock (which under PREEMPT_RT is an rtmutex).
Since direct printing is only used during early boot and
non-panic dumps, the risks of delayed print output for these
scenarios will be accepted under PREEMPT_RT.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/50 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
Mario Kleiner suggests in commit
ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms driver.")
spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so the region remains preemptible.
The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
found.
- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
may take in the worst case.
It was in the RT queue for a while and nobody complained.
Disable preemption on PREEMPT_RT during timestamping.
[bigeasy: patch description.]
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/50 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Date: Sat, 27 Feb 2016 09:01:42 +0100
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks
which are sleeping locks on PREEMPT_RT.
According to the comment the interrupts are disabled to avoid random delays
and are not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access, then all sleeping locks
need to be acquired before interrupts are disabled and some functions
may be moved until after interrupts are enabled again.
This includes:
- prepare_to_wait() + finish_wait() due to its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
drm_device::vblank_time_lock.
Don't disable interrupts on PREEMPT_RT during atomic updates.
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't check for atomic context on PREEMPT_RT
Date: Mon, 25 Oct 2021 15:05:18 +0200
The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.
Changing the uncore::lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.
Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Disable tracing points on PREEMPT_RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spinlock_t locks that are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within a trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.
Based on this I don't see any other way than disabling trace points on
PREEMPT_RT.
Reported-by: Luca Abeni <lucabe72@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h has been included then the NOTRACE here
becomes a nop. Currently this happens for two .c files which use the
tracepoints behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
27/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Queue and wait for the irq_work item.
Date: Wed, 8 Sep 2021 17:18:00 +0200
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with disabled interrupts.
Using irq_work_queue() has the benefit that the irq_work will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).
Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.
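Sketched ('work' stands in for the driver's irq_work item):
/* instead of: local_irq_disable(); callback(&work); local_irq_enable(); */
irq_work_queue(&work);	/* runs in the usual irq_work context */
irq_work_sync(&work);	/* wait for completion instead of calling directly */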
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
]
28/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
Date: Wed, 8 Sep 2021 19:03:41 +0200
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().
execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with
the _irq suffix then it behaves almost as if execlists_dequeue() were
invoked with disabled interrupts. The difference is the last part of the
function, which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function, as I didn't
find anything that would acquire the lock again.
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
]
29/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Drop the irqs_disabled() check
Date: Fri, 1 Oct 2021 20:01:03 +0200
The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.
There is no need to check for disabled interrupts. The lockdep
annotation below already checks whether the lock has been acquired by
the caller and will yell if the interrupts are not disabled.
Remove the !irqs_disabled() check.
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Revert "drm/i915: Depend on !PREEMPT_RT."
Date: Mon, 21 Feb 2022 17:59:14 +0100
Once the known issues are addressed, it should be safe to enable the
driver.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR task preemption and not about the purely fairness-driven
SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and re-enabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non-RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
32/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/entry: Use should_resched() in idtentry_exit_cond_resched()
Date: Tue, 30 Jun 2020 11:45:14 +0200
The TIF_NEED_RESCHED bit is folded on x86 into the preemption counter.
By using should_resched(0) instead of need_resched() the same check can
be performed using the same variable as the preempt_count() check that
was issued before.
Use should_resched(0) instead of need_resched().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
33/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
34/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: entry: Fix the preempt lazy fallout
Date: Tue, 13 Jul 2021 07:52:52 +0200
Common code needs common defines....
Fixes: f2f9e496208c ("x86: Support for lazy preemption")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
35/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
36/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
37/50 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
38/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Disable jump-label on PREEMPT_RT.
Date: Wed, 8 Jul 2015 17:14:48 +0200
jump-labels are used to efficiently switch between two possible code
paths. To achieve this, stop_machine() is used to keep the CPU in a
known state while the opcode is modified. The usage of stop_machine()
here leads to large latency spikes which can be observed on PREEMPT_RT.
Jump labels may change their target at runtime and are not restricted
to the debug or "configuration/setup" part of a PREEMPT_RT system,
where high latencies could be considered acceptable.
Disable jump-label support on a PREEMPT_RT system.
[bigeasy: Patch description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20220613182447.112191-2-bigeasy@linutronix.de
]
39/50 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main(void)
{
	*((char *)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space, so a user task cannot
be allowed to access it. Under the above conditions, the correct result
is that the test case receives a "segmentation fault" and exits instead
of producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq-enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode, so if the kernel doesn't enable them,
they remain disabled during a translation/section permission fault.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've not seen any need
for other architectures to have the siglock converted to a raw lock, we
can conclude that we should enable irqs for the ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
40/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
41/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
42/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
43/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
44/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc: traps: Use PREEMPT_RT
Date: Fri, 26 Jul 2019 11:30:49 +0200
Add PREEMPT_RT to the backtrace if enabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
45/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts), which does not work on PREEMPT_RT.
Use a locallock instead of local_irq_save()/local_irq_disable().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
46/50 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.
This temporary fix is sent for two reasons. First, so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies and can
make sure that their hardware MPIC and usage scenario do not involve
interrupts sent in directed delivery mode, and that the number of possible
pending interrupts is kept small. Secondly, this should incentivize the
development of a proper openpic emulation better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
47/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the
stack address as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
48/50 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
49/50 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
50/50 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: highmem: Don't disable preemption on RT in kmap_atomic()
Date: Fri, 30 Oct 2020 13:59:06 +0100
Disabling preemption makes it impossible to acquire sleeping locks within
kmap_atomic() section.
For PREEMPT_RT it is sufficient to disable migration.
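A minimal sketch of the idea; the helper name kmap_atomic_enter() is made
up for illustration and is not part of the patch:

    static inline void kmap_atomic_enter(void)
    {
    #ifdef CONFIG_PREEMPT_RT
            /* Stay on this CPU but remain preemptible so that sleeping
             * locks can be acquired inside the mapping section. */
            migrate_disable();
    #else
            preempt_disable();
    #endif
            pagefault_disable();
    }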
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timers: Move clearing of base::timer_running under base::lock
Date: Sun, 6 Dec 2020 22:40:07 +0100
syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().
This looks innocent and most reads are clearly not problematic but for a
non-RT kernel it's completely irrelevant whether the store happens before
or after taking the lock. For an RT kernel moving the store under the lock
requires an extra unlock/lock pair in the case that there is a waiter for
the timer. But that's not the end of the world and definitely not worth the
trouble of adding boatloads of comments and annotations to the code. Famous
last words...
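Roughly, the resulting ordering in expire_timers() looks like this
(a simplified sketch, not the verbatim change):

    raw_spin_unlock_irq(&base->lock);
    call_timer_fn(timer, fn, baseclk);
    raw_spin_lock_irq(&base->lock);
    /* Clear under base::lock now; on RT a waiter costs an extra
     * unlock/lock pair inside timer_sync_wait_running(). */
    base->running_timer = NULL;
    timer_sync_wait_running(base);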
Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
3/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kthread: Move prio/affinite change into the newly created thread
Date: Mon, 9 Nov 2020 21:30:41 +0100
With enabled threaded interrupts the nouveau driver reported the
following:
| Chain exists of:
| &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem
|
| Possible unsafe locking scenario:
|
| CPU0                        CPU1
| ----                        ----
| lock(&cpuset_rwsem);
|                             lock(&device->mutex);
|                             lock(&cpuset_rwsem);
| lock(&mm->mmap_lock#2);
The device->mutex is nvkm_device::mutex.
Unblocking the lockchain at `cpuset_rwsem' is probably the easiest thing
to do.
Move the priority reset to the start of the newly created thread.
Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de
]
4/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: genirq: Move prio assignment into the newly created thread
Date: Mon, 9 Nov 2020 23:32:39 +0100
With enabled threaded interrupts the nouveau driver reported the
following:
| Chain exists of:
| &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem
|
| Possible unsafe locking scenario:
|
| CPU0                        CPU1
| ----                        ----
| lock(&cpuset_rwsem);
|                             lock(&device->mutex);
|                             lock(&cpuset_rwsem);
| lock(&mm->mmap_lock#2);
The device->mutex is nvkm_device::mutex.
Unblocking the lockchain at `cpuset_rwsem' is probably the easiest thing
to do.
Move the priority assignment to the start of the newly created thread.
Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: Patch description]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de
]
5/191 [
Author: Valentin Schneider
Email: valentin.schneider@arm.com
Subject: notifier: Make atomic_notifiers use raw_spinlock
Date: Sun, 22 Nov 2020 20:19:04 +0000
Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno
leads to the idle task blocking on an RT sleeping spinlock down some
notifier path:
[ 1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002
[ 1.809116] Modules linked in:
[ 1.809123] Preemption disabled at:
[ 1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)
[ 1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.10.0-rc3-rt7 #168
[ 1.809153] Hardware name: ARM Juno development board (r0) (DT)
[ 1.809158] Call trace:
[ 1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))
[ 1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)
[ 1.809178] dump_stack (lib/dump_stack.c:122)
[ 1.809188] __schedule_bug (kernel/sched/core.c:4886)
[ 1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)
[ 1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))
[ 1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)
[ 1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)
[ 1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)
[ 1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)
[ 1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)
[ 1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)
[ 1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)
[ 1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
[ 1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)
[ 1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
[ 1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)
Two points worth noting:
1) That this is conceptually the same issue as pointed out in:
313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")
2) Only the _robust() variant of atomic_notifier callchains suffer from
this
AFAICT only the cpu_pm_notifier_chain really needs to be changed, but
singling it out would mean introducing a new (truly) non-blocking API. At
the same time, callers that are fine with any blocking within the call
chain should use blocking notifiers, so patching up all atomic_notifier's
doesn't seem *too* crazy to me.
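Sketched, the core of the change is the lock type in the notifier head,
with the register/call paths switching to the raw_spin_* API:

    struct atomic_notifier_head {
            raw_spinlock_t lock;    /* was: spinlock_t */
            struct notifier_block __rcu *head;
    };

    /* e.g. in atomic_notifier_chain_register(): */
    raw_spin_lock_irqsave(&nh->lock, flags);
    ret = notifier_chain_register(&nh->head, n);
    raw_spin_unlock_irqrestore(&nh->lock, flags);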
Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201122201904.30940-1-valentin.schneider@arm.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used
Date: Fri, 19 Feb 2021 17:51:07 +0100
The mutex linear_mapping_mutex is defined at the top of the file while its
only two users are within the CONFIG_MEMORY_HOTPLUG block.
A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because
its mutex implementation is smart enough to realize that it is unused.
Move the definition of linear_mapping_mutex to the ifdef block where it is
used.
Fixes: 1f73ad3e8d755 ("powerpc/mm: print warning in arch_remove_linear_mapping()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: limit second loop of syslog_print_all
Date: Wed, 17 Feb 2021 16:15:31 +0100
The second loop of syslog_print_all() subtracts lengths that were
added in the first loop. With commit b031a684bfd0 ("printk: remove
logbuf_lock writer-protection of ringbuffer") it is possible that
records are (over)written during syslog_print_all(). This allows the
possibility of the second loop subtracting lengths that were never
added in the first loop.
This situation can result in syslog_print_all() filling the buffer
starting from a later record, even though there may have been room
to fit the earlier record(s) as well.
Fixes: b031a684bfd0 ("printk: remove logbuf_lock writer-protection of ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
]
8/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove unused fields
Date: Mon, 21 Dec 2020 11:19:39 +0106
struct kmsg_dumper still contains some fields that were used to
iterate the old ringbuffer. They are no longer used. Remove them
and update the struct documentation.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: refactor kmsg_dump_get_buffer()
Date: Mon, 30 Nov 2020 01:41:56 +0106
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.
A follow-up commit will move this common logic into a
separate helper function.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
Date: Wed, 13 Jan 2021 11:29:53 +0106
The logic for finding records to fit into a buffer is the same for
kmsg_dump_get_buffer() and syslog_print_all(). Introduce a helper
function find_first_fitting_seq() to handle this logic.
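A simplified sketch of the helper's shape (not the verbatim kernel code):

    static u64 find_first_fitting_seq(u64 start_seq, u64 max_seq, size_t size,
                                      bool syslog, bool time)
    {
            struct printk_info info;
            unsigned int line_count;
            size_t len = 0;
            u64 seq;

            /* Determine the size of the records up to @max_seq. */
            prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
                    if (seq >= max_seq)
                            break;
                    len += get_record_print_text_size(&info, line_count,
                                                      syslog, time);
            }

            /* Move the start forward until the remainder fits @size. */
            prb_for_each_info(start_seq, prb, seq, &info, &line_count) {
                    if (len <= size || info.seq >= max_seq)
                            break;
                    len -= get_record_print_text_size(&info, line_count,
                                                      syslog, time);
            }
            return seq;
    }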
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
11/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce CONSOLE_LOG_MAX for improved multi-line support
Date: Thu, 10 Dec 2020 12:48:01 +0106
Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
that is allowed to be printed to the console for a single record.
Rather than setting CONSOLE_LOG_MAX to "LOG_LINE_MAX + PREFIX_MAX"
(1024), increase it to 4096. With a larger buffer size, multi-line
records that are nearly LOG_LINE_MAX in length will have a better
chance of being fully printed. (When formatting a record for the
console, each line of a multi-line record is prepended with a copy
of the prefix.)
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
12/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: use seqcount_latch for clear_seq
Date: Mon, 30 Nov 2020 01:41:58 +0106
kmsg_dump_rewind_nolock() locklessly reads @clear_seq. However,
this is not done atomically. Since @clear_seq is 64-bit, this
cannot be an atomic operation for all platforms. Therefore, use
a seqcount_latch to allow readers to always read a consistent
value.
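The lockless read side, sketched after the latch pattern used here:

    static u64 latched_seq_read_nolock(struct latched_seq *ls)
    {
            unsigned int seq, idx;
            u64 val;

            do {
                    seq = raw_read_seqcount_latch(&ls->latch);
                    idx = seq & 0x1;        /* pick the stable copy */
                    val = ls->val[idx];
            } while (read_seqcount_latch_retry(&ls->latch, seq));

            return val;
    }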
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: use atomic64_t for devkmsg_user.seq
Date: Thu, 10 Dec 2020 15:33:40 +0106
@user->seq is indirectly protected by @logbuf_lock. Once @logbuf_lock
is removed, @user->seq will be no longer safe from an atomicity point
of view.
In preparation for the removal of @logbuf_lock, change it to
atomic64_t to provide this safety.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
14/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add syslog_lock
Date: Thu, 10 Dec 2020 16:58:02 +0106
The global variables @syslog_seq, @syslog_partial, @syslog_time
and write access to @clear_seq are protected by @logbuf_lock.
Once @logbuf_lock is removed, these variables will need their
own synchronization method. Introduce @syslog_lock for this
purpose.
@syslog_lock is a raw_spin_lock for now. This simplifies the
transition to removing @logbuf_lock. Once @logbuf_lock and the
safe buffers are removed, @syslog_lock can change to spin_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce a kmsg_dump iterator
Date: Fri, 18 Dec 2020 11:40:08 +0000
Rather than store the iterator information into the registered
kmsg_dump structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller,
thus allowing lockless use of the kmsg_dump functions.
This is in preparation for removal of @logbuf_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: um: synchronize kmsg_dumper
Date: Mon, 21 Dec 2020 11:10:03 +0106
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since a static buffer is used
to retrieve the kernel logs, this buffer must be protected against
simultaneous dumping.
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove logbuf_lock
Date: Tue, 26 Jan 2021 17:43:19 +0106
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.
This means that printk_nmi_direct and printk_safe_flush_on_panic()
no longer need to acquire any lock to run.
@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove _nolock() variants
Date: Mon, 21 Dec 2020 10:27:58 +0106
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.
The functions without _nolock() were chosen because they are already
exported to kernel modules.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
19/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: use kmsg_dump_rewind
Date: Wed, 17 Feb 2021 18:23:16 +0100
kmsg_dump() is open coding the kmsg_dump_rewind(). Call
kmsg_dump_rewind() instead.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: console: remove unnecessary safe buffer usage
Date: Wed, 17 Feb 2021 18:28:05 +0100
Upon registering a console, safe buffers are activated when setting
up the sequence number to replay the log. However, these are already
protected by @console_sem and @syslog_lock. Remove the unnecessary
safe buffer usage.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
21/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: track/limit recursion
Date: Fri, 11 Dec 2020 00:55:25 +0106
Limit printk() recursion to 1 level. This is enough to print a
stacktrace for the printk call, should a WARN or BUG occur.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove safe buffers
Date: Mon, 30 Nov 2020 01:42:00 +0106
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers. In NMI or safe contexts, store
the message immediately but still use irq_work to defer the console
printing.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: convert @syslog_lock to spin_lock
Date: Thu, 18 Feb 2021 17:37:41 +0100
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Mon, 30 Nov 2020 01:42:01 +0106
Add a write_atomic() callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write() callback. The
write_atomic() callback will only be used in special situations,
such as when the kernel panics.
Creating an NMI safe write_atomic() that must synchronize with
write() requires a careful implementation of the console driver. To
aid with the implementation, a set of console_atomic_*() functions
are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using a processor-reentrant spinlock
(called a cpulock).
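To illustrate the contract, a hypothetical driver wiring (the foo_* names
are made up):

    static void foo_console_write_atomic(struct console *con,
                                         const char *s, unsigned int count)
    {
            unsigned int flags;

            console_atomic_lock(&flags);    /* NMI-safe cpulock */
            foo_uart_poll_out(s, count);    /* busy-waits, never sleeps */
            console_atomic_unlock(flags);
    }

    static struct console foo_console = {
            .name           = "foo",
            .write          = foo_console_write,
            .write_atomic   = foo_console_write_atomic,
    };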
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Mon, 30 Nov 2020 01:42:02 +0106
Implement a non-sleeping NMI-safe write_atomic() console function in
order to support emergency console printing.
Since interrupts need to be disabled during transmit, all usage of
the IER register is wrapped with access functions that use the
console_atomic_lock() function to synchronize register access while
tracking the state of the interrupts. This is necessary because
write_atomic() can be called from an NMI context that has preempted
write_atomic().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: relocate printk_delay() and vprintk_default()
Date: Mon, 30 Nov 2020 01:42:03 +0106
Move printk_delay() and vprintk_default() "as is" further up so that
they can be used by new functions in an upcoming commit.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: combine boot_delay_msec() into printk_delay()
Date: Mon, 30 Nov 2020 01:42:04 +0106
boot_delay_msec() is always called immediately before printk_delay()
so just combine the two.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: change @console_seq to atomic64_t
Date: Mon, 30 Nov 2020 01:42:05 +0106
In preparation for atomic printing, change @console_seq to atomic
so that it can be accessed without requiring @console_sem.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce kernel sync mode
Date: Mon, 30 Nov 2020 01:42:06 +0106
When the kernel performs an OOPS, enter into "sync mode":
- only atomic consoles (write_atomic() callback) will print
- printing occurs within vprintk_store() instead of console_unlock()
CONSOLE_LOG_MAX is moved to printk.h to support the per-console
buffer used in sync mode.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: move console printing to kthreads
Date: Mon, 30 Nov 2020 01:42:07 +0106
Create a kthread for each console to perform console printing. Now
all console printing is fully asynchronous except for the boot
console and when the kernel enters sync mode (and there are atomic
consoles available).
The console_lock() and console_unlock() functions now only do what
their name says... locking and unlocking of the console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove deferred printing
Date: Mon, 30 Nov 2020 01:42:08 +0106
Since printing occurs either atomically or from the printing
kthread, there is no need for any deferring or tracking possible
recursion paths. Remove all printk context tracking.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add console handover
Date: Mon, 30 Nov 2020 01:42:09 +0106
If earlyprintk is used, a boot console will print directly to the
console immediately. The boot console will unregister itself as soon
as a non-boot console registers. However, the non-boot console does
not begin printing until its kthread has started. Since this happens
much later, there is a long pause in the console output. If the
ringbuffer is small, messages could even be dropped during the
pause.
Add a new CON_HANDOVER console flag to be used internally by printk
in order to track which non-boot console took over from a boot
console. If handover consoles have implemented write_atomic(), they
are allowed to print directly to the console until their kthread can
take over.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add pr_flush()
Date: Mon, 30 Nov 2020 01:42:10 +0106
Provide a function to allow waiting for console printers to catch
up to the latest logged message.
Use pr_flush() to give console printers a chance to finish in
critical situations if no atomic console is available. For now
pr_flush() is only used in the most common error paths:
panic(), print_oops_end_marker(), report_bug(), kmsg_dump().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kcov: Remove kcov include from sched.h and move it to its users.
Date: Thu, 18 Feb 2021 18:31:24 +0100
The recent addition of in_serving_softirq() to kcov.h results in a
compile failure on PREEMPT_RT because it requires
task_struct::softirq_disable_cnt. This is not available if kcov.h is
included from sched.h.
There is no need to include kcov.h from sched.h. All but the net/ user
already include the kcov header file.
Move the include of the kcov.h header from sched.h to its users.
Additionally include sched.h from kcov.h to ensure that everything
task_struct related is available.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lkml.kernel.org/r/20210218173124.iy5iyqv3a4oia4vv@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: use irqsave in cgroup_rstat_flush_locked()
Date: Tue, 3 Jul 2018 18:19:48 +0200
All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock
either with spin_lock_irq() or spin_lock_irqsave().
cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which
is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated()
in IRQ context and therefore requires _irqsave() locking suffix in
cgroup_rstat_flush_locked().
Since there is no difference between spin_lock_t and raw_spin_lock_t
on !RT lockdep does not complain here. On RT lockdep complains because
the interrupts were not disabled here and a deadlock is possible.
Acquire the raw_spin_lock_t with disabled interrupts.
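In diff form, the change in cgroup_rstat_flush_locked() is roughly
(with an added "unsigned long flags" local):

    -       raw_spin_lock(cpu_lock);
    +       raw_spin_lock_irqsave(cpu_lock, flags);
            ...
    -       raw_spin_unlock(cpu_lock);
    +       raw_spin_unlock_irqrestore(cpu_lock, flags);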
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
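A sketch of the substitution; the exact lock expression depends on the
call site:

    -       VM_WARN_ON_ONCE(!irqs_disabled());      /* fails on -RT */
    +       lockdep_assert_held(&mapping->i_pages.xa_lock);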
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: shmem: Use raw_spinlock_t for ->stat_lock
Date: Fri, 14 Aug 2020 18:53:34 +0200
Each CPU has SHMEM_INO_BATCH inodes available in `->ino_batch' which is
per-CPU. Access here is serialized by disabling preemption. If the pool is
empty, it gets reloaded from `->next_ino'. Access here is serialized by
->stat_lock which is a spinlock_t and can not be acquired with disabled
preemption.
One way around it would be to make the per-CPU ino_batch a struct
containing the inode number and a local_lock_t.
Another solution is to promote ->stat_lock to a raw_spinlock_t. The critical
sections are short. The mpol_put() should be moved outside of the critical
section to avoid invoking the destructor with disabled preemption.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Move lockdep where it belongs
Date: Tue, 8 Sep 2020 07:32:20 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
39/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tcp: Remove superfluous BH-disable around listening_hash
Date: Mon, 12 Oct 2020 17:33:54 +0200
Commit
9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH.
inet_unhash() conditionally acquires listening_hash->lock.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/linux-rt-users/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de/
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
]
40/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: smp: Wake ksoftirqd on PREEMPT_RT instead do_softirq().
Date: Mon, 15 Feb 2021 18:44:12 +0100
The softirq implementation on PREEMPT_RT does not provide do_softirq().
The other user of do_softirq() is replaced with a local_bh_disable()
+ enable() around the possible raise-softirq invocation. This can not be
done here because migration_cpu_stop() is invoked with disabled
preemption.
Wake the softirq thread on PREEMPT_RT if there are any pending softirqs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Replace barrier() with cpu_relax() in tasklet_unlock_wait()
Date: Tue, 9 Mar 2021 09:42:04 +0100
A barrier() in a tight loop which waits for something to happen on a remote
CPU is a pointless exercise. Replace it with cpu_relax() which allows HT
siblings to make progress.
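The resulting wait loop, roughly:

    static inline void tasklet_unlock_wait(struct tasklet_struct *t)
    {
            while (test_bit(TASKLET_STATE_RUN, &t->state))
                    cpu_relax();    /* was: barrier(); */
    }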
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Use static inlines for stub implementations
Date: Tue, 9 Mar 2021 09:42:05 +0100
Inlines exist for a reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Provide tasklet_disable_in_atomic()
Date: Tue, 9 Mar 2021 09:42:06 +0100
Replacing the spin wait loops in tasklet_unlock_wait() with
wait_var_event() is not possible as a handful of tasklet_disable()
invocations are happening in atomic context. All other invocations are in
teardown paths which can sleep.
Provide tasklet_disable_in_atomic() and tasklet_unlock_spin_wait() to
convert the few atomic use cases over, which allows to change
tasklet_disable() and tasklet_unlock_wait() in a later step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Use spin wait in tasklet_disable() temporarily
Date: Tue, 9 Mar 2021 09:42:07 +0100
To ease the transition use spin waiting in tasklet_disable() until all
usage sites from atomic context have been cleaned up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/191 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: tasklets: Replace spin wait in tasklet_unlock_wait()
Date: Tue, 9 Mar 2021 09:42:08 +0100
tasklet_unlock_wait() spin waits for TASKLET_STATE_RUN to be cleared. This
is wasting CPU cycles in a tight loop which is especially painful in a
guest when the CPU running the tasklet is scheduled out.
tasklet_unlock_wait() is invoked from tasklet_kill() which is used in
teardown paths and not performance critical at all. Replace the spin wait
with wait_var_event().
There are no users of tasklet_unlock_wait() which are invoked from atomic
contexts. The usage in tasklet_disable() has been replaced temporarily with
the spin waiting variant until the atomic users are fixed up and will be
converted to the sleep wait variant later.
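Sketch of the pairing; the unlock side must now wake up waiters:

    void tasklet_unlock(struct tasklet_struct *t)
    {
            smp_mb__before_atomic();
            clear_bit(TASKLET_STATE_RUN, &t->state);
            smp_mb__after_atomic();
            wake_up_var(&t->state);
    }

    void tasklet_unlock_wait(struct tasklet_struct *t)
    {
            wait_var_event(&t->state,
                           !test_bit(TASKLET_STATE_RUN, &t->state));
    }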
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/191 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: tasklets: Replace spin wait in tasklet_kill()
Date: Tue, 9 Mar 2021 09:42:09 +0100
tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared invoking
yield() from inside the loop. yield() is an ill-defined mechanism and the
result might still be wasting CPU cycles in a tight loop which is
especially painful in a guest when the CPU running the tasklet is scheduled
out.
tasklet_kill() is used in teardown paths and not performance critical at
all. Replace the spin wait with wait_var_event().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT
Date: Tue, 9 Mar 2021 09:42:10 +0100
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in
the tasklet state to be cleared. This works on !RT nicely because the
corresponding execution can only happen on a different CPU.
On RT softirq processing is preemptible, therefore a task preempting the
softirq processing thread can spin forever.
Prevent this by invoking local_bh_disable()/enable() inside the loop. In
case that the softirq processing thread was preempted by the current task,
current will block on the local lock which yields the CPU to the preempted
softirq processing thread. If the tasklet is processed on a different CPU
then the local_bh_disable()/enable() pair is just a waste of processor
cycles.
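Roughly what the RT-safe spin wait looks like:

    void tasklet_unlock_spin_wait(struct tasklet_struct *t)
    {
            while (test_bit(TASKLET_STATE_RUN, &t->state)) {
                    if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
                            /* Block on the BH local lock so a preempted
                             * softirq thread gets the CPU back. */
                            local_bh_disable();
                            local_bh_enable();
                    } else {
                            cpu_relax();
                    }
            }
    }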
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: jme: Replace link-change tasklet with work
Date: Tue, 9 Mar 2021 09:42:11 +0100
The link change tasklet disables the tasklets for tx/rx processing while
updating hw parameters and then enables the tasklets again.
This update can also be pushed into a workqueue where it can be performed
in preemptible context. This allows tasklet_disable() to become sleeping.
Replace the linkch_task tasklet with a work.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: sundance: Use tasklet_disable_in_atomic().
Date: Tue, 9 Mar 2021 09:42:12 +0100
tasklet_disable() is used in the timer callback. This might be disentangled,
but without access to the hardware that's a bit risky.
Replace it with tasklet_disable_in_atomic() so tasklet_disable() can be
changed to a sleep wait once all remaining atomic users are converted.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Denis Kirjanov <kda@linux-powerpc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ath9k: Use tasklet_disable_in_atomic()
Date: Tue, 9 Mar 2021 09:42:13 +0100
All callers of ath9k_beacon_ensure_primary_slot() are preemptible /
acquire a mutex except for this callchain:
spin_lock_bh(&sc->sc_pcu_lock);
ath_complete_reset()
  -> ath9k_calculate_summary_state()
    -> ath9k_beacon_ensure_primary_slot()
It's unclear how that can be disentangled, so use tasklet_disable_in_atomic()
for now. This allows tasklet_disable() to become sleepable once the
remaining atomic users are cleaned up.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ath9k-devel@qca.qualcomm.com
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: atm: eni: Use tasklet_disable_in_atomic() in the send() callback
Date: Tue, 9 Mar 2021 09:42:14 +0100
The atmdev_ops::send callback which calls tasklet_disable() is invoked with
bottom halves disabled from net_device_ops::ndo_start_xmit(). All other
invocations of tasklet_disable() in this driver happen in preemptible
context.
Change the send() call to use tasklet_disable_in_atomic() which allows
tasklet_disable() to be made sleepable once the remaining atomic context
usage sites are cleaned up.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chas Williams <3chas3@gmail.com>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: PCI: hv: Use tasklet_disable_in_atomic()
Date: Tue, 9 Mar 2021 09:42:15 +0100
The hv_compose_msi_msg() callback in irq_chip::irq_compose_msi_msg is
invoked via irq_chip_compose_msi_msg(), which itself is always invoked from
atomic contexts from the guts of the interrupt core code.
There is no way to change this without rewriting the whole driver, so use
tasklet_disable_in_atomic(), which allows tasklet_disable() to be made
sleepable once the remaining atomic users are addressed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-hyperv@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: firewire: ohci: Use tasklet_disable_in_atomic() where required
Date: Tue, 9 Mar 2021 09:42:16 +0100
tasklet_disable() is invoked in several places. Some of them are in atomic
context which prevents a conversion of tasklet_disable() to a sleepable
function.
The atomic callchains are:
ar_context_tasklet()
  ohci_cancel_packet()
    tasklet_disable()
...
ohci_flush_iso_completions()
  tasklet_disable()
The invocation of tasklet_disable() from at_context_flush() is always in
preemptible context.
Use tasklet_disable_in_atomic() for the two invocations in
ohci_cancel_packet() and ohci_flush_iso_completions().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: linux1394-devel@lists.sourceforge.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Switch tasklet_disable() to the sleep wait variant
Date: Tue, 9 Mar 2021 09:42:17 +0100
-- NOT FOR IMMEDIATE MERGING --
Now that all users of tasklet_disable() are invoked from sleepable context,
convert it to use tasklet_unlock_wait() which might sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Add RT specific softirq accounting
Date: Tue, 9 Mar 2021 09:55:53 +0100
RT requires the softirq processing and local bottom half disabled regions
to be preemptible. Using the normal preempt count based serialization is
therefore not possible because this implicitly disables preemption.
RT kernels use a per-CPU local lock to serialize bottom halves. As
local_bh_disable() can nest, the lock can only be acquired on the
outermost invocation of local_bh_disable() and released when the nest
count becomes zero. Tasks which hold the local lock can be preempted, so
it's required to keep track of the nest count per task.
Add an RT only counter to task struct and adjust the relevant macros in
preempt.h.
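The resulting accounting split, sketched:

    #ifdef CONFIG_PREEMPT_RT
    /* Per-task nest count, tracked outside of preempt_count(). */
    # define softirq_count()  (current->softirq_disable_cnt & SOFTIRQ_MASK)
    #else
    # define softirq_count()  (preempt_count() & SOFTIRQ_MASK)
    #endif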
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: irqtime: Make accounting correct on RT
Date: Tue, 9 Mar 2021 09:55:54 +0100
vtime_account_irq() and irqtime_account_irq() base checks on preempt_count()
which fails on RT because preempt_count() does not contain the softirq
accounting which is separate on RT.
These checks do not need the full preempt count as they only operate on the
hard and softirq sections.
Use irq_count() instead which provides the correct value on both RT and non
RT kernels. The compiler is clever enough to fold the masking for !RT:
99b: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax
- 9a2: 25 ff ff ff 7f and $0x7fffffff,%eax
+ 9a2: 25 00 ff ff 00 and $0xffff00,%eax
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Move various protections into inline helpers
Date: Tue, 9 Mar 2021 09:55:55 +0100
To allow reuse of the bulk of softirq processing code for RT and to avoid
#ifdeffery all over the place, split protections for various code sections
out into inline helpers so the RT variant can just replace them in one go.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Make softirq control and processing RT aware
Date: Tue, 9 Mar 2021 09:55:56 +0100
Provide a local lock based serialization for soft interrupts on RT which
allows the local_bh_disabled() sections and servicing soft interrupts to be
preemptible.
Provide the necessary inline helpers which allow to reuse the bulk of the
softirq processing code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tick/sched: Prevent false positive softirq pending warnings on RT
Date: Tue, 9 Mar 2021 09:55:57 +0100
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked soft interrupt processing is
temporarily blocked as well which means that such a warning is a false
positive.
To prevent that, check the per-CPU state which indicates that a
scheduled-out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Prevent false positive softirq warning on RT
Date: Tue, 9 Mar 2021 09:55:58 +0100
Soft interrupt disabled sections can legitimately be preempted or schedule
out when blocking on a lock on RT enabled kernels so the RCU preempt check
warning has to be disabled for RT kernels.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove cruft
Date: Tue, 29 Sep 2020 15:21:17 +0200
Most of this is around since the very beginning. I'm not sure if this
was used while the rtmutex-deadlock-tester was around but today it seems
to only waste memory:
- save_state: No users
- name: Assigned and printed if a deadlock was detected. I'm keeping it
but want to point out that lockdep has the same information.
- file + line: Printed if ::name was NULL. This is only used for
in-kernel locks, so ::name shouldn't be NULL, and then ::file and
::line aren't used.
- magic: Assigned to NULL by rt_mutex_destroy().
Remove members of rt_mutex which are not used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
62/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove output from deadlock detector.
Date: Tue, 29 Sep 2020 16:05:11 +0200
In commit
f5694788ad8da ("rt_mutex: Add lockdep annotations")
rtmutex gained lockdep annotations for rt_mutex_lock() and related
functions.
lockdep will see the locking order and may complain about a deadlock
before rtmutex' own mechanism gets a chance to detect it.
The rtmutex deadlock detector will only complain about locks with
RT_MUTEX_MIN_CHAINWALK and only if a waiter is pending. That means it
works only for in-kernel locks because the futex interface always uses
RT_MUTEX_FULL_CHAINWALK.
The requirement for an active waiter limits the detector to actual
deadlocks and makes it impossible to report potential deadlocks the way
lockdep does.
It looks like lockdep is better suited for reporting deadlocks.
Remove rtmutex' debug print on deadlock detection.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES
Date: Tue, 29 Sep 2020 16:32:49 +0200
rt_mutex_init() only initializes lockdep if CONFIG_DEBUG_RT_MUTEXES is
enabled. The static initializer (DEFINE_RT_MUTEX) does not have such a
restriction.
Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES.
Move the remaining functions in this CONFIG_DEBUG_RT_MUTEXES block to
the upper block.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove rt_mutex_timed_lock()
Date: Wed, 7 Oct 2020 12:11:33 +0200
rt_mutex_timed_lock() has no callers since commit
c051b21f71d1f ("rtmutex: Confine deadlock logic to futex")
Remove rt_mutex_timed_lock().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to the futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
66/191 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1                                  TASK 2
------                                  ------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>

                                        double_lock_hb();

    raw_spin_lock(pi_lock);
    if (current->pi_blocked_on) {
    } else {
        current->pi_blocked_on = PI_WAKE_INPROGRESS;
        raw_spin_unlock(pi_lock);
        spin_lock(hb->lock); <-- blocked!

                                        plist_for_each_entry_safe(this) {
                                            rt_mutex_start_proxy_lock();
                                                task_blocks_on_rt_mutex();
                                                    BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
67/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionally.
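In diff form, roughly:

    -       if (state == TASK_INTERRUPTIBLE && signal_pending(current))
    +       if (signal_pending_state(state, current))
                    ret = -EINTR;

signal_pending_state() also honours TASK_WAKEKILL, so TASK_KILLABLE
sleeps react to fatal signals.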
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
69/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
70/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: Reduce header files in debug_locks.h
Date: Fri, 14 Aug 2020 16:55:25 +0200
The inclusion of printk.h leads to circular dependency if spinlock_t is
based on rt_mutex.
Include only atomic.h (xchg()) and cache.h (__read_mostly).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: split out the rbtree definition
Date: Fri, 14 Aug 2020 17:08:41 +0200
rtmutex.h needs the definition for rb_root_cached. By including kernel.h
we will get to spinlock.h which requires rtmutex.h again.
Split out the required struct definition and move it into its own header
file which can be included by rtmutex.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation ontop of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not a RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
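A simplified sketch of the wakeup side in try_to_wake_up():

    raw_spin_lock_irqsave(&p->pi_lock, flags);
    if (!(p->state & state)) {
            /*
             * The task sleeps on an RT lock; record the regular wakeup
             * in saved_state so it is restored after the lock sleep.
             */
            if (p->saved_state & state)
                    p->saved_state = TASK_RUNNING;
            goto out;
    }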
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
75/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
76/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Allow rt_mutex_trylock() on PREEMPT_RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-PREEMPT_RT kernel can deadlock on rt_mutex_trylock() in softirq
context.
On PREEMPT_RT the softirq context is handled in thread context. This
avoids the deadlock in the slow path and PI-boosting will be done on the
correct thread.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
77/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
78/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single reader restriction is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code paths shows that properly written RT tasks
should not take them. Syscalls like mmap() and file accesses which take
the mmap sem write-locked have unbounded latencies which are completely
unrelated to the mmap sem. Other R/W sem users like graphics drivers are
not suitable for RT tasks either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer unfair,
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on an RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
81/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Use custom scheduling function for spin-schedule()
Date: Tue, 6 Oct 2020 13:07:17 +0200
PREEMPT_RT builds the rwsem, mutex, spinlock and rwlock typed locks on
top of an rtmutex lock. While blocked, task->pi_blocked_on is set
(tsk_is_pi_blocked()) and the task needs to schedule away while waiting.
The schedule process must distinguish between blocking on a regular
sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
and rwlock):
- rwsem and mutex must flush block requests (blk_schedule_flush_plug())
even if blocked on a lock. This can not deadlock because this also
happens for non-RT.
There should be a warning if the scheduling point is within a RCU read
section.
- spinlock and rwlock must not flush block requests. This will deadlock
if the callback attempts to acquire a lock which is already acquired.
Similarly to being preempted, there should be no warning if the
scheduling point is within a RCU read section.
Add preempt_schedule_lock() which is invoked if scheduling is required
while blocking on a PREEMPT_RT-only sleeping lock.
Remove tsk_is_pi_blocked() from the scheduler path which is no longer
needed with the additional scheduler entry point.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
84/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
85/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. "preempt_disable" ensures that the
same resources are not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
86/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()
Date: Wed, 28 Oct 2020 18:15:32 +0100
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.
Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/191 [
Author: Ahmed S. Darwish
Email: a.darwish@linutronix.de
Subject: xfrm: Use sequence counter with associated spinlock
Date: Wed, 10 Jun 2020 12:53:22 +0200
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Upstream-status: The xfrm locking used for seqcount writer serialization
appears to be broken. If that's the case, a proper fix will need to be
submitted upstream. (e.g. make the seqcount per network namespace?)
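A hedged, minimal sketch of the seqcount_spinlock_t pattern (names are
illustrative, not taken from the xfrm code):

    #include <linux/seqlock.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(state_lock);
    static seqcount_spinlock_t state_seq =
            SEQCNT_SPINLOCK_ZERO(state_seq, &state_lock);

    static void state_update(void)
    {
            spin_lock(&state_lock);         /* lockdep-verified writer serialization */
            write_seqcount_begin(&state_seq);
            /* ... update the protected data ... */
            write_seqcount_end(&state_seq);
            spin_unlock(&state_lock);
    }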
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: u64_stats: Disable preemption on 32bit-UP/SMP with RT during updates
Date: Mon, 17 Aug 2020 12:28:10 +0200
On RT the seqcount_t is required even on UP because the softirq can be
preempted. The IRQ handler is threaded so it is also preemptible.
Disable preemption on 32bit-RT during value updates. There is no need to
disable interrupts on RT because the handler is run threaded. Therefore
disabling preemption is enough to guarantee that the update is not
interrupted.
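A hedged sketch of the affected u64_stats update pattern (struct and
function names are illustrative):

    #include <linux/u64_stats_sync.h>

    struct pcpu_counters {
            u64 bytes;
            struct u64_stats_sync syncp;
    };

    static void counters_add(struct pcpu_counters *c, u64 n)
    {
            /* on 32bit UP/SMP RT this now also disables preemption */
            u64_stats_update_begin(&c->syncp);
            c->bytes += n;
            u64_stats_update_end(&c->syncp);
    }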
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
89/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lockup I am disabling preemption during the update.
The rename of i_dir_seq is there to ensure that new write sides are
caught in the future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while the writer is active.
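A hedged, minimal sketch of the seqlock pattern described above (the lock
name is illustrative):

    #include <linux/seqlock.h>

    static DEFINE_SEQLOCK(run_lock);

    static void writer_update(void)
    {
            /* takes the embedded lock: readers sleep on it instead of spinning */
            write_seqlock(&run_lock);
            /* ... update the state ... */
            write_sequnlock(&run_lock);
    }

    static unsigned long reader_read(void)
    {
            unsigned int seq;
            unsigned long val;

            do {
                    seq = read_seqbegin(&run_lock);
                    /* val = ... read the state ... */
                    val = 0;
            } while (read_seqretry(&run_lock, seq));
            return val;
    }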
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Properly annotate the try-lock for the seqlock
Date: Tue, 8 Sep 2020 16:57:11 +0200
In patch
("net/Qdisc: use a seqlock instead seqcount")
the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
The try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".
Opencode write_seqcount_t_begin() and use the try-lock annotation for
lockdep.
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
93/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
94/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remains short and doesn't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adapted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator, we adopted this one as well and the changes were smaller. More
importantly, due to the design of the SLUB allocator it performs better and its
worst case latency is smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT's needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
96/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packet arrives then the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
state is scheduled in again.
Should a task with RT priority busy poll then it would consume the CPU instead
of allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us, which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware erases flash blocks on NOR.
The time functions are used by efi-rtc and can be triggered during
runtime (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
98/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is the default at build time,
the user can override it with `efi=runtime' and allow runtime services again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
99/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all lock users use migrate_disable(),
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
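This idea later landed upstream as the local_lock API; a hedged sketch of
the pattern using that modern interface (the data structure is illustrative):

    #include <linux/local_lock.h>
    #include <linux/percpu.h>

    struct bucket {
            local_lock_t lock;
            int count;
    };

    static DEFINE_PER_CPU(struct bucket, buckets) = {
            .lock = INIT_LOCAL_LOCK(lock),
    };

    static void bucket_inc(void)
    {
            unsigned long flags;

            /* !RT: local_irq_save(); RT: per-CPU spinlock + migrate_disable() */
            local_lock_irqsave(&buckets.lock, flags);
            this_cpu_inc(buckets.count);
            local_unlock_irqrestore(&buckets.lock, flags);
    }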
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
100/191 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spin lock that has been converted to a mutex
and thus has the possibility to sleep. If this happens, the above issues with
the corrupted stack are possible.
Instead of delivering the signal right away, for PREEMPT_RT and x86_64
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: add {put|get}_cpu_light()
Date: Sat, 27 May 2017 19:02:06 +0200
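The patch carries no description; as a hedged sketch, the helpers amount to
roughly the following (not necessarily the verbatim implementation):

    /* like get_cpu()/put_cpu(), but preemptible: only migration is disabled */
    #define get_cpu_light()         ({ migrate_disable(); smp_processor_id(); })
    #define put_cpu_light()         migrate_enable()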
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
103/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: sl[au]b: Change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with IRQs off on PREEMPT_RT. Make it a
raw_spinlock_t, otherwise the interrupts won't be disabled on PREEMPT_RT.
The locking rules remain unchanged.
The lock is updated for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
105/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Make object_map_lock a raw_spinlock_t
Date: Thu, 16 Jul 2020 18:47:50 +0200
The variable object_map is protected by object_map_lock. The lock is always
acquired in debug code and within already atomic context.
Make object_map_lock a raw_spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
[bigeasy: Add warning on RT for allocations in atomic context.
Don't enable interrupts on allocations during SYSTEM_SUSPEND. This is done
during suspend by ACPI, noticed by Liwei Song <liwei.song@windriver.com>
]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Move discard_slab() invocations out of IRQ-off sections
Date: Fri, 26 Feb 2021 15:14:15 +0100
discard_slab() gives the memory back to the page-allocator. Some of its
invocations occur from IRQ-disabled sections which were disabled by SLUB.
An example is the deactivate_slab() invocation from within
___slab_alloc() or put_cpu_partial().
Instead of giving the memory back directly, put the pages on a list and
process it once the caller is out of the known IRQ-off region.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
Date: Fri, 26 Feb 2021 17:11:55 +0100
flush_all() flushes a specific SLAB cache on each CPU (where the cache
is present). The discard_delayed()/__free_slab() invocation happens
within an IPI handler and is problematic for PREEMPT_RT.
The flush operation is not a frequent operation or a hot path. The
per-CPU flush operation can be moved to within a workqueue.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Don't resize the location tracking cache on PREEMPT_RT
Date: Fri, 26 Feb 2021 17:26:04 +0100
The location tracking cache has a size of a page and is resized if its
current size is too small.
This allocation happens with disabled interrupts and can't happen on
PREEMPT_RT.
Should one page be too small, then we have to allocate more at the
beginning. The only downside is that fewer callers will be visible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
110/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: page_alloc: Use migrate_disable() in drain_local_pages_wq()
Date: Thu, 2 Jul 2020 14:27:23 +0200
drain_local_pages_wq() disables preemption to avoid CPU migration during
CPU hotplug and can't use cpus_read_lock().
Using migrate_disable() works here, too. The scheduler won't take the
CPU offline until the task has left the migrate-disable section.
Use migrate_disable() in drain_local_pages_wq().
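A hedged sketch of the resulting shape (the argument to the callee is
illustrative):

    static void drain_local_pages_wq(struct work_struct *work)
    {
            /*
             * Pin the worker to this CPU without disabling preemption;
             * CPU hotplug waits until the migrate-disabled section is left.
             */
            migrate_disable();
            drain_local_pages(NULL);
            migrate_enable();
    }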
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
111/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: Use a local_lock instead of explicit local_irq_save().
Date: Fri, 3 Jul 2009 08:29:37 -0500
The page-allocator disables interrupts for a few reasons:
- Decouple the irqsave operation from spin_lock() so it can be
extended over the actual lock region and cover other areas, like
counter increments where the preemptible version can be avoided.
- Access to the per-CPU pcp from struct zone.
Replace the irqsave with a local-lock. The counters are expected to be
always modified with disabled preemption and no access from interrupt
context.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
112/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Don't enable partial CPU caches on PREEMPT_RT by default
Date: Tue, 2 Mar 2021 18:58:04 +0100
SLUB's partial CPU caches lead to higher latencies in a hackbench
benchmark.
Don't enable partial CPU caches by default on PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
113/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: memcontrol: Provide a local_lock for per-CPU memcg_stock
Date: Tue, 18 Aug 2020 10:30:00 +0200
The interrupts are disabled to ensure CPU-local access to the per-CPU
variable `memcg_stock'.
As the code inside the interrupt disabled section acquires regular
spinlocks, which are converted to 'sleeping' spinlocks on a PREEMPT_RT
kernel, this conflicts with the RT semantics.
Convert it to a local_lock which allows RT kernels to substitute it with
a real per-CPU lock. On non-RT kernels this maps to local_irq_save() as
before, but also provides lockdep coverage of the critical region.
No functional change.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
114/191 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt-disabled context,
replace the get/put_cpu() pair with get/put_cpu_light().
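A hedged sketch of the replacement, trimmed to the relevant calls (variable
names follow the trace above):

    /* was: curcpu = get_cpu(); ... put_cpu(); */
    curcpu = get_cpu_light();               /* migration off, still preemptible on RT */
    schedule_work_on(curcpu, &stock->work); /* may sleep on RT spinlocks */
    put_cpu_light();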
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
115/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() sections which then take sleeping locks.
This patch converts them to local locks.
[bigeasy: Move unlock after memcg_check_events() in mem_cgroup_swapout(),
pointed out by Matt Fleming <matt@codeblueprint.co.uk>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
116/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bitspinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
117/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a prerequisite for running RT in
a guest on top of an RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
118/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
119/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
120/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so it is nothing
we want to do in task switch and other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
121/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes in handy on -RT because we can't
free memory in a preempt-disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
122/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
123/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
124/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
125/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
126/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
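A hedged sketch of the resulting shape (trace points and statistics omitted):

    int netif_rx_ni(struct sk_buff *skb)
    {
            int err;

            local_bh_disable();     /* keeps us on this CPU for the per-CPU list */
            err = netif_rx(skb);    /* raises the NET_RX softirq */
            local_bh_enable();      /* runs do_softirq() if the softirq is pending */

            return err;
    }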
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
127/191 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case __TASK_TRACED was moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/191 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: ptrace: fix ptrace_unfreeze_traced() race with rt-lock
Date: Tue, 3 Nov 2020 12:39:01 +0100
The patch "ptrace: fix ptrace vs tasklist_lock race" changed
ptrace_freeze_traced() to take task->saved_state into account, but
ptrace_unfreeze_traced() has the same problem and needs a similar fix:
it should check/update both ->state and ->saved_state.
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Fixes: "ptrace: fix ptrace vs tasklist_lock race"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
130/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: Delay RCU-selftests
Date: Wed, 10 Mar 2021 15:09:02 +0100
Delay RCU-selftests until ksoftirqd is up and running.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption which
implies an RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disable preemption on !RT) is acquired.
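A hedged sketch of the kind of code that relies on this behaviour (names are
illustrative):

    spin_lock(&some_lock);
    /*
     * On !RT, holding a spinlock_t disables preemption and therefore
     * implies an RCU read section. With this change the RT sleeping-lock
     * implementation enters rcu_read_lock() explicitly, so the
     * dereference below remains valid on RT as well.
     */
    p = rcu_dereference(global_ptr);
    /* ... use p ... */
    spin_unlock(&some_lock);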
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
132/191 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
133/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
134/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks acquired later.
It seems to be enough to replace them with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
136/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
138/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
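A hedged sketch of what cpu_chill() amounts to after the changes described
above (not the verbatim code):

    void cpu_chill(void)
    {
            ktime_t chill_time = ns_to_ktime(TICK_NSEC);
            unsigned int saved = current->flags & PF_NOFREEZE;

            /* PF_NOFREEZE avoids the "still has locks held" freezer splat */
            current->flags |= PF_NOFREEZE;
            __set_current_state(TASK_UNINTERRUPTIBLE);
            schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
            if (!saved)
                    current->flags &= ~PF_NOFREEZE;
    }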
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
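A hedged sketch of the loop pattern (the lock is illustrative):

    /* was: while (!spin_trylock(&lock)) cpu_relax(); */
    while (!spin_trylock(&lock))
            cpu_chill();    /* !RT: cpu_relax(); RT: sleep a tick so the holder can run */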
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
140/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
141/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep the irq-off regions. It looks low
latency. However we can't kfree() from this context, therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
142/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Dequeue in dev_cpu_dead() without the lock
Date: Wed, 16 Sep 2020 16:15:39 +0200
Upstream uses skb_dequeue() to acquire the lock of `input_pkt_queue'. The reason
is to synchronize against a remote CPU which, still thinking that this CPU is
online, enqueues packets to it.
There is no guarantee that the packet is enqueued before the callback is run;
it is merely hoped for.
RT however complains about an uninitialized lock because it uses another lock
for `input_pkt_queue' due to the IRQ-off nature of the context.
Use the unlocked dequeue version for `input_pkt_queue'.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority then the task with the higher priority
won't be able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context, so we had to
bring that context back somehow for it. push_irq_work_func is the second one
that requires this.
This patch adds the IRQ_WORK_HARD_IRQ flag which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we have the original behavior.
This patch incorporates tglx's original work, reworked a little to bring back
the arch_irq_work_raise() path where possible, and a few fixes from Steven
Rostedt and Mike Galbraith.
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
145/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if an RT task is pending.
Probably we should measure the performance of those ciphers in pure SW
mode and with these optimisations to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
146/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU lock which is protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.
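A hedged sketch of the described change, trimmed to the relevant pieces
(member and variable names follow the description, not necessarily the code):

    struct cryptd_cpu_queue {
            struct crypto_queue queue;
            struct work_struct work;
            spinlock_t qlock;       /* new: explicit, lockdep-visible lock */
    };

    /* enqueue path, sketched */
    cpu_queue = raw_cpu_ptr(&queue->cpu_queue);
    spin_lock_bh(&cpu_queue->qlock);        /* was: local_bh_disable() + preempt_disable() */
    err = crypto_enqueue_request(&cpu_queue->queue, request);
    spin_unlock_bh(&cpu_queue->qlock);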
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
147/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have problems
to acquire the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
148/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
149/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
150/191 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be
bound to a particular CPU. This can be achieved by
disabling migration. No need to disable preemption.
2) Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backlog() is called
in atomic context. And if the backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
get scheduled out, so it expects non-atomic context.
- Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: Remove assumption about migrate_disable() from the description.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
152/191 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
153/191 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
156/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Date: Sat, 27 Feb 2016 09:01:42 +0100
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempt to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.
According to the comment the interrupts are disabled to avoid random delays
and are not required for protection or synchronisation.
Don't disable interrupts on PREEMPT_RT during atomic updates.
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire a sleeping lock.
Based on this I don't see any other way than to disable the trace points on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded
Date: Tue, 7 Jul 2020 12:25:11 +0200
According to commit
d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
the interrupts are disabled because the code may be called from an interrupt
handler as well as from preemptible context.
With `force_irqthreads' set the timeline mutex is never observed in IRQ
context, so there is no need to disable interrupts.
Disable interrupts only when not in `force_irqthreads' mode.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/191 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
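A hedged sketch of the conversion (context trimmed):

    /* was: static DEFINE_SPINLOCK(callback_lock); -- sleeps on RT */
    static DEFINE_RAW_SPINLOCK(callback_lock);      /* spins on RT too; atomic-context safe */

    raw_spin_lock_irqsave(&callback_lock, flags);
    /* ... short read/update of cpuset flags and cpu/node masks ... */
    raw_spin_unlock_irqrestore(&callback_lock, flags);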
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
163/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts it. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
164/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/entry: Use should_resched() in idtentry_exit_cond_resched()
Date: Tue, 30 Jun 2020 11:45:14 +0200
The TIF_NEED_RESCHED bit is inlined on x86 into the preemption counter.
By using should_resched(0) instead of need_resched() the same check can
be performed on the same variable, 'preempt_count()', that was
inspected before.
Use should_resched(0) instead of need_resched().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
166/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
167/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
168/191 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
169/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures use stop_machine() while switching opcodes, which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine
The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
170/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
171/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on both -RT
and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
172/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on both -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
173/191 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main(void)
{
	*((char *)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. For the above condition, the correct result is
that the test case receives a "segmentation fault" and exits, rather
than producing the above splat.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the IRQ enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable IRQs in the
translation/section permission fault handlers. ARM disables IRQs when it
enters exception/interrupt mode; if the kernel doesn't re-enable them,
they stay disabled during translation/section permission faults.
We see the above splat because do_force_sig_info() is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've not seen any other
arch need the siglock converted to a raw lock, we can conclude that we
should enable IRQs for the ARM translation/section permission
exceptions.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/191 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/191 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a
preempt-disabled section, which does not work on -RT.
Delay freeing the memory until preemption is enabled again.
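Sketch of the pattern (simplified; error handling omitted):
void *mem = NULL;
preempt_disable();
/* Detach the SVE buffer under preempt-off, but do not free it here:
 * kfree() may take sleeping locks on -RT. */
mem = current->thread.sve_state;
current->thread.sve_state = NULL;
/* ... reset the FPSIMD state ... */
preempt_enable();
kfree(mem);	/* now in preemptible context */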
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
177/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
178/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc: traps: Use PREEMPT_RT
Date: Fri, 26 Jul 2019 11:30:49 +0200
Add PREEMPT_RT to the backtrace if enabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use a locallock with local_lock_irqsave() instead of
local_irq_save()/local_irq_disable() so the section stays preemptible
on RT.
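Sketch with the local_lock API (lock name hypothetical):
static DEFINE_PER_CPU(local_lock_t, tce_page_lock) =
	INIT_LOCAL_LOCK(tce_page_lock);
...
local_lock_irqsave(&tce_page_lock, flags);	/* preemptible on RT */
tcep = __this_cpu_read(tce_page);
/* ... use or refill tcep ... */
local_unlock_irqrestore(&tce_page_lock, flags);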
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
182/191 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t
enables guests to run on RT, there's still a performance issue. For
interrupts sent in directed delivery mode with a multiple-CPU mask, the
emulated openpic will loop through all of the VCPUs, and for each VCPU
it calls IRQ_check(), which loops through all the pending interrupts for
that VCPU. This is done while holding the raw_lock, meaning that for all
this time interrupts and preemption are disabled on the host Linux. A
malicious user app can max out both of these numbers and cause a DoS.
This temporary fix is sent for two reasons. First, so that users who
want to use the in-kernel MPIC emulation are aware of the potential
latencies, thus making sure that their usage scenario does not involve
interrupts sent in directed delivery mode and keeps the number of
possible pending interrupts small. Secondly, it should incentivize the
development of a proper openpic emulation better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the
stack address as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc: Avoid recursive header includes
Date: Fri, 8 Jan 2021 19:48:21 +0100
- The include of bug.h leads to an include of printk.h which gets back
to spinlock.h and then complains about a missing xchg().
Remove bug.h and add bits.h which is needed for BITS_PER_BYTE.
- Avoid the "please don't include this file directly" error from
rwlock-rt. Allow an include from/with rtmutex.h.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
185/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow selecting RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
186/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
187/191 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the CPU when
immediately following a sequence of iowrite*()s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the CPU or somewhere along the bus), and gets flushed on
the first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to the chip across multiple
instructions.
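The pattern looks roughly like this (sketch along the lines of the
patch; TPM_ACCESS() as in the TIS driver):
static inline void tpm_tis_flush(void __iomem *iobase)
{
	/* A cheap read from the same region forces the buffered
	 * writes out, amortizing the flush across the whole burst. */
	ioread8(iobase + TPM_ACCESS(0));
}
static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
{
	iowrite8(b, iobase + addr);
	tpm_tis_flush(iobase);
}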
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
188/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow RT tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
Allow realtime tasks to cache one sigqueue in the task struct. This
avoids an allocation which can cause latencies or fail.
Ideally the sigqueue is cached after the first successful delivery and
will be available for the next signal delivery. This works under the
assumption that the RT task never has an unprocessed signal while a new
one is about to be queued.
The caching is not used for SIGQUEUE_PREALLOC because this kind of
sigqueue is handled differently (and not used for regular signal
delivery).
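A sketch of the allocation fast path (the sigqueue_cache field name is
an assumption, not the patch verbatim):
static struct sigqueue *sigqueue_get_cached(struct task_struct *t)
{
	struct sigqueue *q = READ_ONCE(t->sigqueue_cache);
	/* Atomically take the cached entry, if any; fall back to the
	 * slab only when the cache is empty. */
	if (q && cmpxchg(&t->sigqueue_cache, q, NULL) == q)
		return q;
	return kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);
}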
[bigeasy: With a fix from Matt Fleming <matt@codeblueprint.co.uk>]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
189/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
190/191 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
191/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: stop_machine: Add function and caller debug info
Date: Fri, 23 Oct 2020 12:11:59 +0200
Crashes in stop-machine are hard to connect to the calling code, add a
little something to help with that.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix balance_callback()
Date: Fri, 23 Oct 2020 12:12:00 +0200
The intent of balance_callback() has always been to delay executing
balancing operations until the end of the current rq->lock section.
This is because balance operations must often drop rq->lock, and that
isn't safe in general.
However, as noted by Scott, there were a few holes in that scheme;
balance_callback() was called after rq->lock was dropped, which means
another CPU can interleave and touch the callback list.
Rework code to call the balance callbacks before dropping rq->lock
where possible, and otherwise splice the balance list onto a local
stack.
This guarantees that the balance list must be empty when we take
rq->lock. IOW, we'll only ever run our own balance callbacks.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched/hotplug: Ensure only per-cpu kthreads run during hotplug
Date: Fri, 23 Oct 2020 12:12:01 +0200
In preparation for migrate_disable(), make sure only per-cpu kthreads
are allowed to run on !active CPUs.
This is run (as one of the very first steps) from the cpu-hotplug
task, which is a per-cpu kthread, and completion of the hotplug
operation only requires such tasks.
This constraint enables the migrate_disable() implementation to wait
for completion of all migrate_disable regions on this CPU at hotplug
time without fear of any new ones starting.
This replaces the unlikely(rq->balance_callbacks) test at the tail of
context_switch with an unlikely(rq->balance_work), the fast path is
not affected.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched/core: Wait for tasks being pushed away on hotplug
Date: Fri, 23 Oct 2020 12:12:02 +0200
RT kernels need to ensure that all tasks which are not per CPU kthreads
have left the outgoing CPU to guarantee that no tasks are force migrated
within a migrate disabled section.
There is also some desire to (ab)use fine grained CPU hotplug control to
clear a CPU from active state to force migrate tasks which are not per CPU
kthreads away for power control purposes.
Add a mechanism which waits until all tasks which should leave the CPU
after the CPU active flag is cleared have moved to a different online CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
5/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: workqueue: Manually break affinity on hotplug
Date: Fri, 23 Oct 2020 12:12:03 +0200
Don't rely on the scheduler to force break affinity for us -- it will
stop doing that for per-cpu-kthreads.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched/hotplug: Consolidate task migration on CPU unplug
Date: Fri, 23 Oct 2020 12:12:04 +0200
With the new mechanism which kicks tasks off the outgoing CPU at the end of
schedule() the situation on an outgoing CPU right before the stopper thread
brings it down completely is:
- All user tasks and all unbound kernel threads have either been migrated
away or are not running and the next wakeup will move them to an online CPU.
- All per-CPU kernel threads, except the cpu hotplug thread and the stopper
thread, have either been unbound or parked by the responsible CPU hotplug
callback.
That means that at the last step before the stopper thread is invoked the
cpu hotplug thread is the last legitimate running task on the outgoing
CPU.
Add a final wait step right before the stopper thread is kicked which
ensures that any still running tasks on the way to park or on the way to
kick themselves off the CPU are either sleeping or gone.
This allows removing the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task aside
from the stopper thread then it will explode with the appropriate
fireworks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix hotplug vs CPU bandwidth control
Date: Fri, 23 Oct 2020 12:12:05 +0200
Since we now migrate tasks away before DYING, we should also move
bandwidth unthrottle, otherwise we can gain tasks from unthrottle
after we expect all tasks to be gone already.
Also, it looks like the RT balancers don't respect cpu_active() and
instead partly rely on rq->online; complete this. This too requires that
we do set_rq_offline() earlier to match the cpu_active() semantics.
(The bigger patch is to convert RT to cpu_active() entirely)
Since set_rq_online() is called from sched_cpu_activate(), place
set_rq_offline() in sched_cpu_deactivate().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Massage set_cpus_allowed()
Date: Fri, 23 Oct 2020 12:12:06 +0200
Thread a u32 flags word through the *set_cpus_allowed*() callchain.
This will allow adding behavioural tweaks for future users.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Add migrate_disable()
Date: Fri, 23 Oct 2020 12:12:07 +0200
Add the base migrate_disable() support (under protest).
While migrate_disable() is (currently) required for PREEMPT_RT, it is
also one of the biggest flaws in the system.
Notably this is just the base implementation, it is broken vs
sched_setaffinity() and hotplug, both solved in additional patches for
ease of review.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix migrate_disable() vs set_cpus_allowed_ptr()
Date: Fri, 23 Oct 2020 12:12:08 +0200
Concurrent migrate_disable() and set_cpus_allowed_ptr() have
interesting features. We rely on set_cpus_allowed_ptr() to not return
until the task runs inside the provided mask. This expectation is
exported to userspace.
This means that any set_cpus_allowed_ptr() caller must wait until
migrate_enable() allows migrations.
At the same time, we don't want migrate_enable() to schedule, due to
patterns like:
preempt_disable();
migrate_disable();
...
migrate_enable();
preempt_enable();
And:
raw_spin_lock(&B);
spin_unlock(&A);
This means that when migrate_enable() must restore the affinity
mask, it cannot wait for completion thereof. As luck would have it,
that is exactly the case where there is a pending
set_cpus_allowed_ptr(), so let that provide storage for the async stop
machine.
Much thanks to Valentin, who used TLA+ most effectively and found lots
of 'interesting' cases.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched/core: Make migrate disable and CPU hotplug cooperative
Date: Fri, 23 Oct 2020 12:12:09 +0200
On CPU unplug, tasks which are in a migrate-disabled region cannot be
pushed to a different CPU until they return to a migratable state.
Account the number of tasks on a runqueue which are in a migrate disabled
section and make the hotplug wait mechanism respect that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched,rt: Use cpumask_any*_distribute()
Date: Fri, 23 Oct 2020 12:12:10 +0200
Replace a bunch of cpumask_any*() instances with
cpumask_any*_distribute(). By injecting this little bit of randomness
into CPU selection, we reduce the chance that two competing balance
operations working off the same lowest_mask pick the same CPU.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched,rt: Use the full cpumask for balancing
Date: Fri, 23 Oct 2020 12:12:11 +0200
We want migrate_disable() tasks to get PULLs in order for them to PUSH
away the higher priority task.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched, lockdep: Annotate ->pi_lock recursion
Date: Fri, 23 Oct 2020 12:12:12 +0200
There's a valid ->pi_lock recursion issue where the actual PI code
tries to wake up the stop task. Make lockdep aware so it doesn't
complain about this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix migrate_disable() vs rt/dl balancing
Date: Fri, 23 Oct 2020 12:12:13 +0200
In order to minimize the interference of migrate_disable() on lower
priority tasks, which can be deprived of runtime due to being stuck
below a higher priority task, teach the RT/DL balancers to push away
these higher priority tasks when a lower priority task gets selected
to run on a freshly demoted CPU (pull).
This adds migration interference to the higher priority task, but
restores bandwidth to system that would otherwise be irrevocably lost.
Without this it would be possible to have all tasks on the system
stuck on a single CPU, each task preempted in a migrate_disable()
section with a single high priority task running.
This way we can still approximate running the M highest priority tasks
on the system.
Migrating the top task away is (of course) still subject to
migrate_disable() too, which means the lower task is subject to an
interference equivalent to the worst case migrate_disable() section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched/proc: Print accurate cpumask vs migrate_disable()
Date: Fri, 23 Oct 2020 12:12:14 +0200
Ensure /proc/*/status doesn't print 'random' cpumasks due to
migrate_disable().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Add migrate_disable() tracepoints
Date: Fri, 23 Oct 2020 12:12:15 +0200
XXX write a tracer:
- 'migrate_disable() -> migrate_enable()' time in task_sched_runtime()
- 'migrate_pull -> sched-in' time in task_sched_runtime()
The first will give the worst case for the second, which is the actual
interference experienced by the task due to the migration constraints of
migrate_disable().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/194 [
Author: Valentin Schneider
Email: valentin.schneider@arm.com
Subject: sched: Deny self-issued __set_cpus_allowed_ptr() when migrate_disable()
Date: Fri, 23 Oct 2020 12:12:16 +0200
migrate_disable();
set_cpus_allowed_ptr(current, {something excluding task_cpu(current)});
affine_move_task(); <-- never returns
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201013140116.26651-1-valentin.schneider@arm.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/194 [
Author: Valentin Schneider
Email: valentin.schneider@arm.com
Subject: sched: Comment affine_move_task()
Date: Fri, 23 Oct 2020 12:12:17 +0200
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201013140116.26651-2-valentin.schneider@arm.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Use CONFIG_PREEMPTION
Date: Fri, 26 Jul 2019 11:30:49 +0200
This is an all-in-one patch of the current `PREEMPTION' branch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: blk-mq: Don't complete on a remote CPU in force threaded mode
Date: Wed, 28 Oct 2020 11:07:44 +0100
With force threaded interrupts enabled, raising softirq from an SMP
function call will always result in waking the ksoftirqd thread. This is
not optimal given that the thread runs at SCHED_OTHER priority.
Completing the request in hard IRQ-context on PREEMPT_RT (which enforces
the force threaded mode) is bad because the completion handler may
acquire sleeping locks which violate the locking context.
Disable request completing on a remote CPU in force threaded mode.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: blk-mq: Always complete remote completions requests in softirq
Date: Wed, 28 Oct 2020 11:07:09 +0100
Controllers with multiple queues have their IRQ handlers pinned to a
CPU. The core shouldn't need to complete the request on a remote CPU.
Remove this case and always raise the softirq to complete the request.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: blk-mq: Use llist_head for blk_cpu_done
Date: Wed, 28 Oct 2020 11:08:21 +0100
With llist_head it is possible to avoid the locking (the irq-off region)
when items are added. This makes it possible to add items on a remote
CPU.
llist_add() returns true if the list was previously empty. This can be
used to invoke the SMP function call / raise the softirq only if the
first item was added (otherwise it is already pending).
This simplifies the code a little and reduces the IRQ-off regions. With
this change it is possible to reduce the SMP function call to a simple
__raise_softirq_irqoff().
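Sketch of the completion producer side (names approximate):
/* llist_add() returns true only for the first request queued on this
 * CPU, so only that caller needs to raise the softirq. */
if (llist_add(&rq->ipi_list, this_cpu_ptr(&blk_cpu_done)))
	__raise_softirq_irqoff(BLOCK_SOFTIRQ);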
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lib/test_lockup: Minimum fix to get it compiled on PREEMPT_RT
Date: Wed, 28 Oct 2020 18:55:27 +0100
On PREEMPT_RT the locks are quite different so they can't be tested as
is done below. The alternative is to test for the waitlock within
rtmutex.
This is the bare minimum to get it compiled. Problems which exist on
PREEMPT_RT:
- none of the locks (spinlock_t, rwlock_t, mutex_t, rw_semaphore) may be
acquired with disabled preemption or interrupts.
If I read the code correctly, it is possible to acquire a mutex with
disabled interrupts.
I don't know how to obtain a lock pointer. Technically they are not
exported to userland.
- memory cannot be allocated with disabled preemption or interrupts even
with GFP_ATOMIC.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: refactor kmsg_dump_get_buffer()
Date: Wed, 14 Oct 2020 19:09:15 +0200
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.
At some point it would be nice to have this code factored into a
helper function. But until then, the code should at least look
similar enough that the logic duplication is obvious.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: use buffer pools for sprint buffers
Date: Tue, 13 Oct 2020 22:57:55 +0200
vprintk_store() is using a single static buffer as a temporary
sprint buffer for the message text. This will not work once
@logbuf_lock is removed. Replace the single static buffer with
per-cpu and global pools.
Each per-cpu pool is large enough to support a worst case of 2
contexts (non-NMI and NMI).
To support printk() recursion and printk() calls before per-cpu
variables are ready, an extra/fallback global pool of 2 contexts is
available.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: change @clear_seq to atomic64_t
Date: Tue, 13 Oct 2020 23:19:35 +0200
Currently @clear_seq access is protected by @logbuf_lock. Once
@logbuf_lock is removed some other form of synchronization will be
required. Change the type of @clear_seq to atomic64_t to provide the
synchronization.
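The accesses then become plain atomic64 operations, roughly:
static atomic64_t clear_seq = ATOMIC64_INIT(0);
...
u64 seq = atomic64_read(&clear_seq);	/* readers */
atomic64_set(&clear_seq, newest_seq);	/* SYSLOG_ACTION_CLEAR;
					 * newest_seq hypothetical */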
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove logbuf_lock, add syslog_lock
Date: Wed, 14 Oct 2020 19:06:12 +0200
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.
This means that printk_nmi_direct and printk_safe_flush_on_panic()
no longer need to acquire any lock to run.
The global variables @syslog_seq, @syslog_partial, @syslog_time
were also protected by @logbuf_lock. Introduce @syslog_lock to
protect these.
@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove safe buffers
Date: Wed, 14 Oct 2020 20:00:11 +0200
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers. In NMI or safe contexts, store
the message immediately but still use irq_work to defer the console
printing.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Wed, 14 Oct 2020 20:26:35 +0200
Add a write_atomic() callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write() callback. The
write_atomic() callback will only be used in special situations,
such as when the kernel panics.
Creating an NMI safe write_atomic() that must synchronize with
write() requires a careful implementation of the console driver. To
aid with the implementation, a set of console_atomic_*() functions
are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using a processor-reentrant spinlock
(called a cpulock).
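The resulting callback pair in struct console might look like (sketch):
struct console {
	/* existing, mandatory */
	void (*write)(struct console *con, const char *s, unsigned int count);
	/* new, optional: must be callable from any context, incl. NMI */
	void (*write_atomic)(struct console *con, const char *s,
			     unsigned int count);
	/* ... */
};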
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Wed, 14 Oct 2020 20:31:46 +0200
Implement a non-sleeping NMI-safe write_atomic() console function in
order to support emergency console printing.
Since interrupts need to be disabled during transmit, all usage of
the IER register is wrapped with access functions that use the
console_atomic_lock() function to synchronize register access while
tracking the state of the interrupts. This is necessary because
write_atomic() can be called from an NMI context that has preempted
write_atomic().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: inline log_output(),log_store() in vprintk_store()
Date: Mon, 19 Oct 2020 16:40:26 +0206
In preparation for supporting atomic printing, inline log_output()
and log_store() into vprintk_store(). This allows these
sub-functions to more easily communicate if they have performed
a finalized commit as well as the sequence number of that commit.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: relocate printk_delay() and vprintk_default()
Date: Mon, 19 Oct 2020 21:02:40 +0206
Move printk_delay() and vprintk_default() "as is" further up so that
they can be used by new functions in an upcoming commit.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: combine boot_delay_msec() into printk_delay()
Date: Mon, 19 Oct 2020 22:11:31 +0206
boot_delay_msec() is always called immediately before printk_delay()
so just combine the two.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce kernel sync mode
Date: Wed, 14 Oct 2020 20:40:05 +0200
When the kernel performs an OOPS, enter into "sync mode":
- only atomic consoles (write_atomic() callback) will print
- printing occurs within vprintk_store() instead of console_unlock()
Change @console_seq to atomic64_t for atomic access.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: move console printing to kthreads
Date: Mon, 19 Oct 2020 22:30:38 +0206
Create a kthread for each console to perform console printing. Now
all console printing is fully asynchronous except for the boot
console and when the kernel enters sync mode (and there are atomic
consoles available).
The console_lock() and console_unlock() functions now only do what
their name says... locking and unlocking of the console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove deferred printing
Date: Mon, 19 Oct 2020 22:53:30 +0206
Since printing occurs either atomically or from the printing
kthread, there is no need for any deferring or tracking possible
recursion paths. Remove all printk context tracking.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add console handover
Date: Mon, 19 Oct 2020 23:03:44 +0206
If earlyprintk is used, a boot console will print directly to the
console immediately. The boot console will unregister itself as soon
as a non-boot console registers. However, the non-boot console does
not begin printing until its kthread has started. Since this happens
much later, there is a long pause in the console output. If the
ringbuffer is small, messages could even be dropped during the
pause.
Add a new CON_HANDOVER console flag to be used internally by printk
in order to track which non-boot console took over from a boot
console. If handover consoles have implemented write_atomic(), they
are allowed to print directly to the console until their kthread can
take over.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: Tiny cleanup
Date: Tue, 20 Oct 2020 18:48:16 +0200
- mark functions and variables static which are used only in this file.
- add printf annotation where appropriate
- remove static functions without caller
- add kdb header file for kgdb builds.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
40/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: use irqsave in cgroup_rstat_flush_locked()
Date: Tue, 3 Jul 2018 18:19:48 +0200
All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock
either with spin_lock_irq() or spin_lock_irqsave().
cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which
is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated()
in IRQ context and therefore requires _irqsave() locking suffix in
cgroup_rstat_flush_locked().
Since there is no difference between spinlock_t and raw_spinlock_t
on !RT, lockdep does not complain here. On RT lockdep complains because
interrupts were not disabled here and a deadlock is possible.
Acquire the raw_spinlock_t with interrupts disabled.
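Sketch of the change in cgroup_rstat_flush_locked():
unsigned long flags;
/* was: raw_spin_lock(cpu_lock); -- insufficient vs. the IRQ-context
 * user on RT */
raw_spin_lock_irqsave(cpu_lock, flags);
/* ... flush the per-CPU stats ... */
raw_spin_unlock_irqrestore(cpu_lock, flags);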
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disables interrupts. This does not work the same way on -RT because none
of the locks that are held disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
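Illustrative sketch (the exact lock is whichever one the workingset
code relies on; the i_pages lock here is an assumption):
/* was an IRQ-off check along the lines of:
 *	VM_WARN_ON_ONCE(!irqs_disabled());
 */
lockdep_assert_held(&mapping->i_pages.xa_lock);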
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tpm: remove tpm_dev_wq_lock
Date: Mon, 11 Feb 2019 11:33:11 +0100
Added in commit
9e1b74a63f776 ("tpm: add support for nonblocking operation")
but never actually used it.
Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: shmem: Use raw_spinlock_t for ->stat_lock
Date: Fri, 14 Aug 2020 18:53:34 +0200
Each CPU has SHMEM_INO_BATCH inodes available in `->ino_batch' which is
per-CPU. Access here is serialized by disabling preemption. If the pool
is empty, it gets reloaded from `->next_ino'. Access here is serialized
by ->stat_lock which is a spinlock_t and cannot be acquired with
disabled preemption.
One way around it would be to make the per-CPU ino_batch struct
containing the inode number a local_lock_t.
Another solution is to promote ->stat_lock to a raw_spinlock_t. The
critical sections are short. The mpol_put() should be moved outside of
the critical section to avoid invoking the destructor with disabled
preemption.
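Sketch of moving the destructor out of the raw critical section:
struct mempolicy *old;
raw_spin_lock(&sbinfo->stat_lock);
old = sbinfo->mpol;
sbinfo->mpol = new;	/* swap under the raw lock; section stays short */
raw_spin_unlock(&sbinfo->stat_lock);
mpol_put(old);		/* destructor runs in preemptible context */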
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Move lockdep where it belongs
Date: Tue, 8 Sep 2020 07:32:20 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
45/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tcp: Remove superfluous BH-disable around listening_hash
Date: Mon, 12 Oct 2020 17:33:54 +0200
Commit
9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH.
inet_unhash() conditionally acquires listening_hash->lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/fpu: Do not disable BH on RT
Date: Mon, 21 Sep 2020 20:15:50 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
47/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Add RT variant
Date: Mon, 21 Sep 2020 17:26:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
48/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tick/sched: Prevent false positive softirq pending warnings on RT
Date: Mon, 31 Aug 2020 17:02:36 +0200
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked, soft interrupt processing is
temporarily blocked as well, which means that such a warning is a false
positive.
To prevent that, check the per-CPU state which indicates that a scheduled
out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
49/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Prevent false positive softirq warning on RT
Date: Mon, 31 Aug 2020 17:26:08 +0200
Soft interrupt disabled sections can legitimately be preempted or
scheduled out when blocking on a lock on RT enabled kernels, so the RCU
preempt check warning has to be disabled for RT kernels.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
50/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Replace barrier() with cpu_relax() in tasklet_unlock_wait()
Date: Mon, 31 Aug 2020 15:12:38 +0200
A barrier() in a tight loop which waits for something to happen on a remote
CPU is a pointless exercise. Replace it with cpu_relax() which allows HT
siblings to make progress.
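The loop in question, after the change (sketch):
static inline void tasklet_unlock_wait(struct tasklet_struct *t)
{
	while (test_bit(TASKLET_STATE_RUN, &t->state))
		cpu_relax();	/* was barrier(); lets the HT sibling run */
}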
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
51/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Avoid cancel/kill deadlock on RT
Date: Mon, 21 Sep 2020 17:47:34 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
52/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Use static inlines for functions
Date: Mon, 7 Sep 2020 22:57:32 +0200
Inlines exist for a reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
53/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove cruft
Date: Tue, 29 Sep 2020 15:21:17 +0200
Most of this has been around since the very beginning. I'm not sure if
this was used while the rtmutex-deadlock-tester was around but today it
seems to only waste memory:
- save_state: No users.
- name: Assigned and printed if a deadlock was detected. I'm keeping it
but want to point out that lockdep has the same information.
- file + line: Printed if ::name was NULL. This is only used for
in-kernel locks so ::name shouldn't be NULL, and then ::file and
::line aren't used.
- magic: Assigned to NULL by rt_mutex_destroy().
Remove the members of rt_mutex which are not used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove output from deadlock detector.
Date: Tue, 29 Sep 2020 16:05:11 +0200
In commit
f5694788ad8da ("rt_mutex: Add lockdep annotations")
rtmutex gained lockdep annotation for rt_mutex_lock() and related
functions.
lockdep will see the locking order and may complain about a deadlock
before rtmutex's own mechanism gets a chance to detect it.
The rtmutex deadlock detector will only complain about locks with
RT_MUTEX_MIN_CHAINWALK, and a waiter must be pending. That means it
works only for in-kernel locks because the futex interface always uses
RT_MUTEX_FULL_CHAINWALK.
The requirement for an active waiter limits the detector to actual
deadlocks and makes it impossible to report potential deadlocks the way
lockdep does.
It looks like lockdep is better suited for reporting deadlocks.
Remove rtmutex's debug print on deadlock detection.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES
Date: Tue, 29 Sep 2020 16:32:49 +0200
rt_mutex_init() only initializes lockdep if CONFIG_DEBUG_RT_MUTEXES is
enabled. The static initializer (DEFINE_RT_MUTEX) does not have such a
restriction.
Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES.
Move the remaining functions in this CONFIG_DEBUG_RT_MUTEXES block to
the upper block.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove rt_mutex_timed_lock()
Date: Wed, 7 Oct 2020 12:11:33 +0200
rt_mutex_timed_lock() has no callers since commit
c051b21f71d1f ("rtmutex: Confine deadlock logic to futex")
Remove rt_mutex_timed_lock().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to the futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
58/194 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed-out condition.
TASK 1                                TASK 2
------                                ------
futex_wait_requeue_pi()
  futex_wait_queue_me()
  <timed out>
                                      double_lock_hb();
raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
  current->pi_blocked_on = PI_WAKE_INPROGRESS;
  raw_spin_unlock(pi_lock);
  spin_lock(hb->lock); <-- blocked!
                                      plist_for_each_entry_safe(this) {
                                        rt_mutex_start_proxy_lock();
                                          task_blocks_on_rt_mutex();
                                            BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because task 1's pi_blocked_on is no longer set to that, but instead
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and it will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
59/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionally.
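Sketch of the check in the slowlock loop:
/* was: only state == TASK_INTERRUPTIBLE checked signal_pending() */
if (unlikely(signal_pending_state(state, current))) {
	ret = -EINTR;	/* abort the lock operation */
	break;
}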
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
61/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
62/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: Reduce header files in debug_locks.h
Date: Fri, 14 Aug 2020 16:55:25 +0200
The inclusion of printk.h leads to circular dependency if spinlock_t is
based on rt_mutex.
Include only atomic.h (xchg()) and cache.h (__read_mostly).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: split out the rbtree definition
Date: Fri, 14 Aug 2020 17:08:41 +0200
rtmutex.h needs the definition of rb_root_cached. By including kernel.h
we will get to spinlock.h which requires rtmutex.h again.
Split out the required struct definition and move it into its own header
file which can be included by rtmutex.h.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for the lock implementation on top of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not an rtmutex-related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
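Sketch of the wakeup side (simplified; modeled on try_to_wake_up()):
if (!(p->state & state)) {
	/* Regular wakeup while the task sleeps on an RT lock: record
	 * it in saved_state; it takes effect once the lock sleep is
	 * over and saved_state is restored. */
	if (p->saved_state & state)
		p->saved_state = TASK_RUNNING;
	goto out;	/* the lock code owns p->state for now */
}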
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
67/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Allow rt_mutex_trylock() on PREEMPT_RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-PREEMPT_RT kernel can deadlock on rt_mutex_trylock() in softirq
context.
On PREEMPT_RT the softirq context is handled in thread context. This
avoids the deadlock in the slow path and PI-boosting will be done on the
correct thread.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single-reader restriction is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code paths shows that properly written RT
tasks should not take them. Syscalls like mmap() and file access which
take the mmap sem write-locked have unbound latencies which are
completely unrelated to the mmap sem. Other R/W sem users like graphics
drivers are not suitable for RT tasks either.
either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become
writer-unfair, though the applications which have triggered writer
starvation (mostly on mmap_sem) in the past are not really the typical
workloads running on an RT system. So while it's unlikely to hit writer
starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
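The scheme above boils down to roughly the following (a hypothetical sketch, not the actual implementation; the up paths and the writer wait are heavily simplified):

    struct rt_rwsem {
        atomic_t        readers;  /* active readers */
        struct rt_mutex rtmutex;  /* serializes writers, carries PI */
    };

    static void rt_down_read(struct rt_rwsem *sem)
    {
        rt_mutex_lock(&sem->rtmutex);   /* readers blocked here boost a writer */
        atomic_inc(&sem->readers);      /* concurrent readers are allowed */
        rt_mutex_unlock(&sem->rtmutex);
    }

    static void rt_down_write(struct rt_rwsem *sem)
    {
        rt_mutex_lock(&sem->rtmutex);           /* keep new readers out */
        while (atomic_read(&sem->readers))      /* wait for the last reader; */
            schedule_timeout_uninterruptible(1);/* no PI for this wait */
    }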
71/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Use custom scheduling function for spin-schedule()
Date: Tue, 6 Oct 2020 13:07:17 +0200
PREEMPT_RT builds the rwsem, mutex, spinlock and rwlock typed locks on
top of a rtmutex lock. While blocked, task->pi_blocked_on is set
(tsk_is_pi_blocked()) and the task needs to schedule away while waiting.
The schedule process must distinguish between blocking on a regular
sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
and rwlock):
- rwsem and mutex must flush block requests (blk_schedule_flush_plug())
even if blocked on a lock. This cannot deadlock because this also
happens for non-RT.
There should be a warning if the scheduling point is within a RCU read
section.
- spinlock and rwlock must not flush block requests. This will deadlock
if the callback attempts to acquire a lock which is already acquired.
Similarly to being preempted, there should be no warning if the
scheduling point is within a RCU read section.
Add preempt_schedule_lock() which is invoked if scheduling is required
while blocking on a PREEMPT_RT-only sleeping lock.
Remove tsk_is_pi_blocked() from the scheduler path which is no longer
needed with the additional scheduler entry point.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
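A simplified sketch of the resulting split (preempt_schedule_lock() is the entry point this patch adds; the surrounding functions are illustrative):

    /* rwsem/mutex waiter: regular schedule(), block plug is flushed */
    static void sleeping_lock_schedule(void)
    {
        schedule();                 /* goes through sched_submit_work() */
    }

    /* spinlock_t/rwlock_t waiter on PREEMPT_RT: no plug flushing */
    static void rtlock_schedule(void)
    {
        preempt_schedule_lock();    /* skips sched_submit_work() */
    }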
75/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
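The variants have roughly this shape (a sketch of the idea):

    #ifdef CONFIG_PREEMPT_RT
    # define preempt_disable_rt()   preempt_disable()
    # define preempt_enable_rt()    preempt_enable()
    # define preempt_disable_nort() barrier()
    # define preempt_enable_nort()  barrier()
    #else
    # define preempt_disable_rt()   barrier()
    # define preempt_enable_rt()    barrier()
    # define preempt_disable_nort() preempt_disable()
    # define preempt_enable_nort()  preempt_enable()
    #endif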
77/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. preempt_disable() ensures that the
same resources are not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
78/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()
Date: Wed, 28 Oct 2020 18:15:32 +0100
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.
Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/194 [
Author: Ahmed S. Darwish
Email: a.darwish@linutronix.de
Subject: xfrm: Use sequence counter with associated spinlock
Date: Wed, 10 Jun 2020 12:53:22 +0200
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Upstream-status: The xfrm locking used for seqcount writer serialization
appears to be broken. If that's the case, a proper fix will need to be
submitted upstream. (e.g. make the seqcount per network namespace?)
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
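Usage looks roughly like this (variable names hypothetical):

    #include <linux/seqlock.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(state_lock);     /* serializes writers */
    static seqcount_spinlock_t state_seq =
        SEQCNT_SPINLOCK_ZERO(state_seq, &state_lock);

    static void update_state(void)
    {
        spin_lock(&state_lock);
        write_seqcount_begin(&state_seq);   /* lockdep checks state_lock */
        /* ... modify the protected data ... */
        write_seqcount_end(&state_seq);
        spin_unlock(&state_lock);
    }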
80/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: u64_stats: Disable preemption on 32bit-UP/SMP with RT during updates
Date: Mon, 17 Aug 2020 12:28:10 +0200
On RT the seqcount_t is required even on UP because the softirq can be
preempted. The IRQ handler is threaded so it is also preemptible.
Disable preemption on 32bit-RT during value updates. There is no need to
disable interrupts on RT because the handler runs threaded. Therefore
disabling preemption is enough to guarantee that the update is not
interrupted.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
81/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lockup, preemption is disabled during the update.
The rename of i_dir_seq is here to ensure that new write sides are
caught in the future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while the writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
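The resulting pattern, roughly (names hypothetical):

    #include <linux/seqlock.h>

    static DEFINE_SEQLOCK(run_lock);

    static void writer(void)
    {
        write_seqlock(&run_lock);    /* takes the embedded spinlock: on RT a
                                        blocked reader sleeps and can boost us */
        /* ... run the qdisc ... */
        write_sequnlock(&run_lock);
    }

    static unsigned int reader(unsigned int *val)
    {
        unsigned int seq, v;

        do {
            seq = read_seqbegin(&run_lock);
            v = *val;
        } while (read_seqretry(&run_lock, seq));
        return v;
    }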
84/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Properly annotate the try-lock for the seqlock
Date: Tue, 8 Sep 2020 16:57:11 +0200
In patch
("net/Qdisc: use a seqlock instead seqcount")
the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
The try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".
Opencode write_seqcount_t_begin() and use the try-lock annotation for
lockdep.
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
86/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remains short and doesn't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adapted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator we adapted this one as well and the changes were smaller. More
importantly, due to the design of the SLUB allocator it performs better and its
worst case latency was smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting the
default on RT should help in those cases and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
89/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packet arrives then the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
would be scheduled in again.
Should a task with RT priority busy-poll then it would consume the CPU instead
of allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware erases flash blocks on NOR.
The time functions are used by efi-rtc and can be triggered during
runtime (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is the default at build time,
the user can override it with `efi=runtime' and allow runtime services again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks use migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
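Conceptually (shown here with the later upstream local_lock API as a stand-in; the original RT locallock had its own macros):

    #include <linux/local_lock.h>
    #include <linux/percpu.h>

    struct my_stats {
        local_lock_t lock;
        int          counter;
    };
    static DEFINE_PER_CPU(struct my_stats, my_stats) = {
        .lock = INIT_LOCAL_LOCK(lock),
    };

    static void bump(void)
    {
        /* !RT: preempt_disable(); RT: per-CPU spinlock + migrate_disable() */
        local_lock(&my_stats.lock);
        this_cpu_inc(my_stats.counter);
        local_unlock(&my_stats.lock);
    }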
93/194 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack are possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
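The delivery path then looks roughly like this (a sketch; forced_info is the task_struct member the RT patch adds for this purpose):

    static void deliver_or_delay(struct kernel_siginfo *info)
    {
    #ifdef ARCH_RT_DELAYS_SIGNAL_SEND
        if (in_atomic()) {
            /* stash the siginfo, deliver on return to user space */
            current->forced_info = *info;
            set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
            return;
        }
    #endif
        force_sig_info(info);   /* normal, possibly sleeping path */
    }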
94/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #1
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
96/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with IRQs off on RT. Make it a raw_spinlock_t
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
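Conceptually (helper names partly hypothetical; free_delayed() is the name visible in later backtraces in this series):

    static DEFINE_PER_CPU(struct list_head, delayed_free);

    static void discard_slab_page(struct page *page)
    {
        if (irqs_disabled())    /* __free_pages() must not run IRQ-off on RT */
            list_add(&page->lru, this_cpu_ptr(&delayed_free));
        else
            free_slab_now(page);        /* hypothetical direct free */
    }

    /* called later from an interrupts-enabled context */
    static void free_delayed(struct list_head *head)
    {
        struct page *page, *tmp;

        list_for_each_entry_safe(page, tmp, head, lru) {
            list_del(&page->lru);
            free_slab_now(page);
        }
    }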
98/194 [
Author: Kevin Hao
Email: haokexin@gmail.com
Subject: mm: slub: Always flush the delayed empty slubs in flush_all()
Date: Mon, 4 May 2020 11:34:07 +0800
After commit f0b231101c94 ("mm/SLUB: delay giving back empty slubs to
IRQ enabled regions"), when the free_slab() is invoked with the IRQ
disabled, the empty slubs are moved to a per-CPU list and will be
freed after IRQ is enabled later. But in the current code, there is
a check to see whether there really is a CPU slab on a specific CPU
before flushing the delayed empty slubs; this may cause a reference
to an already released kmem_cache in a scenario like the one below:
cpu 0                           cpu 1
kmem_cache_destroy()
  flush_all()
    --->IPI                     flush_cpu_slab()
                                  flush_slab()
                                    deactivate_slab()
                                      discard_slab()
                                        free_slab()
                                  c->page = NULL;
    for_each_online_cpu(cpu)
      if (!has_cpu_slab(1, s))
        continue
      this skips flushing the delayed
      empty slub released by cpu1
  kmem_cache_free(kmem_cache, s)
                                kmalloc()
                                  __slab_alloc()
                                    free_delayed()
                                      __free_slab()
                                        reference to released kmem_cache
Fixes: f0b231101c94 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
99/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/page_alloc: Use migrate_disable() in drain_local_pages_wq()
Date: Thu, 2 Jul 2020 14:27:23 +0200
drain_local_pages_wq() disables preemption to avoid CPU migration during
CPU hotplug.
Using migrate_disable() makes the function preemptible on PREEMPT_RT but
still avoids CPU migrations during CPU-hotplug. On !PREEMPT_RT it
behaves like preempt_disable().
Use migrate_disable() in drain_local_pages_wq().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
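The change amounts to roughly the following (the real function carries more logic):

    static void drain_local_pages_wq(struct work_struct *work)
    {
        /*
         * was: preempt_disable();
         * migrate_disable() still pins the task to this CPU across
         * hotplug, but stays preemptible on PREEMPT_RT.
         */
        migrate_disable();
        drain_local_pages(NULL);
        migrate_enable();
    }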
100/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
101/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/slub: Make object_map_lock a raw_spinlock_t
Date: Thu, 16 Jul 2020 18:47:50 +0200
The variable object_map is protected by object_map_lock. The lock is always
acquired in debug code and within an already atomic context.
Make object_map_lock a raw_spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
[bigeasy: Add warning on RT for allocations in atomic context.
Don't enable interrupts on allocations during SYSTEM_SUSPEND. This is done
during suspend by ACPI, noticed by Liwei Song <liwei.song@windriver.com>
]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
103/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: memcontrol: Provide a local_lock for per-CPU memcg_stock
Date: Tue, 18 Aug 2020 10:30:00 +0200
The interrupts are disabled to ensure CPU-local access to the per-CPU
variable `memcg_stock'.
As the code inside the interrupt disabled section acquires regular
spinlocks, which are converted to 'sleeping' spinlocks on a PREEMPT_RT
kernel, this conflicts with the RT semantics.
Convert it to a local_lock which allows RT kernels to substitute them with
a real per CPU lock. On non RT kernels this maps to local_irq_save() as
before, but also provides lockdep coverage of the critical region.
No functional change.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
105/194 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() with get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() sections which then take sleeping locks.
This patch converts them to local locks.
[bigeasy: Move unlock after memcg_check_events() in mem_cgroup_swapout(),
pointed out by Matt Fleming <matt@codeblueprint.co.uk>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bitspinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/194 [
Author: Luis Claudio R. Goncalves
Email: lgoncalv@redhat.com
Subject: mm/zswap: Use local lock to protect per-CPU data
Date: Tue, 25 Jun 2019 11:28:04 -0300
zswap uses per-CPU compression. The per-CPU data pointer is acquired with
get_cpu_ptr() which implicitly disables preemption. It allocates
memory inside the preempt disabled region which conflicts with the
PREEMPT_RT semantics.
Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
[bigeasy: Use local_lock(), additional hunks, patch description]
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
110/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
111/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: Allow raw wakeups during boot
Date: Fri, 9 Aug 2019 15:25:21 +0200
There are a few wake-up timers during the early boot which are essential for
the system to make progress. At this stage no softirq threads are spawned for
the softirq processing, so there is no timer processing in softirq.
The wakeup in question:
smpboot_create_thread()
-> kthread_create_on_cpu()
-> kthread_bind()
-> wait_task_inactive()
-> schedule_hrtimeout()
Let the timer fire in hardirq context during the system boot.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
112/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
113/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so it is nothing
we want to do in task switch and other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
114/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes handy on -RT because we can't
free memory in preempt disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
115/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
116/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
117/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
118/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
119/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
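The reshaped function is essentially this (simplified sketch):

    #include <linux/bottom_half.h>
    #include <linux/netdevice.h>

    static int netif_rx_ni_sketch(struct sk_buff *skb)
    {
        int err;

        local_bh_disable();     /* stay on this CPU, softirq stays deferred */
        err = netif_rx(skb);    /* raises NET_RX_SOFTIRQ */
        local_bh_enable();      /* invokes do_softirq() if one is pending */

        return err;
    }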
120/194 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
121/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case __TASK_TRACED was moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
122/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: add {put|get}_cpu_light()
Date: Sat, 27 May 2017 19:02:06 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
124/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that the
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
125/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption which
implies a RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disable preemption on !RT) is acquired.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
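Conceptually the RT lock/unlock paths gain an RCU read section (a sketch with hypothetical internals; the real code sits in the rtlock slowpath):

    void rt_spin_lock_sketch(spinlock_t *lock)
    {
        rtlock_lock(&lock->lock);   /* hypothetical rtmutex-based inner lock */
        rcu_read_lock();            /* mimic the implicit RCU section of !RT */
        migrate_disable();
    }

    void rt_spin_unlock_sketch(spinlock_t *lock)
    {
        migrate_enable();
        rcu_read_unlock();
        rtlock_unlock(&lock->lock);
    }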
126/194 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcu: Use rcuc threads on PREEMPT_RT as we did
Date: Wed, 11 Sep 2019 17:57:28 +0100
While switching to the reworked RCU-thread code, it was forgotten
to enable the thread processing on -RT.
Besides restoring behavior that used to be default on RT, this avoids
a deadlock on scheduler locks.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
127/194 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/194 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well with the sleeping
locks the code tries to acquire later.
It seems to be enough to replace them with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
132/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
134/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
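The pattern in such loops becomes (sketch):

    static void take_eventually(spinlock_t *lock)
    {
        while (!spin_trylock(lock))
            cpu_chill();    /* cpu_relax() on !RT; a short sleep on RT so
                               the preempted lock holder can make progress */
        /* ... critical section ... */
        spin_unlock(lock);
    }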
136/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep the irq-off regions. It looks low
latency. However we can't kfree() from this context, therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
138/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Dequeue in dev_cpu_dead() without the lock
Date: Wed, 16 Sep 2020 16:15:39 +0200
Upstream uses skb_dequeue() to acquire the lock of `input_pkt_queue'. The
reason is to synchronize against a remote CPU which, still thinking that this
CPU is online, enqueues packets to it.
There are no guarantees that the packet is enqueued before the callback is
run; it is just hoped.
RT however complains about an uninitialized lock because it uses another lock
for `input_pkt_queue' due to the IRQ-off nature of the context.
Use the unlocked dequeue version for `input_pkt_queue'.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority then the task with the higher priority
won't be able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
140/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds the IRQ_WORK_HARD_IRQ which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we have the original behavior.
This patch incorporates tglx's original work, reworked a little to bring back
the arch_irq_work_raise() if possible, and a few fixes from Steven Rostedt and
Mike Galbraith,
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
141/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
142/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counter get out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
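The restructured loop keeps the FPU section tight (a sketch with hypothetical helpers; kernel_fpu_begin()/end() are the real x86 primitives):

    static int crypt_all(struct walk_state *walk)
    {
        int err = 0;

        while (walk->nbytes) {
            kernel_fpu_begin();     /* non-preemptible FPU section */
            crypt_one_chunk(walk);  /* hypothetical SIMD work */
            kernel_fpu_end();       /* preemptible again */
            err = walk_done(walk);  /* may allocate/sleep, runs preemptible */
        }
        return err;
    }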
143/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if an RT task is pending.
Probably we should measure the performance of those ciphers in pure SW
mode and with this optimisations to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU lock which is protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
145/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq context we will have problems
acquiring the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
146/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
147/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
148/194 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be
bound to a particular CPU. This can be achieved by
disabling migration. No need to disable preemption.
2) Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backlog() is called
in atomic context. And if the backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
get scheduled out, so it expects a non-atomic context.
3) When CONFIG_PREEMPT_RT is not defined,
migrate_enable() and migrate_disable() map to
preempt_enable() and preempt_disable(), so no
change in functionality in case of non-RT.
- Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
149/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
150/194 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
151/194 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
152/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
153/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
154/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Date: Sat, 27 Feb 2016 09:01:42 +0100
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.
According to the comment, the interrupts are disabled to avoid random delays
and are not required for protection or synchronisation.
Don't disable interrupts on PREEMPT_RT during atomic updates.
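A sketch of the resulting pattern (not the verbatim diff):
  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
          local_irq_disable();
  /* ... perform the atomic pipe/sprite update ... */
  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
          local_irq_enable();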
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire a sleeping lock.
Based on this I don't see any other way than disabling the trace points on
RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h then the NOTRACE define here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded
Date: Tue, 7 Jul 2020 12:25:11 +0200
According to commit
d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
the interrupts are disabled because the code may be called from an interrupt
handler as well as from preemptible context.
With `force_irqthreads' set the timeline mutex is never observed in IRQ
context, so there is no need to disable interrupts.
Disable interrupts only if not in `force_irqthreads' mode.
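A sketch of the resulting logic (variable names illustrative):
  unsigned long flags = 0;

  if (!force_irqthreads)
          local_irq_save(flags);
  /* ... nested engine-pm timeline mutex handling ... */
  if (!force_irqthreads)
          local_irq_restore(flags);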
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/194 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
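The conversion for RT is then mechanical, roughly (a sketch, not the
actual diff):
  -static DEFINE_SPINLOCK(callback_lock);
  +static DEFINE_RAW_SPINLOCK(callback_lock);

  -spin_lock_irqsave(&callback_lock, flags);
  +raw_spin_lock_irqsave(&callback_lock, flags);
   ...
  -spin_unlock_irqrestore(&callback_lock, flags);
  +raw_spin_unlock_irqrestore(&callback_lock, flags);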
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use, of course); this should be especially easy
now that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
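A conceptual sketch of that idea (editor's illustration; the helper names
are hypothetical, not the actual patch):
  /* on context switch: park the outgoing task's atomic kmaps and
   * re-establish the incoming task's ones
   */
  if (unlikely(prev->kmap_idx))
          save_kmap_slots(prev);           /* stash idx + pte in task_struct */
  if (unlikely(next->kmap_idx))
          restore_kmap_slots(next);        /* re-install the saved ptes */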
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
161/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The TLB should be flushed on unmap, which makes the mapping entry
invalid. Currently this is only done in the non-debug case, which does
not look right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
163/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
fixup highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
164/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
165/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away
preempt the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock held region before
the woken task preempts it. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
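A conceptual sketch of the wakeup-preemption decision (illustrative only;
helper names follow the description above, not necessarily the patch):
  if (test_tsk_need_resched(curr))
          return;                          /* already marked */
  if (!sched_feat(PREEMPT_LAZY) ||
      !fair_policy(curr->policy) || !fair_policy(p->policy))
          set_tsk_need_resched(curr);      /* immediate preemption */
  else
          set_tsk_need_resched_lazy(curr); /* defer until locks are dropped */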
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
166/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/entry: Use should_resched() in idtentry_exit_cond_resched()
Date: Tue, 30 Jun 2020 11:45:14 +0200
The TIF_NEED_RESCHED bit is folded on x86 into the preemption counter.
By using should_resched(0) instead of need_resched() the same check can
be performed using the same variable as preempt_count(), which was
evaluated just before.
Use should_resched(0) instead of need_resched().
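Sketch of the change (context abbreviated):
  -       if (need_resched())
  +       if (should_resched(0))
                  preempt_schedule_irq();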
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
168/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
169/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
170/194 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
171/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine
The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
174/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
175/194 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main(void)
{
        *((char *)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0001000 belongs to the kernel address space, so a user task cannot be
allowed to access it. Under the above conditions the correct result is
that the test case receives a "segmentation fault" and exits, rather than
producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead. But it does not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode; if the kernel doesn't enable them, they
remain disabled during translation/section permission faults.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/194 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
177/194 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
178/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt disabled
section which is not working on -RT.
Delay freeing of memory until preemption is enabled again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
182/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead of local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use local_irq_save() instead of local_irq_disable().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/194 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both these
numbers and cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
185/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
tsc instead. On Power we XOR it against mftb(), so let's use the stack address
as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
186/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
187/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
189/194 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
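A sketch of the workaround (the TPM_ACCESS() register macro is from
tpm_tis_core.h; the helper names are illustrative):
  static void tpm_tis_flush(void __iomem *iobase)
  {
          ioread8(iobase + TPM_ACCESS(0));  /* LOAD flushes posted writes */
  }

  static void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
  {
          iowrite8(b, iobase + addr);
          tpm_tis_flush(iobase);            /* amortize the stall per write */
  }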
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
190/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation allow rt tasks to cache one sigqueue struct in
task struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
191/194 [
Author: Matt Fleming
Email: matt@codeblueprint.co.uk
Subject: signal: Prevent double-free of user struct
Date: Tue, 7 Apr 2020 10:54:13 +0100
The way user struct reference counting works changed significantly with,
fda31c50292a ("signal: avoid double atomic counter increments for user accounting")
Now user structs are only freed once the last pending signal is
dequeued. Make sigqueue_free_current() follow this new convention to
avoid freeing the user struct multiple times and triggering this
warning:
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
Call Trace:
refcount_dec_and_lock_irqsave+0x16/0x60
free_uid+0x31/0xa0
__dequeue_signal+0x17c/0x190
dequeue_signal+0x5a/0x1b0
do_sigtimedwait+0x208/0x250
__x64_sys_rt_sigtimedwait+0x6f/0xd0
do_syscall_64+0x72/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
192/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
193/194 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules, udev needs to evaluate
if its a PREEMPT_RT kernel a few thousand times and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
194/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/252 [
Author: Waiman Long
Email: longman@redhat.com
Subject: lib/smp_processor_id: Don't use cpumask_equal()
Date: Thu, 3 Oct 2019 16:36:08 -0400
The check_preemption_disabled() function uses cpumask_equal() to see
if the task is bounded to the current CPU only. cpumask_equal() calls
memcmp() to do the comparison. As x86 doesn't have __HAVE_ARCH_MEMCMP,
the slow memcmp() function in lib/string.c is used.
On a RT kernel that call check_preemption_disabled() very frequently,
below is the perf-record output of a certain microbenchmark:
42.75% 2.45% testpmd [kernel.kallsyms] [k] check_preemption_disabled
40.01% 39.97% testpmd [kernel.kallsyms] [k] memcmp
We should avoid calling memcmp() in performance critical path. So the
cpumask_equal() call is now replaced with an equivalent simpler check.
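Sketch of the replacement check (as described above; context abbreviated):
  -       if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu)))
  +       if (current->nr_cpus_allowed == 1)
                  goto out;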
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/252 [
Author: Julien Grall
Email: julien.grall@arm.com
Subject: lib/ubsan: Don't serialize UBSAN report
Date: Fri, 20 Sep 2019 11:08:35 +0100
At the moment, UBSAN report will be serialized using a spin_lock(). On
RT-systems, spinlocks are turned to rt_spin_lock and may sleep. This will
result to the following splat if the undefined behavior is in a context
that can sleep:
| BUG: sleeping function called from invalid context at /src/linux/kernel/locking/rtmutex.c:968
| in_atomic(): 1, irqs_disabled(): 128, pid: 3447, name: make
| 1 lock held by make/3447:
| #0: 000000009a966332 (&mm->mmap_sem){++++}, at: do_page_fault+0x140/0x4f8
| Preemption disabled at:
| [<ffff000011324a4c>] rt_mutex_futex_unlock+0x4c/0xb0
| CPU: 3 PID: 3447 Comm: make Tainted: G W 5.2.14-rt7-01890-ge6e057589653 #911
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xbc/0x104
| ___might_sleep+0x154/0x210
| rt_spin_lock+0x68/0xa0
| ubsan_prologue+0x30/0x68
| handle_overflow+0x64/0xe0
| __ubsan_handle_add_overflow+0x10/0x18
| __lock_acquire+0x1c28/0x2a28
| lock_acquire+0xf0/0x370
| _raw_spin_lock_irqsave+0x58/0x78
| rt_mutex_futex_unlock+0x4c/0xb0
| rt_spin_unlock+0x28/0x70
| get_page_from_freelist+0x428/0x2b60
| __alloc_pages_nodemask+0x174/0x1708
| alloc_pages_vma+0x1ac/0x238
| __handle_mm_fault+0x4ac/0x10b0
| handle_mm_fault+0x1d8/0x3b0
| do_page_fault+0x1c8/0x4f8
| do_translation_fault+0xb8/0xe0
| do_mem_abort+0x3c/0x98
| el0_da+0x20/0x24
The spin_lock() protects against multiple CPUs outputting a report
together, I guess to prevent the reports from being interleaved. However,
they can still interleave with other messages (and even with the splat
from __might_sleep). So the lock's usefulness seems pretty limited. Rather
than trying to accommodate RT systems by switching to a raw_spin_lock(),
the lock is now completely dropped.
Link: https://lkml.kernel.org/r/20190920100835.14999-1-julien.grall@arm.com
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Simplify journal_unmap_buffer()
Date: Fri, 9 Aug 2019 14:42:27 +0200
journal_unmap_buffer() first checks whether the buffer head carries a
journal head. If so it takes locks and then invokes
jbd2_journal_grab_journal_head() followed by another check whether this
is a journal head buffer.
The double checking is pointless.
Replace the initial check with jbd2_journal_grab_journal_head() which
already checks whether the buffer head is actually a journal head.
Allows also early access to the journal head pointer for the upcoming
conversion of state lock to a regular spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Remove jbd_trylock_bh_state()
Date: Fri, 9 Aug 2019 14:42:28 +0200
No users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
5/252 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Move dropping of jh reference out of un/re-filing functions
Date: Fri, 9 Aug 2019 14:42:29 +0200
__jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop
transaction's jh reference when they remove jh from a transaction. This
will be however inconvenient once we move state lock into journal_head
itself as we still need to unlock it and we'd need to grab jh reference
just for that. Move dropping of jh reference out of these functions into
the few callers.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/252 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Drop unnecessary branch from jbd2_journal_forget()
Date: Fri, 9 Aug 2019 14:42:30 +0200
We have cleared both dirty & jbddirty bits from the bh. So there's no
difference between bforget() and brelse(). Thus there's no point jumping
to no_jbd branch.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/252 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Don't call __bforget() unnecessarily
Date: Fri, 9 Aug 2019 14:42:31 +0200
jbd2_journal_forget() jumps to 'not_jbd' branch which calls __bforget()
in cases where the buffer is clean which is pointless. In case of failed
assertion, it can be even argued that it is safer not to touch buffer's
dirty bits. Also logically it makes more sense to just jump to 'drop'
and that will make logic also simpler when we switch bh_state_lock to a
spinlock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Make state lock a spinlock
Date: Fri, 9 Aug 2019 14:42:32 +0200
Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep
on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held
region because bit spinlocks disable preemption even on RT.
A first attempt was to replace state lock with a spinlock placed in struct
buffer_head and make the locking conditional on PREEMPT_RT and
DEBUG_BIT_SPINLOCKS.
Jan pointed out that there is a 4 byte hole in struct journal_head where a
regular spinlock fits in and he would not object to convert the state lock
to a spinlock unconditionally.
Aside of solving the RT problem, this also gains lockdep coverage for the
journal head state lock (bit-spinlocks are not covered by lockdep as it's
hard to fit a lockdep map into a single bit).
The trivial change would have been to convert the jbd_*lock_bh_state()
inlines, but that comes with the downside that these functions take a
buffer head pointer which needs to be converted to a journal head pointer
which adds another level of indirection.
As almost all functions which use this lock have a journal head pointer
readily available, it makes more sense to remove the lock helper inlines
and write out spin_*lock() at all call sites.
Fixup all locking comments as well.
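Sketch of the call-site conversion (field name per the new journal_head
member):
  -jbd_lock_bh_state(bh);
  +spin_lock(&jh->b_state_lock);
   ...
  -jbd_unlock_bh_state(bh);
  +spin_unlock(&jh->b_state_lock);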
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Free journal head outside of locked region
Date: Fri, 9 Aug 2019 14:42:33 +0200
On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
i.e. they disable preemption. That means functions which are not safe to be
called in preempt disabled context on RT trigger a might_sleep() assert.
The journal head bit spinlock is mostly held for short code sequences with
trivial RT safe functionality, except for one place:
jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
with the journal head bit spinlock held. __journal_remove_journal_head()
invokes kmem_cache_free() which must not be called with preemption disabled
on RT.
Jan suggested to rework the removal function so the actual free happens
outside the bit-spinlocked region.
Split it into two parts:
- Do the sanity checks and the buffer head detach under the lock
- Do the actual free after dropping the lock
There is error case handling in the free part which needs to dereference
the b_size field of the now detached buffer head. Due to paranoia (caused
by ignorance) the size is retrieved in the detach function and handed into
the free function. Might be over-engineered, but better safe than sorry.
This makes the journal head bit-spinlock usage RT compliant and also avoids
nested locking which is not covered by lockdep.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/252 [
Author: Joel Fernandes (Google)
Email: joel@joelfernandes.org
Subject: workqueue: Convert for_each_wq to use built-in list check
Date: Thu, 15 Aug 2019 10:18:42 -0400
Because list_for_each_entry_rcu() can now check for holding a
lock as well as for being in an RCU read-side critical section,
this commit replaces the workqueue_sysfs_unregister() function's
use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with
list_for_each_entry_rcu() augmented with a lockdep_is_held() optional
argument.
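The result looks roughly like:
  list_for_each_entry_rcu(wq, &workqueues, list,
                          lockdep_is_held(&wq_pool_mutex)) {
          /* ... */
  }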
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Prevent inconsistent state when moving an interrupt
Date: Thu, 17 Oct 2019 12:19:01 +0200
There is an issue with threaded interrupts which are marked ONESHOT
and using the fasteoi handler:
if (IS_ONESHOT())
    mask_irq();
....
cond_unmask_eoi_irq()
  chip->irq_eoi();
    if (setaffinity_pending) {
       mask_ioapic();
       ...
       move_affinity();
       unmask_ioapic();
    }
So if setaffinity is pending the interrupt will be moved and then
unconditionally unmasked at the ioapic level, which is wrong in two
aspects:
1) It should be kept masked up to the point where the threaded handler
finished.
2) The physical chip state and the software masked state are inconsistent
Guard both the mask and the unmask with a check for the software masked
state. If the line is marked masked then the ioapic line is also masked, so
both mask_ioapic() and unmask_ioapic() can be skipped safely.
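Sketch of the guards (simplified):
  if (!irqd_irq_masked(data))
          mask_ioapic_irq(data);
  ...
  if (!irqd_irq_masked(data))
          unmask_ioapic_irq(data);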
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Fixes: 3aa551c9b4c4 ("genirq: add threaded interrupt handler support")
Link: https://lkml.kernel.org/r/20191017101938.321393687@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Rename misnamed functions
Date: Thu, 17 Oct 2019 12:19:02 +0200
ioapic_irqd_[un]mask() are misnomers as both functions do way more than
masking and unmasking the interrupt line. Both deal with the moving the
affinity of the interrupt within interrupt context. The mask/unmask is just
a tiny part of the functionality.
Rename them to ioapic_prepare/finish_move(), fixup the call sites and
rename the related variables in the code to reflect what this is about.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20191017101938.412489856@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: percpu-refcount: use normal instead of RCU-sched
Date: Wed, 4 Sep 2019 17:59:36 +0200
This is a revert of commit
a4244454df129 ("percpu-refcount: use RCU-sched insted of normal RCU")
which claims the only reason for using RCU-sched is
"rcu_read_[un]lock() … are slightly more expensive than preempt_disable/enable()"
and
"As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications."
The problem with RCU-sched is that it disables preemption, and the
callback must not acquire any sleeping locks, like spinlock_t on
PREEMPT_RT, which is the case here.
Convert back to normal RCU.
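Sketch of the revert:
  -rcu_read_lock_sched();
  +rcu_read_lock();
   ...
  -rcu_read_unlock_sched();
  +rcu_read_unlock();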
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts independently of the lock
Date: Wed, 10 Apr 2019 11:01:37 +0200
The locks (active.lock and rq->lock) need to be taken with disabled
interrupts. This is done in i915_request_retire() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on PREEMPT_RT.
Chris Wilson confirmed that local_irq_disable() was just introduced as
an optimisation to avoid enabling/disabling interrupts during
lock/unlock combo.
Enable/disable interrupts as part of the locking instruction.
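Sketch of the change (shown for one of the two locks named above):
  -local_irq_disable();
  -spin_lock(&rq->lock);
  +spin_lock_irq(&rq->lock);
   ...
  -spin_unlock(&rq->lock);
  -local_irq_enable();
  +spin_unlock_irq(&rq->lock);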
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/252 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: watchdog: prevent deferral of watchdogd wakeup on RT
Date: Fri, 28 Sep 2018 21:03:51 +0000
When PREEMPT_RT is enabled, all hrtimer expiry functions are
deferred for execution into the context of ksoftirqd unless otherwise
annotated.
Deferring the expiry of the hrtimer used by the watchdog core, however,
is a waste, as the callback does nothing but queue a kthread work item
and wakeup watchdogd.
It's worse than that, too: the deferral through ksoftirqd also means
that for correct behavior a user must adjust the scheduling parameters
of both watchdogd _and_ ksoftirqd, which is unnecessary and has other
side effects (like causing unrelated expiry functions to execute at
potentially elevated priority).
Instead, mark the hrtimer used by the watchdog core as being _HARD to
allow its execution directly from hardirq context. The work done in
this expiry function is well-bounded and minimal.
A user still must adjust the scheduling parameters of the watchdogd
to be correct w.r.t. their application needs.
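Sketch of the change (struct and field names as used by watchdog_dev.c):
  -hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
  +hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);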
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Reported-by: Tim Sander <tim@krieglstein.org>
Signed-off-by: Julia Cartwright <julia@ni.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
[bigeasy: use only HRTIMER_MODE_REL_HARD]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: Don't disable interrupts in trigger_softirq()
Date: Fri, 15 Nov 2019 21:37:22 +0100
trigger_softirq() is always invoked as an SMP-function call, which is
always invoked with disabled interrupts.
Don't disable interrupts in trigger_softirq() because interrupts are
already disabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: KVM: Invoke compute_layout() before alternatives are applied
Date: Thu, 26 Jul 2018 09:13:42 +0200
compute_layout() is invoked as part of an alternative fixup under
stop_machine(). This function invokes get_random_long() which acquires a
sleeping lock on -RT which can not be acquired in this context.
Rename compute_layout() to kvm_compute_layout() and invoke it before
stop_machine() applies the alternatives. Add a __init prefix to
kvm_compute_layout() because the caller has it, too (and so the code can be
discarded after boot).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/252 [
Author: Marc Kleine-Budde
Email: mkl@pengutronix.de
Subject: net: sched: Use msleep() instead of yield()
Date: Wed, 5 Mar 2014 00:49:47 +0100
On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50
(by default). If a high priority userspace process tries to shut down a busy
network interface it might spin in a yield loop waiting for the device to
become idle. With the interrupt thread having a lower priority than the
looping process it might never be scheduled and so result in a deadlock on UP
systems.
With Magic SysRq the following backtrace can be produced:
> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
This patch works around the problem by replacing yield() by msleep(1), giving
the interrupt thread time to finish, similar to other changes contained in the
rt patch set. Using wait_for_completion() instead would probably be a better
solution.
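Sketch of the workaround in dev_deactivate_many() (simplified):
   while (some_qdisc_is_busy(dev))
  -        yield();
  +        msleep(1);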
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/252 [
Author: Uladzislau Rezki (Sony)
Email: urezki@gmail.com
Subject: mm/vmalloc: remove preempt_disable/enable when doing preloading
Date: Sat, 30 Nov 2019 17:54:33 -0800
Some background: preemption was disabled before to guarantee that a
preloaded object is available for the CPU it was stored for. That was
achieved by combining the disabling the preemption and taking the spin
lock while the ne_fit_preload_node is checked.
The aim was to not allocate in atomic context when spinlock is taken
later, for regular vmap allocations. But that approach conflicts with
CONFIG_PREEMPT_RT philosophy. It means that calling spin_lock() with
disabled preemption is forbidden in the CONFIG_PREEMPT_RT kernel.
Therefore, get rid of preempt_disable() and preempt_enable() when the
preload is done for splitting purpose. As a result we do not guarantee
now that a CPU is preloaded, instead we minimize the case when it is
not, with this change, by populating the per cpu preload pointer under
the vmap_area_lock.
This implies that at least each caller that has done the preallocation
will not fallback to an atomic allocation later. It is possible that
the preallocation would be pointless or that no preallocation is done
because of the race but the data shows that this is really rare.
For example I ran the special test case that follows the preload pattern
and path. 20 "unbind" threads run it and each does 1000000 allocations.
Only 3.5 times per 1000000 allocations was a CPU not preloaded. So it can
happen but the number is negligible.
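Sketch of the resulting preload pattern (simplified from the upstream
change):
  /* preload unconditionally, outside of any atomic section */
  pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);

  spin_lock(&vmap_area_lock);

  /* publish the preloaded object; free it if someone beat us to it */
  if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
          kmem_cache_free(vmap_area_cachep, pva);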
[mhocko@suse.com: changelog additions]
Link: http://lkml.kernel.org/r/20191016095438.12391-1-urezki@gmail.com
Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Wagner <dwagner@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: KVM: arm/arm64: Let the timer expire in hardirq context on RT
Date: Tue, 13 Aug 2019 14:29:41 +0200
The timers are canceled from a preempt-notifier which is invoked with
disabled preemption, which is not allowed on PREEMPT_RT.
The timer callback is short so it can be invoked in hard-IRQ context
on -RT.
Let the timer expire on hard-IRQ context even on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add printk ring buffer documentation
Date: Tue, 12 Feb 2019 15:29:39 +0100
The full documentation file for the printk ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add prb locking functions
Date: Tue, 12 Feb 2019 15:29:40 +0100
Add processor-reentrant spin locking functions. These allow
restricting the number of possible contexts to 2, which can simplify
implementing code that also supports NMI interruptions.
prb_lock();
/*
 * This code is synchronized with all contexts
 * except an NMI on the same processor.
 */
prb_unlock();
In order to support printk's emergency messages, a
processor-reentrant spin lock will be used to control raw access to
the emergency console. However, it must be the same
processor-reentrant spin lock as the one used by the ring buffer,
otherwise a deadlock can occur:
CPU1: printk lock -> emergency -> serial lock
CPU2: serial lock -> printk lock
By making the processor-reentrant implementation available externally,
printk can use the same atomic_t for the ring buffer as for the
emergency console and thus avoid the above deadlock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: define ring buffer struct and initializer
Date: Tue, 12 Feb 2019 15:29:41 +0100
See Documentation/printk-ringbuffer.txt for details about the
initializer arguments.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add writer interface
Date: Tue, 12 Feb 2019 15:29:42 +0100
Add the writer functions prb_reserve() and prb_commit(). These make
use of processor-reentrant spin locks to limit the number of possible
interruption scenarios for the writers.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add basic non-blocking reading interface
Date: Tue, 12 Feb 2019 15:29:43 +0100
Add reader iterator static declaration/initializer, dynamic
initializer, and functions to iterate and retrieve ring buffer data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add blocking reader support
Date: Tue, 12 Feb 2019 15:29:44 +0100
Add a blocking read function for readers. An irq_work function is
used to signal the wait queue so that write notification can
be triggered from any context.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add functionality required by printk
Date: Tue, 12 Feb 2019 15:29:45 +0100
The printk subsystem needs to be able to query the size of the ring
buffer, seek to specific entries within the ring buffer, and track
if records could not be stored in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add ring buffer and kthread
Date: Tue, 12 Feb 2019 15:29:46 +0100
The printk ring buffer provides an NMI-safe interface for writing
messages to a ring buffer. Using such a buffer alleviates printk
callers from the current burdens of disabled preemption while calling
the console drivers (and possibly printing out many messages that
another task put into the log buffer).
Create a ring buffer to be used for storing messages to be
printed to the consoles.
Create a dedicated printk kthread to block on the ring buffer
and call the console drivers for the read messages.
NOTE: The printk_delay is relocated to _after_ the message is
printed, where it makes more sense.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove exclusive console hack
Date: Tue, 12 Feb 2019 15:29:47 +0100
In order to support printing the printk log history when new
consoles are registered, a global exclusive_console variable is
temporarily set. This only works because printk runs with
preemption disabled.
When console printing is moved to a fully preemptible dedicated
kthread, this hack no longer works.
Remove exclusive_console usage.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: redirect emit/store to new ringbuffer
Date: Tue, 12 Feb 2019 15:29:48 +0100
vprintk_emit and vprintk_store are the main functions that all printk
variants eventually go through. Change these to store the message in
the new printk ring buffer that the printk kthread is reading.
Remove functions no longer in use because of the changes to
vprintk_emit and vprintk_store.
In order to handle interrupts and NMIs, a second per-cpu ring buffer
(sprint_rb) is added. This ring buffer is used for NMI-safe memory
allocation in order to format the printk messages.
NOTE: LOG_CONT is ignored for now and handled as individual messages.
LOG_CONT functions are masked behind "#if 0" blocks until their
functionality can be restored.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk_safe: remove printk safe code
Date: Tue, 12 Feb 2019 15:29:49 +0100
vprintk variants are now NMI-safe so there is no longer a need for
the "safe" calls.
NOTE: This also removes printk flushing functionality.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: minimize console locking implementation
Date: Tue, 12 Feb 2019 15:29:50 +0100
Since printing of the printk buffer is now handled by the printk
kthread, minimize the console locking functions to just handle
locking of the console.
NOTE: With this console_flush_on_panic will no longer flush.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: track seq per console
Date: Tue, 12 Feb 2019 15:29:51 +0100
Allow each console to track which seq record was last printed. This
simplifies identifying dropped records.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: do boot_delay_msec inside printk_delay
Date: Tue, 12 Feb 2019 15:29:52 +0100
Both functions needed to be called one after the other, so just
integrate boot_delay_msec into printk_delay for simplification.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: print history for new consoles
Date: Tue, 12 Feb 2019 15:29:53 +0100
When new consoles register, they currently print how many messages
they have missed. However, many (or all) of those messages may still
be in the ring buffer. Add functionality to print as much of the
history as available. This is a clean replacement of the old
exclusive console hack.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement CON_PRINTBUFFER
Date: Tue, 12 Feb 2019 15:29:54 +0100
If the CON_PRINTBUFFER flag is not set, do not replay the history
for that console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add processor number to output
Date: Tue, 12 Feb 2019 15:29:55 +0100
It can be difficult to sort printk output if multiple processors are
printing simultaneously. Add the processor number to the printk
output to allow the messages to be sorted.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Tue, 12 Feb 2019 15:29:56 +0100
Add a write_atomic callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write callback. The
write_atomic callback will only be used for emergency messages.
Creating an NMI safe write_atomic that must synchronize with write
requires a careful implementation of the console driver. To aid with
the implementation, a set of console_atomic_* functions are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using the processor-reentrant cpu lock of
the printk buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
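A minimal sketch of how a driver might satisfy the write_atomic contract from
38/252, using the two helpers quoted above; foo_uart_putc and the callback
itself are hypothetical, not code from the patch:

/* Hedged sketch: a console driver's write_atomic callback built on
 * the console_atomic_* helpers. */
#include <linux/console.h>

static void foo_uart_putc(char c);      /* hypothetical HW access */

static void foo_console_write_atomic(struct console *con,
                                     const char *s, unsigned int count)
{
        unsigned int flags;

        console_atomic_lock(&flags);    /* processor-reentrant, NMI safe */
        while (count--)
                foo_uart_putc(*s++);
        console_atomic_unlock(flags);
}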
39/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce emergency messages
Date: Tue, 12 Feb 2019 15:29:57 +0100
Console messages are generally either critical or non-critical.
Critical messages are messages such as crashes or sysrq output.
Critical messages should never be lost because generally they provide
important debugging information.
Since all console messages are output via a fully preemptible printk
kernel thread, it is possible that messages are not output because
that thread cannot be scheduled (BUG in scheduler, run-away RT task,
etc).
To allow critical messages to be output independent of the
schedulability of the printk task, introduce an emergency mechanism
that _immediately_ outputs the message to the consoles. To avoid
possible unbounded latency issues, the emergency mechanism only
outputs the printk line provided by the caller and ignores any
pending messages in the log buffer.
Critical messages are identified as messages (by default) with log
level LOGLEVEL_WARNING or more critical. This is configurable via the
kernel option CONSOLE_LOGLEVEL_EMERGENCY.
Any messages output as emergency messages are skipped by the printk
thread on those consoles that output the emergency message.
In order for a console driver to support emergency messages, the
write_atomic function must be implemented by the driver. If not
implemented, the emergency messages are handled like all other
messages and are printed by the printk thread.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
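In code terms the classification rule from 39/252 boils down to a loglevel
compare; a sketch, assuming the kernel option named above surfaces as
CONFIG_CONSOLE_LOGLEVEL_EMERGENCY:

/* Sketch: a record qualifies as an emergency if its level is
 * LOGLEVEL_WARNING or more critical; more critical means a
 * numerically lower level, hence the <= comparison. */
static bool foo_is_emergency(int level)
{
        return level <= CONFIG_CONSOLE_LOGLEVEL_EMERGENCY;
}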
40/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Tue, 12 Feb 2019 15:29:58 +0100
Implement a non-sleeping NMI-safe write_atomic console function in
order to support emergency printk messages.
Since interrupts need to be disabled during transmit, all usage of
the IER register was wrapped with access functions that use the
console_atomic_lock function to synchronize register access while
tracking the state of the interrupts. This was necessary because
write_atomic can be called from an NMI context that has
preempted write_atomic.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
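The IER wrapping in 40/252 can be pictured as the sketch below; foo8250_set_ier
is a hypothetical stand-in for the patch's accessor functions:

/* Sketch: route every IER access through console_atomic_lock so a
 * write_atomic running from NMI sees consistent register state. */
#include <linux/serial_reg.h>

static inline void foo8250_set_ier(struct uart_8250_port *up,
                                   unsigned char val)
{
        unsigned int flags;

        console_atomic_lock(&flags);
        serial_out(up, UART_IER, val);
        console_atomic_unlock(flags);
}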
41/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement KERN_CONT
Date: Tue, 12 Feb 2019 15:29:59 +0100
Implement KERN_CONT based on the printing CPU rather than on the
printing task. As long as the KERN_CONT messages are coming from the
same CPU and no non-KERN_CONT messages come, the messages are assumed
to belong to each other.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement /dev/kmsg
Date: Tue, 12 Feb 2019 15:30:00 +0100
Since printk messages are now logged to a new ring buffer, update
the /dev/kmsg functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement syslog
Date: Tue, 12 Feb 2019 15:30:01 +0100
Since printk messages are now logged to a new ring buffer, update
the syslog functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement kmsg_dump
Date: Tue, 12 Feb 2019 15:30:02 +0100
Since printk messages are now logged to a new ring buffer, update
the kmsg_dump functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove unused code
Date: Tue, 12 Feb 2019 15:30:03 +0100
Code relating to the safe context and anything dealing with the
previous log buffer implementation is no longer in use. Remove it.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: set deferred to default loglevel, enforce mask
Date: Thu, 14 Feb 2019 23:13:30 +0100
All messages printed via vprintk_deferred() were being
automatically treated as emergency messages.
Messages printed via vprintk_deferred() should be set to the
default loglevel. LOGLEVEL_SCHED is no longer relevant.
Also, enforce the loglevel mask for emergency messages.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: remove that trylock in serial8250_console_write_atomic()
Date: Thu, 14 Feb 2019 17:38:24 +0100
This does not work because an rtmutex cannot be acquired in NMI
context. As per John, it is not needed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: export symbols which are used by symbols
Date: Sat, 16 Feb 2019 09:02:00 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: remove printk_nmi_.*()
Date: Fri, 15 Feb 2019 14:34:20 +0100
It is no longer provided by the printk core code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: only allow kernel to emergency message
Date: Sun, 17 Feb 2019 03:11:20 +0100
Emergency messages exist as a mechanism for the kernel to
communicate critical information to users. It is not meant for
use by userspace. Only allow facility=0 messages to be
processed by the emergency message code.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: devkmsg: llseek: reset clear if it is lost
Date: Fri, 22 Feb 2019 23:02:44 +0100
SEEK_DATA will seek to the last clear record. If this clear record
is no longer in the ring buffer, devkmsg_llseek() will go into an
infinite loop. Fix that by resetting the clear sequence if the old
clear record is no longer in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: print "rate-limitted" message as info
Date: Fri, 22 Feb 2019 12:47:13 +0100
If messages which are injected via kmsg are dropped then they don't need
to be printed as warnings. This is to avoid latency spikes if the
interface decides to print a lot of important messages.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove mutex usage
Date: Wed, 24 Apr 2019 16:36:04 +0200
The kmsg dumper can be called from any context, but the dumping
helpers were using a mutex to synchronize the iterator against
concurrent dumps.
Rather than trying to synchronize the iterator, use a local copy
of the iterator during the dump. Then no synchronization is
required.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
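The local-copy approach of 53/252 as a sketch; the helper is hypothetical,
kmsg_dump_get_line() is the existing iterator API of that era:

/* Sketch: snapshot the shared dumper state onto the stack and
 * iterate the copy; concurrent dumps cannot corrupt it, so no
 * mutex is needed. */
static bool foo_dump_one_line(struct kmsg_dumper *shared,
                              char *line, size_t size, size_t *len)
{
        struct kmsg_dumper local = *shared;  /* local copy, unlocked */

        return kmsg_dump_get_line(&local, true, line, size, len);
}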
54/252 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: printk: devkmsg: read: Return EPIPE when the first message user-space wants has gone
Date: Tue, 24 Sep 2019 15:26:39 +0800
When user-space wants to read the first message, that is when user->seq
is 0, and that message has gone, it currently automatically resets
user->seq to the current first seq. This diverges from mainline kernel behavior.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/dev-kmsg#n39
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/printk/printk.c#n899
We should inform user-space that what it wants has gone by returning EPIPE
in such scenario.
Link: https://lore.kernel.org/r/20190924072639.25986-1-zhe.he@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
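Inside devkmsg_read() the fix from 54/252 amounts to roughly this fragment (a
sketch; user->seq and first_seq follow the usual printk naming and are
assumptions here):

        /* Sketch: the first record (seq 0) the reader asked for was
         * already overwritten; report the loss via EPIPE instead of
         * silently fast-forwarding. */
        if (user->seq == 0 && user->seq < first_seq) {
                user->seq = first_seq;
                ret = -EPIPE;
                goto out;
        }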
55/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: handle iterating while buffer changing
Date: Mon, 7 Oct 2019 16:20:39 +0200
The syslog and kmsg_dump readers are provided buffers to fill.
Both try to maximize the provided buffer usage by calculating the
maximum number of messages that can fit. However, if after the
calculation, messages are dropped and new messages added, the
calculation will no longer match.
For syslog, add a check to make sure the provided buffer is not
overfilled.
For kmsg_dump, start over by recalculating the messages
available.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: hack out emergency loglevel usage
Date: Tue, 3 Dec 2019 09:14:57 +0100
Instead of using an emergency loglevel to determine if atomic
messages should be printed, use oops_in_progress. This conforms
to the decision that latency-causing atomic messages never be
generated during normal operation.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
Date: Fri, 15 Nov 2019 18:54:20 +0100
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.
Bit spinlocks are also not covered by lock debugging, e.g. lockdep.
Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: remove the wrapper and use always spinlock_t and move it into
the padding hole]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
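In pattern form the substitution in 57/252 looks like this sketch; the
b_uptodate_lock member name follows the upstream version of the change:

        /* Before: bit spinlock in bh->b_state, disables preemption. */
        bit_spin_lock(BH_Uptodate_Lock, &bh->b_state);
        /* update buffer state */
        bit_spin_unlock(BH_Uptodate_Lock, &bh->b_state);

        /* After: a regular spinlock_t placed in a padding hole of
         * struct buffer_head; sleeps on RT, covered by lockdep. */
        spin_lock_irqsave(&bh->b_uptodate_lock, flags);
        /* update buffer state */
        spin_unlock_irqrestore(&bh->b_uptodate_lock, flags);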
58/252 [
Author: Clark Williams
Email: williams@redhat.com
Subject: thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Date: Mon, 15 Jul 2019 15:25:00 -0500
The spinlock pkg_temp_lock has the potential of being taken in atomic
context because it can be acquired from the thermal IRQ vector.
It's static and of limited scope, so go ahead and make it a raw spinlock.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: perf/core: Add SRCU annotation for pmus list walk
Date: Fri, 15 Nov 2019 18:04:07 +0100
Since commit
28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
there is an additional check to ensure that an RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.
Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/252 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: kmemleak: Turn kmemleak_lock and object->lock to raw_spinlock_t
Date: Wed, 19 Dec 2018 16:30:57 +0100
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic context
which does not work on RT.
Since the kmemleak operation is performed in atomic context make it a
raw_spinlock_t so it can also be acquired on RT. This is used for
debugging and is not enabled by default in a production-like environment
(where performance/latency matters) so it makes sense to make it a
raw_spinlock_t instead of trying to get rid of the atomic context.
Turn also the kmemleak_object->lock into raw_spinlock_t which is
acquired (nested) while the kmemleak_lock is held.
The time spent in "echo scan > kmemleak" slightly improved on 64core box
with this patch applied after boot.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lkml.kernel.org/r/20181218150744.GB20197@arrakis.emea.arm.com
Link: https://lkml.kernel.org/r/1542877459-144382-1-git-send-email-zhe.he@windriver.com
Link: https://lkml.kernel.org/r/20190927082230.34152-1-yongxin.liu@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
[bigeasy: Redo the description. Merge the individual bits: He Zhe did
the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded the
patch.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Use CONFIG_PREEMPTION
Date: Fri, 26 Jul 2019 11:30:49 +0200
This is an all-in-one patch of the current `PREEMPTION' branch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
62/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: BPF: Disable on PREEMPT_RT
Date: Thu, 10 Oct 2019 16:54:45 +0200
Disable BPF on PREEMPT_RT because
- it allocates and frees memory in atomic context
- it uses up_read_non_owner()
- BPF_PROG_RUN() expects to be invoked in non-preemptible context
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Don't assume that the callback has interrupts disabled
Date: Tue, 11 Jun 2019 11:21:02 +0200
Due to the TIMER_IRQSAFE flag, the timer callback is invoked with
disabled interrupts. On -RT the callback is invoked in softirq context
with enabled interrupts. Since the interrupts are threaded, there
are no in_irq() users. The local_bh_disable() around the threaded
handler ensures that there is either a timer or a threaded handler
active on the CPU.
Disable interrupts before __queue_work() is invoked from the timer
callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
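The fix in 63/252 has essentially this shape, modeled on
delayed_work_timer_fn(); a sketch, not the verbatim patch:

/* Sketch: the timer callback must not assume IRQs are already off,
 * since on -RT it runs in softirq context with IRQs enabled. */
#include <linux/workqueue.h>

static void foo_work_timer_fn(struct timer_list *t)
{
        struct delayed_work *dwork = from_timer(dwork, t, timer);
        unsigned long flags;

        local_irq_save(flags);  /* __queue_work() expects IRQs off */
        __queue_work(dwork->cpu, dwork->wq, &dwork->work);
        local_irq_restore(flags);
}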
64/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/swait: Add swait_event_lock_irq()
Date: Wed, 22 May 2019 12:42:26 +0200
The swait_event_lock_irq() is inspired by wait_event_lock_irq(). This is
required by the workqueue code once it switches to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
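By analogy with wait_event_lock_irq(), the helper from 64/252 presumably looks
like this sketch built on the existing ___swait_event() machinery; the exact
lock flavour is an assumption:

/* Sketch: sleep on a swait queue until @condition holds, dropping
 * @lock (held with IRQs disabled) around each schedule(). */
#define foo_swait_event_lock_irq(wq, condition, lock)                 \
        do {                                                          \
                if (condition)                                        \
                        break;                                        \
                ___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0,\
                               spin_unlock_irq(&lock);                \
                               schedule();                            \
                               spin_lock_irq(&lock));                 \
        } while (0)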
65/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Use swait for wq_manager_wait
Date: Tue, 11 Jun 2019 11:21:09 +0200
In order for the workqueue code to use raw_spinlock_t typed locking, no
spinlock_t typed lock may be acquired. A wait_queue_head uses
a spinlock_t for its list protection.
Use a swait based queue head to avoid raw_spinlock_t -> spinlock_t
locking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Convert the locks to raw type
Date: Wed, 22 May 2019 12:43:56 +0200
After all the workqueue and the timer rework, we can finally make the
worker_pool lock raw.
The lock is not held over an unbounded period of time/iterations.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
67/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/compaction: Disable compact_unevictable_allowed on RT
Date: Fri, 8 Nov 2019 12:55:47 +0100
Since commit
5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages for pages to compact by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.
Make compact_unevictable_allowed = 0 the default and remove it from /proc on
RT.
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove ->css_rstat_flush()
Date: Thu, 15 Aug 2019 18:14:16 +0200
I was looking at the lifetime of the ->css_rstat_flush() callback to see if
cgroup_rstat_cpu_lock should remain a raw_spinlock_t. I didn't find any
users; it has been unused since it was introduced in commit
8f53470bab042 ("cgroup: Add cgroup_subsys->css_rstat_flush()")
Remove the css_rstat_flush callback because it has no users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Consolidate users of cgroup_rstat_lock.
Date: Fri, 16 Aug 2019 12:20:42 +0200
cgroup_rstat_flush_irqsafe() has no users, remove it.
cgroup_rstat_flush_hold() and cgroup_rstat_flush_release() are only used within
this file. Make them static.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove `may_sleep' from cgroup_rstat_flush_locked()
Date: Fri, 16 Aug 2019 12:25:35 +0200
cgroup_rstat_flush_locked() is always invoked with `may_sleep' set to
true, so this case can be made the default and the parameter removed.
Remove the `may_sleep' parameter.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Acquire cgroup_rstat_lock with enabled interrupts
Date: Fri, 16 Aug 2019 12:49:36 +0200
There is no need to disable interrupts while cgroup_rstat_lock is
acquired. The lock is never used in-IRQ context so a simple spin_lock()
is enough for synchronisation purposes.
Acquire cgroup_rstat_lock without disabling interrupts and ensure that
cgroup_rstat_cpu_lock is acquired with disabled interrupts (this one is
acquired in-IRQ context).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tpm: remove tpm_dev_wq_lock
Date: Mon, 11 Feb 2019 11:33:11 +0100
Added in commit
9e1b74a63f776 ("tpm: add support for nonblocking operation")
but was never actually used.
Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/252 [
Author: Rob Herring
Email: robh@kernel.org
Subject: of: Rework and simplify phandle cache to use a fixed size
Date: Wed, 11 Dec 2019 17:23:45 -0600
The phandle cache was added to speed up of_find_node_by_phandle() by
avoiding walking the whole DT to find a matching phandle. The
implementation has several shortcomings:
- The cache is designed to work on a linear set of phandle values.
This is true for dtc generated DTs, but not for other cases such as
Power.
- The cache isn't enabled until of_core_init() and a typical system
may see hundreds of calls to of_find_node_by_phandle() before that
point.
- The cache is freed and re-allocated when the number of phandles
changes.
- It takes a raw spinlock around a memory allocation which breaks on
RT.
Change the implementation to a fixed size and use hash_32() as the
cache index. This greatly simplifies the implementation. It avoids
the need for any re-alloc of the cache and taking a reference on nodes
in the cache. We only have a single source of removing cache entries
which is of_detach_node().
Using hash_32() removes any assumption on phandle values improving
the hit rate for non-linear phandle values. The effect on linear values
using hash_32() is about a 10% collision. The chances of thrashing on
colliding values seems to be low.
To compare performance, I used a RK3399 board which is a pretty typical
system. I found that just measuring boot time as done previously is
noisy and may be impacted by other things. Also bringing up secondary
cores causes some issues with measuring, so I booted with 'nr_cpus=1'.
With no caching, calls to of_find_node_by_phandle() take about 20124 us
for 1248 calls. There's an additional 288 calls before time keeping is
up. Using the average time per hit/miss with the cache, we can calculate
these calls to take 690 us (277 hit / 11 miss) with a 128 entry cache
and 13319 us with no cache or an uninitialized cache.
Comparing the 3 implementations the time spent in
of_find_node_by_phandle() is:
no cache: 20124 us (+ 13319 us)
128 entry cache: 5134 us (+ 690 us)
current cache: 819 us (+ 13319 us)
We could move the allocation of the cache earlier to improve the
current cache, but that just further complicates the situation as it
needs to be after slab is up, so we can't do it when unflattening (which
uses memblock).
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lkml.kernel.org/r/20191211232345.24810-1-robh@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
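The fixed-size, hash-indexed cache of 74/252 in sketch form; the foo_ names
and the slow-path walk are placeholders, and locking against of_detach_node()
is omitted:

#include <linux/hash.h>
#include <linux/log2.h>
#include <linux/of.h>

#define FOO_PHANDLE_CACHE_SZ 128        /* fixed size, power of two */
static struct device_node *foo_phandle_cache[FOO_PHANDLE_CACHE_SZ];
static struct device_node *foo_walk_whole_tree(phandle handle);

static struct device_node *foo_find_node_by_phandle(phandle handle)
{
        u32 idx = hash_32(handle, ilog2(FOO_PHANDLE_CACHE_SZ));
        struct device_node *np = foo_phandle_cache[idx];

        if (np && np->phandle == handle)
                return np;                      /* cache hit */
        np = foo_walk_whole_tree(handle);       /* slow path */
        if (np)
                foo_phandle_cache[idx] = np;    /* no realloc, no refcount */
        return np;
}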
75/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timekeeping: Split jiffies seqlock
Date: Thu, 14 Feb 2013 22:36:59 +0100
Replace jiffies_lock seqlock with a simple seqcounter and a rawlock so
it can be taken in atomic context on RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
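The split in 75/252 amounts to this sketch, mirroring what later went upstream:

/* Sketch: the seqlock_t becomes a raw lock plus a plain seqcount,
 * so the write side can run in atomic context on RT. */
static DEFINE_RAW_SPINLOCK(jiffies_lock);
static seqcount_t jiffies_seq = SEQCNT_ZERO(jiffies_seq);

static void foo_tick_do_update(u64 ticks)
{
        raw_spin_lock(&jiffies_lock);
        write_seqcount_begin(&jiffies_seq);
        jiffies_64 += ticks;
        write_seqcount_end(&jiffies_seq);
        raw_spin_unlock(&jiffies_lock);
}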
76/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
77/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: dma-buf: Use seqlock_t instead of disabling preemption
Date: Wed, 14 Aug 2019 16:38:43 +0200
"dma reservation" disables preemption while acquiring the write access
for "seqcount".
Replace the seqcount with a seqlock_t which provides seqcount-like
semantics and a lock for the writer.
Link: https://lkml.kernel.org/r/f410b429-db86-f81c-7c67-f563fa808b62@free.fr
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
78/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: seqlock: Prevent rt starvation
Date: Wed, 22 Feb 2012 12:03:30 +0100
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.
Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
79/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: NFSv4: replace seqcount_t with a seqlock_t
Date: Fri, 28 Oct 2016 23:05:11 +0200
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() causes a
preempt_disable() on -RT. The spin_lock()/spin_unlock() in that section does
not work.
The lockdep part was removed in commit
abbec2da13f0 ("NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state")
because lockdep complained.
The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as being recovered").
The recovery thread runs only once.
write_seqlock() does not work on !RT because it disables preemption and the
writer side is preemptible (it has to remain so despite the fact that it will
block readers).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Link: https://lkml.kernel.org/r/20161021164727.24485-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead of seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while the writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
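Patches 77, 79, 80 and 82 all follow one conversion pattern, roughly this
sketch with placeholder foo_ names:

/* Sketch: seqcount_t -> seqlock_t. The writer now holds a real lock
 * (so it can be boosted on RT); readers keep the retry loop, and
 * with 78/252 applied they block on the embedded lock on RT rather
 * than spinning forever. */
static DEFINE_SEQLOCK(foo_seq);
static u64 foo_state;

static void foo_write(u64 val)
{
        write_seqlock(&foo_seq);        /* lock + sequence bump */
        foo_state = val;
        write_sequnlock(&foo_seq);
}

static u64 foo_read(void)
{
        unsigned int seq;
        u64 val;

        do {
                seq = read_seqbegin(&foo_seq);
                val = foo_state;
        } while (read_seqretry(&foo_seq, seq));
        return val;
}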
81/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Add a mutex around devnet_rename_seq
Date: Wed, 20 Mar 2013 18:06:20 +0100
On RT write_seqcount_begin() disables preemption and device_rename()
allocates memory with GFP_KERNEL and later grabs the sysfs_mutex
mutex. Serialize with a mutex and use the non-preemption-disabling
__write_seqcount_begin().
To avoid writer starvation, let the reader grab the mutex and release
it when it detects a writer in progress. This keeps the normal case
(no reader on the fly) fast.
[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
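The reader back-off of 81/252 as a fragment sketch; the mutex and seqcount
names follow the patch description, the surrounding function is omitted:

        /* Sketch: if a writer is mid-update, block on the mutex so a
         * preempted low-prio writer can run (and be boosted on RT)
         * rather than being spun on. */
retry:
        seq = raw_seqcount_begin(&devnet_rename_seq);
        /* copy the device name */
        if (read_seqcount_retry(&devnet_rename_seq, seq)) {
                mutex_lock(&devnet_rename_mutex);
                mutex_unlock(&devnet_rename_mutex);
                goto retry;
        }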
82/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: userfaultfd: Use a seqlock instead of seqcount
Date: Wed, 18 Dec 2019 12:25:09 +0100
On RT write_seqcount_begin() disables preemption which leads to warning
in add_wait_queue() while the spinlock_t is acquired.
The waitqueue can't be converted to swait_queue because
userfaultfd_wake_function() is used as a custom wake function.
Use a seqlock instead of a seqcount to avoid the preempt_disable() section
during add_wait_queue().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/nfs: turn rmdir_sem into a semaphore
Date: Thu, 15 Sep 2016 10:51:27 +0200
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores on the other hand are "plain simple" and should work as
expected. We can't have multiple readers but on -RT we don't allow
multiple readers anyway so that is not a loss.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
84/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an open-coded seqcount. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lock up I am disabling the preemption during the update.
The rename of i_dir_seq is here to ensure we catch new write sides in
the future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/252 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: list_bl: Make list head locking RT safe
Date: Fri, 21 Jun 2013 15:07:25 -0400
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
let's assume for now that it being raw won't be a significant latency
concern.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[julia@ni.com: Use #define instead static inline to avoid false positive from
lockdep]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
86/252 [
Author: Clark Williams
Email: williams@redhat.com
Subject: fscache: initialize cookie hash table raw spinlocks
Date: Tue, 3 Jul 2018 13:34:30 -0500
The fscache cookie mechanism uses a hash table of hlist_bl_head structures. The
PREEMPT_RT patchset adds a raw spinlock to this structure and so on PREEMPT_RT
the structures get used uninitialized, causing warnings about bad magic numbers
when spinlock debugging is turned on.
Use the init function for fscache cookies.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: bring back explicit INIT_HLIST_BL_HEAD init
Date: Wed, 13 Sep 2017 12:32:34 +0200
Commit 3d375d78593c ("mm: update callers to use HASH_ZERO flag") removed
INIT_HLIST_BL_HEAD and uses the ZERO flag instead for the init. However
on RT we have also a spinlock which needs an init call so we can't use
that.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
89/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
90/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remains short and doesn't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adapted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator we adapted this one as well and the changes were smaller. More
importantly, due to the design of the SLUB allocator it performs better and its
worst case latency was smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT's needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
default on RT should help in those cases and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
93/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packet arrives, the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
would be scheduled in again.
Should a task with RT priority busy poll then it would consume the CPU instead
of allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
94/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: md: disable bcache
Date: Thu, 29 Aug 2013 11:48:57 +0200
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Link: http://lkml.kernel.org/r/20130820111602.3cea203c@gandalf.local.home
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware will erase flash blocks on NOR.
The time functions are used by efi-rtc and can be triggered during
runtime (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
96/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is the default at build time,
the user can override it with `efi=runtime' and allow runtime services again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Disable HAVE_ARCH_JUMP_LABEL
Date: Mon, 1 Jul 2019 17:39:28 +0200
__text_poke() does:
| local_irq_save(flags);
…
| ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
which does not work on -RT because the PTE-lock is a spinlock_t typed lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
98/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks use migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
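A compressed sketch of the locallock mapping in 98/252; foo_ names are
placeholders and the real implementation adds owner-recursion and
migrate_disable() handling:

#ifndef CONFIG_PREEMPT_RT
/* !RT: degenerates to the classic primitives. */
# define foo_local_lock_irq(lv)    local_irq_disable()
# define foo_local_unlock_irq(lv)  local_irq_enable()
#else
/* RT: a per-CPU spinlock keeps the section preemptible and gives
 * the protected resource lockdep coverage. */
# define foo_local_lock_irq(lv)    spin_lock(this_cpu_ptr(&(lv).lock))
# define foo_local_unlock_irq(lv)  spin_unlock(this_cpu_ptr(&(lv).lock))
#endif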
99/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Add preemptible softirq
Date: Mon, 20 May 2019 13:09:08 +0200
Add preemptible softirq for RT's needs. By removing the softirq count
from the preempt counter, the softirq becomes preemptible. A per-CPU
lock ensures that there is no parallel softirq processing and that
per-CPU variables are not accessed in parallel by multiple threads.
local_bh_enable() will process all softirq work that has been raised in
its BH-disabled section once the BH counter gets to 0.
[+ rcu_read_lock() as part of local_bh_disable() by Scott Wood]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
100/252 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack is possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #1
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
103/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with IRQs off on RT. Make it a raw_spinlock_t
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
105/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
106/252 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/page_alloc: Split drain_local_pages()
Date: Thu, 18 Apr 2019 11:09:04 +0200
Splitting the functionality of drain_local_pages() into a separate
function. This is preparatory work for introducing the static key
dependent locking mechanism.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Add static key dependent pagevec locking
Date: Thu, 18 Apr 2019 11:09:05 +0200
The locking of struct pagevec is done by disabling preemption. In case the
struct has to be accessed from interrupt context then interrupts are
disabled. This means the struct can only be accessed locally from the
CPU. There is also no lockdep coverage which would scream if it is
accessed from the wrong context.
Create struct swap_pagevec which consists of a pagevec member and a
spin_lock_t. Introduce a static key, which changes the locking behavior
only if the key is set in the following way: Before the struct is accessed
the spin_lock has to be acquired instead of using preempt_disable(). Since
the struct is used CPU-locally there is no spinning on the lock but the
lock is acquired immediately. If the struct is accessed from interrupt
context, spin_lock_irqsave() is used.
No functional change yet because static key is not enabled.
[anna-maria: introduce static key]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
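The static-key switch of 107/252 in sketch form; the struct layout follows the
description above and the pagevec operation itself is elided to comments:

#include <linux/pagevec.h>

struct swap_pagevec {
        spinlock_t      lock;
        struct pagevec  pvec;
};
static DEFINE_PER_CPU(struct swap_pagevec, foo_pvec);
static DEFINE_STATIC_KEY_FALSE(use_pvec_lock);

static void foo_pagevec_op(void)
{
        if (static_branch_likely(&use_pvec_lock)) {
                struct swap_pagevec *swpvec = raw_cpu_ptr(&foo_pvec);

                /* CPU-local use: uncontended, acquired immediately;
                 * remote access becomes possible (see 108/252). */
                spin_lock(&swpvec->lock);
                /* operate on swpvec->pvec */
                spin_unlock(&swpvec->lock);
        } else {
                preempt_disable();      /* classic CPU-local locking */
                /* operate on this CPU's pagevec */
                preempt_enable();
        }
}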
108/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Access struct pagevec remotely
Date: Thu, 18 Apr 2019 11:09:06 +0200
When the newly introduced static key is enabled, struct pagevec is
locked during access. So it is possible to access it from a remote CPU. The
advantage is that the work can be done from the "requesting" CPU without
firing a worker on a remote CPU and waiting for it to complete the work.
No functional change because static key is not enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/252 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/swap: Enable "use_pvec_lock" nohz_full dependent
Date: Thu, 18 Apr 2019 11:09:07 +0200
When a system runs with CONFIG_NO_HZ_FULL enabled, the tick of CPUs listed
in 'nohz_full=' kernel command line parameter should be stopped whenever
possible. The tick stays stopped longer when work for this CPU is handled
by another CPU.
With the already introduced static key 'use_pvec_lock' there is the
possibility to prevent firing a worker for mm/swap work on a remote CPU
with a stopped tick.
Therefore enable the static key in case the kernel command line parameter
'nohz_full=' setup was successful, which implies that CONFIG_NO_HZ_FULL is
set.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
110/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/swap: Enable use pvec lock on RT
Date: Mon, 12 Aug 2019 11:20:44 +0200
On RT we also need to avoid preempt-disable/IRQ-off regions so we have to
enable the locking while accessing pvecs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
111/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
112/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. "preempt_disable" ensures that the
same resources are not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
113/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: Enable SLUB for RT
Date: Thu, 25 Oct 2012 10:32:35 +0100
Avoid the memory allocation in IRQ section
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: factor out everything except the kcalloc() workaround ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
114/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
115/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
116/252 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() to get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
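The get/put_cpu_light() substitution of 116/252 sketched; get_cpu_light() is
the RT-tree primitive built on migrate_disable(), and the draining loop is
reduced to its bare shape:

/* Sketch: pin the task to its CPU without disabling preemption, so
 * schedule_work_on() may take sleeping locks on RT. */
static void foo_drain_all_stock(struct work_struct __percpu *works)
{
        int cpu;

        get_cpu_light();                /* was: get_cpu() */
        for_each_online_cpu(cpu)
                schedule_work_on(cpu, per_cpu_ptr(works, cpu));
        put_cpu_light();                /* was: put_cpu() */
}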
117/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() sections which then take sleeping locks.
This patch converts them to local locks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
118/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bit spinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
119/252 [
Author: Luis Claudio R. Goncalves
Email: lclaudio@uudg.org
Subject: mm/zswap: Do not disable preemption in zswap_frontswap_store()
Date: Tue, 25 Jun 2019 11:28:04 -0300
Zswap causes "BUG: scheduling while atomic" by blocking on a rt_spin_lock() with
preemption disabled. The preemption is disabled by get_cpu_var() in
zswap_frontswap_store() to protect the access of the zswap_dstmem percpu variable.
Use get_locked_var() to protect the percpu zswap_dstmem variable, making the
code preemptible.
As get_cpu_ptr() also disables preemption, replace it by this_cpu_ptr() and
remove the counterpart put_cpu_ptr().
Steps to Reproduce:
1. # grubby --args "zswap.enabled=1" --update-kernel DEFAULT
2. # reboot
3. Calculate the amount of memory to be used by the test:
---> grep MemAvailable /proc/meminfo
---> Add 25% ~ 50% to that value
4. # stress --vm 1 --vm-bytes ${MemAvailable+25%} --timeout 240s
Usually, in less than 5 minutes the backtrace listed below appears, followed
by a kernel panic:
| BUG: scheduling while atomic: kswapd1/181/0x00000002
|
| Preemption disabled at:
| [<ffffffff8b2a6cda>] zswap_frontswap_store+0x21a/0x6e1
|
| Kernel panic - not syncing: scheduling while atomic
| CPU: 14 PID: 181 Comm: kswapd1 Kdump: loaded Not tainted 5.0.14-rt9 #1
| Hardware name: AMD Pence/Pence, BIOS WPN2321X_Weekly_12_03_21 03/19/2012
| Call Trace:
| panic+0x106/0x2a7
| __schedule_bug.cold+0x3f/0x51
| __schedule+0x5cb/0x6f0
| schedule+0x43/0xd0
| rt_spin_lock_slowlock_locked+0x114/0x2b0
| rt_spin_lock_slowlock+0x51/0x80
| zbud_alloc+0x1da/0x2d0
| zswap_frontswap_store+0x31a/0x6e1
| __frontswap_store+0xab/0x130
| swap_writepage+0x39/0x70
| pageout.isra.0+0xe3/0x320
| shrink_page_list+0xa8e/0xd10
| shrink_inactive_list+0x251/0x840
| shrink_node_memcg+0x213/0x770
| shrink_node+0xd9/0x450
| balance_pgdat+0x2d5/0x510
| kswapd+0x218/0x470
| kthread+0xfb/0x130
| ret_from_fork+0x27/0x50
Cc: stable-rt@vger.kernel.org
Reported-by: Ping Fang <pifang@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
120/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: radix-tree: use local locks
Date: Wed, 25 Jan 2017 16:34:27 +0100
The preload functionality uses per-CPU variables and preempt-disable to
ensure that it does not switch CPUs during its usage. This patch uses
local_locks() instead of preempt_disable() for the same purpose, and to
remain preemptible on -RT.
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
121/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non-constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency-wise. That's also a prerequisite for running RT in
a guest on top of an RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
122/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: pci/switchtec: Don't use completion's wait queue
Date: Wed, 4 Oct 2017 10:24:23 +0200
The poll callback is using the completion's wait_queue_head_t member and
puts it in poll_wait() so the poll() caller gets a wakeup after the command
has completed. This does not work on RT because we don't have a
wait_queue_head_t in our completion implementation. Nobody else in tree
uses a completion like that, so this is the only driver that breaks.
Instead of using the completion, use a waitqueue with a status flag, as
suggested by Logan.
I don't have the HW so I have no idea if it works as expected, so please
test it.
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
124/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: completion: Use simple wait queues
Date: Fri, 11 Jan 2013 11:23:51 +0100
Completions have no long lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple waitqueues which reduce the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cminyard@mvista.com: Move __prepare_to_swait() into the do loop because
swake_up_locked() removes the waiter on wake from the queue while in the
original code it is not the case]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
125/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: Allow raw wakeups during boot
Date: Fri, 9 Aug 2019 15:25:21 +0200
There are a few wake-up timers during early boot which are essential for
the system to make progress. At this stage the softirq threads have not been
spawned yet, so there is no timer processing in softirq.
The wakeup in question:
smpboot_create_thread()
-> kthread_create_on_cpu()
-> kthread_bind()
-> wait_task_inactive()
-> schedule_hrtimeout()
Let the timer fire in hardirq context during the system boot.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
126/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: move state change before hrtimer_cancel in do_nanosleep()
Date: Thu, 6 Dec 2018 10:15:13 +0100
There is a small window between setting t->task to NULL and waking the
task up (which would set TASK_RUNNING). So the timer would fire, run and
set ->task to NULL while the other side/do_nanosleep() wouldn't enter
freezable_schedule(). After all we are preemptible here (in
do_nanosleep() and on the timer wake up path) and on KVM/virt the
virt-CPU might get preempted.
So do_nanosleep() wouldn't enter freezable_schedule() but cancel the
timer which is still running and wait for it via
hrtimer_wait_for_timer(). Then wait_event()/might_sleep() would complain
that it is invoked with state != TASK_RUNNING.
This isn't a problem since it would be reset to TASK_RUNNING later
anyway and we don't rely on the previous state.
Move the state update to TASK_RUNNING before hrtimer_cancel() so there
are no complaints from might_sleep() about the wrong state.
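A sketch of the reordering in do_nanosleep() (details elided):
    /* before: cancel may wait while state is still TASK_INTERRUPTIBLE */
    hrtimer_cancel(&t->timer);
    __set_current_state(TASK_RUNNING);

    /* after: set TASK_RUNNING first, might_sleep() stays quiet */
    __set_current_state(TASK_RUNNING);
    hrtimer_cancel(&t->timer);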
Cc: stable-rt@vger.kernel.org
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
127/252 [
Author: John Stultz
Email: johnstul@us.ibm.com
Subject: posix-timers: Thread posix-cpu-timers on -rt
Date: Fri, 3 Jul 2009 08:29:58 -0500
posix-cpu-timer code takes non -rt safe locks in hard irq
context. Move it to a thread.
[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
128/252 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: posix-timers: Add expiry lock
Date: Mon, 27 May 2019 16:54:06 +0200
If an about-to-be-removed posix timer is active then the code will retry the
delete operation until it succeeds / the timer callback completes.
Use hrtimer_grab_expiry_lock() for posix timers which use a hrtimer underneath
to spin on a lock until the callback has finished.
Introduce cpu_timers_grab_expiry_lock() for the posix-cpu-timer. This will
acquire the proper per-CPU spin_lock which is acquired by the CPU which is
expiring the timer.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: keep the posix-cpu timer bits, everything else got applied]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
mmdrop() takes sleeping locks and calls into the memory allocator, which is
nothing we want to do during task switch or in other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
131/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes handy on -RT because we can't
free memory in preempt disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
132/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not an RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
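A rough sketch of the scheme (pi_lock protection and the wakeup details
are elided):
    /* blocking on a 'sleeping' spinlock: preserve the current state */
    current->saved_state = current->state;
    current->state = TASK_UNINTERRUPTIBLE;
    ...                          /* sleep until the lock is acquired */
    /* lock acquired: restore whatever the task had before */
    current->state = current->saved_state;
    current->saved_state = TASK_RUNNING;
    /* a regular wakeup in between only sets saved_state to running */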
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
134/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
135/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Avoid a cancel dead-lock in tasklet handling due to preemptible-softirq
Date: Sat, 22 Jun 2019 00:09:22 +0200
Waiting on a pending / active tasklet which is preempted by a task on the
same CPU will spin indefinitely because the tasklet makes no progress.
To avoid this deadlock we can disable BH, which acquires the softirq lock
and thereby forces the completion of the softirq and so of the tasklet.
The BH off/on in tasklet_kill() will force the completion of tasklets which
are scheduled but not yet running (because ksoftirqd was preempted before it
could start the tasklet).
The BH off/on in tasklet_unlock_wait() will force the completion of tasklets
which got preempted while running, as sketched below.
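A sketch of the wait-loop pattern (state-bit details elided):
    /* spinning alone would live-lock if the tasklet was preempted on
     * this CPU; a BH off/on cycle forces the softirq to complete */
    while (test_bit(TASKLET_STATE_RUN, &t->state)) {
            local_bh_disable();   /* acquires the softirq lock on RT */
            local_bh_enable();
    }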
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
136/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on a special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
138/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
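A minimal sketch of the resulting netif_rx_ni() body (error handling
elided):
    /* before */
    preempt_disable();
    err = netif_rx_internal(skb);
    if (local_softirq_pending())
            do_softirq();
    preempt_enable();

    /* after: BH-off keeps us on this CPU and the enable runs softirqs */
    local_bh_disable();
    err = netif_rx_internal(skb);
    local_bh_enable();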
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
140/252 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1                          TASK 2
------                          ------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>
                                double_lock_hb();
    raw_spin_lock(pi_lock);
    if (current->pi_blocked_on) {
    } else {
        current->pi_blocked_on = PI_WAKE_INPROGRESS;
        raw_spin_unlock(pi_lock);
        spin_lock(hb->lock); <-- blocked!
                                plist_for_each_entry_safe(this) {
                                    rt_mutex_start_proxy_lock();
                                    task_blocks_on_rt_mutex();
                                    BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy tasks pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
141/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock
Date: Fri, 1 Mar 2013 11:17:42 +0100
In exit_pi_state_list() we have the following locking construct:
spin_lock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
...
spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reacquiring
pi_lock after that again. This is safe as the loop code reevaluates
head again under the pi_lock.
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
142/252 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/252 [
Author: Wolfgang M. Reimer
Email: linuxball@gmail.com
Subject: locking: locktorture: Do NOT include rwlock.h directly
Date: Tue, 21 Jul 2015 16:20:07 +0200
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Add rtmutex_lock_killable()
Date: Thu, 9 Jun 2011 11:43:52 +0200
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
145/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionally.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
146/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
147/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
148/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: don't include the rcu header
Date: Fri, 20 Dec 2019 11:38:21 -0500
The RCU header pulls in spinlock.h and fails due to not-yet-defined types:
|In file included from include/linux/spinlock.h:275:0,
| from include/linux/rcupdate.h:38,
| from include/linux/rbtree.h:34,
| from include/linux/rtmutex.h:17,
| from include/linux/spinlock_types.h:18,
| from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
| ^
This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
149/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
150/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation on top of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
152/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
Date: Sun, 17 Jul 2011 22:51:33 +0200
RT does not increment preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
153/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: locking/rtmutex: Clean ->pi_blocked_on in the error case
Date: Mon, 30 Sep 2019 18:15:44 +0200
The function rt_mutex_wait_proxy_lock() cleans ->pi_blocked_on in case
of failure (timeout, signal). The same cleanup is required in
__rt_mutex_start_proxy_lock().
In both cases the task was interrupted by a signal or a timeout while
acquiring the lock, and after the interruption it no longer blocks on the
lock.
Fixes: 1a1fb985f2e2b ("futex: Handle early deadlock return correctly")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: trylock is okay on -RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context, so it is
not an issue here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single-reader restriction is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code paths shows that properly written RT tasks
should not take them. Syscalls like mmap() and file access which take mmap sem
write-locked have unbound latencies which are completely unrelated to mmap
sem. Other R/W sem users like graphics drivers are not suitable for RT tasks
either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer unfair
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on a RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/252 [
Author: Mikulas Patocka
Email: mpatocka@redhat.com
Subject: locking/rt-mutex: fix deadlock in device mapper / block-IO
Date: Mon, 13 Nov 2017 12:56:53 -0500
When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).
However, this queuing of bios can cause deadlocks; in order to avoid them,
device mapper registers a function flush_current_bio_list. This function
is called when the device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.
The problem with CONFIG_PREEMPT_RT is that when the device mapper
driver blocks, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in
block device stack can happen.
Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.
So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock,
when fast acquire failed and when the task is about to block.
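A sketch of where the flush ends up, based on the description above (exact
signatures elided):
    static inline int rt_mutex_fastlock(struct rt_mutex *lock, ...)
    {
            if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
                    return 0;
            /* about to block: redirect queued bios so they can proceed */
            if (blk_needs_flush_plug(current))
                    blk_schedule_flush_plug(current);
            return slowfn(lock, ...);
    }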
CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific, it can also occur
in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: locking/rt-mutex: Flush block plug on __down_read()
Date: Fri, 4 Jan 2019 15:33:21 -0500
__down_read() bypasses the rtmutex frontend to call
rt_mutex_slowlock_locked() directly, and thus it needs to call
blk_schedule_flush_plug() itself.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: re-init the wait_lock in rt_mutex_init_proxy_locked()
Date: Thu, 16 Nov 2017 16:48:48 +0100
We could provide a key-class for the lockdep (and fixup all callers) or
move the init to all callers (like it was) in order to avoid lockdep
seeing a double-lock of the wait_lock.
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
163/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case __TASK_TRACED was moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state, as sketched below.
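A rough sketch of the extra check, taken under ->pi_lock so both fields
are stable (the helper name is hypothetical):
    static bool task_is_traced_rt(struct task_struct *task)
    {
            unsigned long flags;
            bool traced;

            raw_spin_lock_irqsave(&task->pi_lock, flags);
            traced = (task->state & __TASK_TRACED) ||
                     (task->saved_state & __TASK_TRACED);
            raw_spin_unlock_irqrestore(&task->pi_lock, flags);
            return traced;
    }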
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
164/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: __set_cpus_allowed_ptr(): Check cpus_mask, not cpus_ptr
Date: Sat, 27 Jul 2019 00:56:32 -0500
This function is concerned with the long-term cpu mask, not the
transitory mask the task might have while migrate disabled. Before
this patch, if a task was migrate disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the cpu that the task was running on, then the mask update
would be lost.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched/core: add migrate_disable()
Date: Sat, 27 May 2017 19:02:06 +0200
[bristot@redhat.com: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration
Link: https://lkml.kernel.org/r/e981d271cbeca975bca710e2fbcc6078c09741b0.1498482127.git.bristot@redhat.com
]
[swood@redhat.com: fixups and optimisations
Link:https://lkml.kernel.org/r/20190727055638.20443-1-swood@redhat.com
Link:https://lkml.kernel.org/r/20191012065214.28109-1-swood@redhat.com
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
166/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/core: migrate_enable() must access takedown_cpu_task on !HOTPLUG_CPU
Date: Fri, 29 Nov 2019 17:24:55 +0100
The variable takedown_cpu_task is never declared/used on !HOTPLUG_CPU
except for migrate_enable(). This leads to a link error.
Don't use takedown_cpu_task in !HOTPLUG_CPU.
Reported-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: migrate_enable: Use stop_one_cpu_nowait()
Date: Sat, 12 Oct 2019 01:52:14 -0500
migrate_enable() can be called with current->state != TASK_RUNNING.
Avoid clobbering the existing state by using stop_one_cpu_nowait().
Since we're stopping the current cpu, we know that we won't get
past __schedule() until migration_cpu_stop() has run (at least up to
the point of migrating us to another cpu).
Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: spin until the request has been processed]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
168/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
169/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: workaround migrate_disable/enable in different context
Date: Wed, 8 Mar 2017 14:23:35 +0100
migrate_enable() invokes __schedule() and it expects a preempt count of one.
Holding a raw_spinlock_t with disabled interrupts should not allow scheduling.
These little hacks ensure that we don't schedule while we lock the hb lock
with interrupts enabled and unlock it with interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[XXX: As per PeterZ's suggestion
set_thread_flag(TIF_NEED_RESCHED); preempt_fold_need_resched()
would trigger a scheduler invocation on the last preempt_enable() which in
turn would allow to drop this.
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
170/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
171/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disable preemption which
implies an RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disable preemption on !RT) is acquired.
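A sketch of the idea on the spinlock_t path (the rwlock_t path is
analogous; rtmutex details elided):
    void rt_spin_lock(spinlock_t *lock)
    {
            ...                /* acquire the underlying rtmutex */
            rcu_read_lock();   /* mimic the implicit RCU section of !RT */
    }

    void rt_spin_unlock(spinlock_t *lock)
    {
            rcu_read_unlock();
            ...                /* release the underlying rtmutex */
    }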
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcu: Use rcuc threads on PREEMPT_RT as we did
Date: Wed, 11 Sep 2019 17:57:28 +0100
While switching to the reworked RCU-thread code, enabling the thread
processing on -RT was forgotten.
Besides restoring behavior that used to be default on RT, this avoids
a deadlock on scheduler locks.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: srcu: replace local_irqsave() with a locallock
Date: Thu, 12 Oct 2017 18:37:12 +0200
There are two instances which disable interrupts in order to get a
stable this_cpu_ptr() pointer. The restore part is coupled with
spin_unlock_irqrestore(), which does not work on RT.
Replace the local_irq_save() call with the appropriate local_lock()
version of it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/252 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: rt: Improve the serial console PASS_LIMIT
Date: Wed, 14 Dec 2011 13:05:54 +0100
Beyond the warning:
drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
177/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/epoll: Do not disable preemption on RT
Date: Fri, 8 Jul 2011 16:35:35 +0200
ep_call_nested() takes a sleeping lock so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind being
invoked twice on the same CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
178/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
179/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks taken later on.
It seems to be enough to replace them with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: don't complete requests via IPI
Date: Thu, 29 Jan 2015 15:10:08 +0100
The IPI runs in hardirq context and the completion path takes sleeping
locks. Assume caches are shared and complete the requests on the local CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
182/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
183/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non-RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
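A sketch of the retry-loop pattern this targets (the resource helper is
hypothetical):
    /* spinning would live-lock on RT if the modifying side was preempted */
    while (!try_to_grab_resource())       /* hypothetical helper */
            cpu_chill();   /* cpu_relax() on !RT, a short sleep on RT */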
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
185/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Use cpu_chill() for retry loops
Date: Thu, 20 Dec 2012 18:28:26 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
[bigeasy: After all those changes that occurred over the years, this one hunk is
left and should not cause any starvation on -RT anymore]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
186/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
187/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use cpu_chill() instead of cpu_relax()
Date: Wed, 7 Mar 2012 21:10:04 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
189/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep the irq-off regions. It looks
low-latency. However we can't kfree() from this context, therefore we defer
this to the softirq and use the tofree_queue list for it (similar to
process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
190/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, then the task with the higher priority
won't be able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
the higher priority leave the CPU and the task with the lower priority gets
back and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
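A sketch of the change in __dev_xmit_skb(), based on the description
above (the exact condition is illustrative):
    /* before: the busylock is only taken when the qdisc is contended */
    bool contended = qdisc_is_running(q);

    /* after: on RT always serialise via the busylock so a blocked
     * high-prio sender can PI-boost the current lock owner */
    bool contended = IS_ENABLED(CONFIG_PREEMPT_RT) || qdisc_is_running(q);

    if (contended)
            spin_lock(&q->busylock);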
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
191/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context, so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds the IRQ_WORK_HARD_IRQ flag which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we have the original behavior.
This patch incorporates tglx's original work, reworked a little to bring back
the arch_irq_work_raise() if possible, plus a few fixes from Steven Rostedt
and Mike Galbraith.
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
192/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
193/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counter gets out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.
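A sketch of the restructured loop implied by the description (arguments
elided):
    blkcipher_walk_virt(...);
    while (nbytes) {
            glue_fpu_begin(...);          /* atomic only around FPU work */
            __glue_xts_crypt_128bit(...);
            glue_fpu_end(...);            /* leave atomic context ...    */
            blkcipher_walk_done(...);     /* ... before this may sleep   */
    }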
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
194/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if a RT task is pending.
Probably we should measure the performance of those ciphers in pure SW
mode and with this optimisation to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
195/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU lock which is protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race-window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
196/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have problems
to acquire the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
197/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
198/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
199/252 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be
bound to a particular CPU. This can be achieved by
disabling migration. No need to disable preemption.
2) Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backlog() is called
in atomic context. And if the backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
get scheduled out, so it expects non-atomic context.
3) When CONFIG_PREEMPT_RT is not defined,
migrate_enable() and migrate_disable() map to
preempt_enable() and preempt_disable(), so there is no
change in functionality in case of non-RT.
- Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively (see the sketch below)
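A minimal before/after sketch of the netif_rx() path (surrounding RPS
code elided):
    /* before: preemption disabled across enqueue_to_backlog() */
    cpu = get_cpu();
    ret = enqueue_to_backlog(skb, cpu, &qtail);
    put_cpu();

    /* after: only migration disabled; kfree_skb() on backlog overflow
     * now runs in preemptible context on RT */
    cpu = get_cpu_light();
    ret = enqueue_to_backlog(skb, cpu, &qtail);
    put_cpu_light();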
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
200/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
201/252 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable;
disable the softirq context test and the rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
202/252 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
203/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
204/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
205/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
Date: Sat, 27 Feb 2016 09:01:42 +0100
[ 8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[ 8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[ 8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[ 8.014055] Workqueue: events_unbound async_run_entry_fn
[ 8.014059] 0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[ 8.014063] ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[ 8.014066] ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[ 8.014067] Call Trace:
[ 8.014074] [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[ 8.014078] [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[ 8.014082] [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[ 8.014086] [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[ 8.014090] [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[ 8.014152] [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[ 8.014156] [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[ 8.014201] [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[ 8.014215] [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[ 8.014260] [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[ 8.014288] [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[ 8.014298] [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[ 8.014301] [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[ 8.014319] [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[ 8.014328] [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[ 8.014337] [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[ 8.014346] [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 8.014390] [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[ 8.014394] [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[ 8.014398] [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[ 8.014401] [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[ 8.014405] [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[ 8.014408] [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[ 8.014410] [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[ 8.014413] [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[ 8.014416] [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[ 8.014419] [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[ 8.014422] [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[ 8.014425] [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[ 8.014428] [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[ 8.014439] [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[ 8.014483] [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 8.014486] [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[ 8.014490] [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[ 8.014493] [<ffffffff81071489>] worker_thread+0x169/0x4c0
[ 8.014496] [<ffffffff81071320>] ? process_one_work+0x470/0x470
[ 8.014499] [<ffffffff81076606>] kthread+0xc6/0xe0
[ 8.014502] [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[ 8.014506] [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
206/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events such as trace_i915_pipe_update_start(), among others,
use functions that acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire a sleeping lock.
Based on this I don't see any other way than to disable the trace points
on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
207/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h has been included then the NOTRACE here
becomes a nop. Currently this happens for two .c files which use the
tracepoints behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
208/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts for intel_engine_breadcrumbs_irq()
Date: Thu, 26 Sep 2019 12:29:05 +0200
The function intel_engine_breadcrumbs_irq() is always invoked from an interrupt
handler and for that reason it invokes (as an optimisation) only spin_lock()
for locking assuming that the interrupts are already disabled. The
function intel_engine_signal_breadcrumbs() is provided to disable
interrupts while the former function is invoked so that assumption is
also true for callers from preemptible context.
On PREEMPT_RT local_irq_disable() really disables interrupts, and this
forbids invoking spin_lock(), which becomes a sleeping spinlock.
This is also problematic with `threadirqs' in conjunction with
irq_work. With forced threading of the interrupt handler, the handler is
invoked with disabled BH but with interrupts enabled. This is okay and
the lock itself is never acquired in IRQ context. This changes with
irq_work (signal_irq_work()) which _still_ invokes
intel_engine_breadcrumbs_irq() from IRQ context. Lockdep should see this
and complain.
Acquire the locks in intel_engine_breadcrumbs_irq() with the _irqsave()
suffix and let all callers invoke intel_engine_breadcrumbs_irq()
directly instead of using intel_engine_signal_breadcrumbs().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
209/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Drop the IRQ-off asserts
Date: Thu, 26 Sep 2019 12:30:21 +0200
The lockdep_assert_irqs_disabled() check is needless. The previous
lockdep_assert_held() check ensures that the lock is acquired, and while
the lock is held lockdep also prints a warning if interrupts are not
disabled when they have to be.
These IRQ-off asserts trigger on PREEMPT_RT because the locks become
sleeping locks and do not really disable interrupts.
Remove lockdep_assert_irqs_disabled().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
210/252 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
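As a hedged sketch of the conversion (only the lock declaration and the
lock/unlock calls change; the surrounding code here is illustrative):
  static DEFINE_RAW_SPINLOCK(callback_lock);

  static void cpuset_flags_update_sketch(void)
  {
          unsigned long flags;

          /* raw_spinlock_t does not become a sleeping lock on
           * PREEMPT_RT, so it is safe to take from atomic contexts
           * such as the allocator path shown in the splat above. */
          raw_spin_lock_irqsave(&callback_lock, flags);
          /* read/update cpusets' flags and cpu/node masks */
          raw_spin_unlock_irqrestore(&callback_lock, flags);
  }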
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
211/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: apparmor: use a locallock instead of preempt_disable()
Date: Wed, 11 Oct 2017 17:43:49 +0200
get_buffers() disables preemption which acts as a lock for the per-CPU
variable. Since we can't disable preemption here on RT, a local_lock is
used instead in order to remain on the same CPU and not to have more
than one user within the critical section.
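A minimal sketch of the pattern using today's upstream local_lock API
(the -RT series at the time carried its own locallock primitives; the
buffer names here are illustrative):
  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  static DEFINE_PER_CPU(local_lock_t, buffer_lock) =
          INIT_LOCAL_LOCK(buffer_lock);

  struct aa_buf { char data[512]; };
  static DEFINE_PER_CPU(struct aa_buf, aa_buffer);

  static void use_percpu_buffer_sketch(void)
  {
          struct aa_buf *buf;

          /* On !RT this disables preemption as before; on RT it is a
           * per-CPU sleeping lock, so the task stays on this CPU and
           * remains the only user of the buffer while still being
           * preemptible. */
          local_lock(&buffer_lock);
          buf = this_cpu_ptr(&aa_buffer);
          /* ... fill and use buf->data ... */
          local_unlock(&buffer_lock);
  }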
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
212/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
213/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
214/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
215/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The TLB should be flushed on unmap, thus making the mapping entry
invalid. Currently this is only done in the non-debug case, which does
not look right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
216/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
fixup highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
217/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
218/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect, as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now, on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in the SCHED_OTHER/SCHED_OTHER preemption case and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine- and workload-dependent, but
there is a clear trend that it enhances non-RT workload
performance.
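A hedged sketch of the wakeup-side decision described above (the lazy
helpers are illustrative names, not the exact -RT implementation):
  static void resched_curr_sketch(struct task_struct *curr, bool rt_class)
  {
          if (rt_class) {
                  /* RT class preemption is unchanged: resched now */
                  set_tsk_need_resched(curr);
                  return;
          }
          /*
           * SCHED_OTHER preempting SCHED_OTHER: only flag a lazy
           * resched. The flag is acted upon once curr's
           * preempt_lazy_count (bumped on lock acquisition /
           * migrate_disable) drops back to zero, letting curr leave
           * the lock-held region before being preempted.
           */
          set_tsk_need_resched_lazy(curr);    /* hypothetical helper */
  }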
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
219/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
220/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
221/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
222/252 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
223/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode which
leads to latency spikes.
The architectures which use stop_machine() at the moment:
- ARM stop machine
- s390 stop machine
The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
224/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
225/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
226/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
227/252 [
Author: Kurt Kanzenbach
Email: kurt@linutronix.de
Subject: tty: serial: pl011: explicitly initialize the flags variable
Date: Mon, 24 Sep 2018 10:29:01 +0200
Silence the following gcc warning:
drivers/tty/serial/amba-pl011.c: In function ‘pl011_console_write’:
./include/linux/spinlock.h:260:3: warning: ‘flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
_raw_spin_unlock_irqrestore(lock, flags); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/amba-pl011.c:2214:16: note: ‘flags’ was declared here
unsigned long flags;
^~~~~
The code is correct. Thus, initializing flags to zero doesn't change the
behavior and resolves the warning.
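The resulting change is a one-liner along these lines (sketch):
  unsigned long flags = 0;  /* silences -Wmaybe-uninitialized; the value
                               is always overwritten before use */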
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
228/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: include definition for cpumask_t
Date: Thu, 22 Dec 2016 17:28:33 +0100
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
229/252 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
*((char*)0xc0001000) = 0;
};
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. For the above condition the correct result is that
the test case receives a "segmentation fault" and exits, rather than
producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the IRQ-enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable IRQs in the
translation/section permission fault handlers. ARM disables IRQs when it
enters exception/interrupt mode, so if the kernel doesn't enable them
they remain disabled during translation/section permission faults.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've not seen any other
need for other arches to have the siglock converted to a raw lock, we
can conclude that we should enable IRQs for the ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
230/252 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
231/252 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
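A hedged before/after sketch of the pattern (the body is a hypothetical
placeholder, not the actual vgic/timer update code):
  static void vcpu_run_section_sketch(void)
  {
          /* before: pinned to the CPU by disabling preemption */
          preempt_disable();
          update_vgic_and_timer_state();  /* hypothetical placeholder */
          preempt_enable();

          /* after, on -rt: still pinned, but the region is preemptible */
          migrate_disable();
          update_vgic_and_timer_state();
          migrate_enable();
  }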
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
232/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt-disabled
section, which does not work on -RT.
Delay freeing of memory until preemption is enabled again.
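A hedged sketch of the idea (the field access is illustrative):
  static void fpsimd_flush_thread_sketch(void)
  {
          void *mem;

          preempt_disable();
          /* detach the buffer while preemption is off ... */
          mem = current->thread.sve_state;   /* illustrative */
          current->thread.sve_state = NULL;
          preempt_enable();
          /* ... and free it only after preemption is enabled again */
          kfree(mem);
  }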
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
233/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: at91: do not disable/enable clocks in a row
Date: Wed, 9 Mar 2016 10:51:06 +0100
Currently the driver will disable the clock and enable it one line later
when switching from periodic mode into one-shot. This is avoidable, and
the disable/enable in a row causes a needless warning on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
234/252 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: clocksource: TCLIB: Allow higher clock rates for clock events
Date: Mon, 8 Mar 2010 18:57:04 +0100
By default the TCLIB uses the 32 KiHz base clock rate for clock events.
Add a compile-time selection to allow a higher clock resolution.
(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
235/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
236/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
237/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
238/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead of local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use a locallock instead of local_irq_save() so that the memory
allocation can happen in a preemptible section.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
239/252 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time the interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.
This temporary fix is sent for two reasons. The first is so that users who want
to use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario do not involve
interrupts sent in directed delivery mode, and that the number of possible
pending interrupts is kept small. Secondly, this should incentivize the
development of a proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
240/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
241/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the
stack address as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
242/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
243/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
244/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: connector/cn_proc: Protect send_msg() with a local lock on RT
Date: Sun, 16 Oct 2016 05:11:54 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Since ab8ed951080e ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
245/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
246/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/zram: Don't disable preemption in zcomp_stream_get/put()
Date: Thu, 20 Oct 2016 11:15:22 +0200
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
247/252 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: squashfs: make use of local lock in multi_cpu decompressor
Date: Mon, 7 May 2018 08:58:57 -0500
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.
Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.
Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.
Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
248/252 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the CPU or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
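A hedged sketch of the workaround (the helper names are illustrative;
TPM_ACCESS() is the driver's register-offset macro):
  /* Issue a harmless read after each posted write so the write buffer
   * drains one write at a time instead of stalling a later ioread8()
   * for the whole burst. */
  static inline void tpm_tis_flush(void __iomem *iobase)
  {
          ioread8(iobase + TPM_ACCESS(0));
  }

  static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase,
                                      u32 addr)
  {
          iowrite8(b, iobase + addr);
          tpm_tis_flush(iobase);
  }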
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
249/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation allow rt tasks to cache one sigqueue struct in
task struct.
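A hedged sketch of the caching idea (the task_struct field and both
helpers are illustrative, not the exact patch):
  static struct sigqueue *sigqueue_get_cached(struct task_struct *t)
  {
          /* atomically take the cached entry, if any */
          struct sigqueue *q = xchg(&t->sigqueue_cache, NULL);

          if (!q)
                  q = kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);
          return q;
  }

  static void sigqueue_put_cached(struct task_struct *t,
                                  struct sigqueue *q)
  {
          /* keep at most one entry per task; free the rest */
          if (cmpxchg(&t->sigqueue_cache, NULL, q) != NULL)
                  kmem_cache_free(sigqueue_cachep, q);
  }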
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
250/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
251/252 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
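The entry amounts to a read-only attribute on kernel_kobj, roughly
(a sketch, not the literal patch):
  static ssize_t realtime_show(struct kobject *kobj,
                               struct kobj_attribute *attr, char *buf)
  {
          return sprintf(buf, "%d\n", 1);
  }
  static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

  static int __init rt_sysfs_init(void)
  {
          return sysfs_create_file(kernel_kobj, &realtime_attr.attr);
  }
  late_initcall(rt_sysfs_init);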
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
252/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
|
|
1/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: libsas: remove irq save in sas_ata_qc_issue()
Date: Thu, 12 Apr 2018 09:16:22 +0200
[ upstream commit 2da11d4262639dc0e2fabc6a70886db57af25c43 ]
Since commit 312d3e56119a ("[SCSI] libsas: remove ata_port.lock
management duties from lldds") the sas_ata_qc_issue() function unlocks
the ata_port.lock and disables interrupts before doing so.
That lock is always taken with disabled interrupts so at this point, the
interrupts are already disabled. There is no need to disable the
interrupts before the unlock operation because they are already
disabled.
Restoring the interrupt state later does not change anything because
they were disabled and remain disabled. Therefore remove the operations
which do not change the behaviour.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: qla2xxx: remove irq save in qla2x00_poll()
Date: Thu, 12 Apr 2018 09:55:25 +0200
[ upstream commit b3a8aa90c46095cbad454eb068bfb5a8eb56d4e3 ]
Commit d2ba5675d899 ("[SCSI] qla2xxx: Disable local-interrupts while
polling for RISC status.") added a local_irq_disable() before invoking
the ->intr_handler callback. The function used in this callback did not
disable interrupts while acquiring the spin_lock, so a deadlock was
possible, and this change was one possible solution.
The function in question was qla2300_intr_handler() and is using
spin_lock_irqsave() since commit 43fac4d97a1a ("[SCSI] qla2xxx: Resolve
a performance issue in interrupt").
I checked all other ->intr_handler callbacks and all of them use the
irqsave variant so it is safe to remove the local_irq_save() block now.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/258 [
Author: "Steven Rostedt (VMware)"
Email: rostedt@goodmis.org
Subject: cgroup/tracing: Move taking of spin lock out of trace event handlers
Date: Mon, 9 Jul 2018 17:48:54 -0400
[ Upstream commit e4f8d81c738db6d3ffdabfb8329aa2feaa310699 ]
It is unwise to take spin locks from the handlers of trace events.
Mainly, because they can introduce lockups, because it introduces locks
in places that are normally not tested. Worse yet, because trace events
are tucked away in the include/trace/events/ directory, locks that are
taken there are forgotten about.
As a general rule, I tell people never to take any locks in a trace
event handler.
Several cgroup trace event handlers call cgroup_path() which eventually
takes the kernfs_rename_lock spinlock. This injects the spinlock in the
code without people realizing it. It also can cause issues for the
PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event
handlers are called with preemption disabled.
By moving the calculation of the cgroup_path() out of the trace event
handlers and into a macro (surrounded by a
trace_cgroup_##type##_enabled()), then we could place the cgroup_path
into a string, and pass that to the trace event. Not only does this
remove the taking of the spinlock out of the trace event handler, but
it also means that cgroup_path() only needs to be called once (it is
currently called twice: once to get the length to reserve the buffer
for, and once again to get the path itself).
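The resulting shape is roughly the following macro (a sketch; the
upstream commit names it TRACE_CGROUP_PATH, and the buffer/lock names
here are illustrative):
  #define TRACE_CGROUP_PATH(type, cgrp, ...)                           \
          do {                                                         \
                  if (trace_cgroup_##type##_enabled()) {               \
                          /* resolve the path once, outside of the     \
                           * trace event handler itself */             \
                          spin_lock(&trace_cgroup_path_lock);          \
                          cgroup_path(cgrp, trace_cgroup_path,         \
                                      TRACE_CGROUP_PATH_LEN);          \
                          trace_cgroup_##type(cgrp, trace_cgroup_path, \
                                              ##__VA_ARGS__);          \
                          spin_unlock(&trace_cgroup_path_lock);        \
                  }                                                    \
          } while (0)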
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/core: Remove get_cpu() from sched_fork()
Date: Fri, 6 Jul 2018 15:06:15 +0200
[ Upstream commit af0fffd9300b97d8875aa745bc78e2f6fdb3c1f0 ]
get_cpu() disables preemption for the entire sched_fork() function.
This get_cpu() was introduced in commit:
dd41f596cda0 ("sched: cfs core code")
... which also invoked sched_balance_self() and this function
required preemption to be off.
Today, sched_balance_self() seems to be moved to ->task_fork callback
which is invoked while the ->pi_lock is held.
set_load_weight() could invoke reweight_task() which then via $callchain
might end up in smp_processor_id() but since `update_load' is false
this won't happen.
I didn't find any this_cpu*() or similar usage during the initialisation
of the task_struct.
The `cpu' value (from get_cpu()) is only used later in __set_task_cpu()
while the ->pi_lock lock is held.
Based on this it is possible to remove get_cpu() and use
smp_processor_id() for the `cpu' variable without breaking anything.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180706130615.g2ex2kmfu5kcvlq6@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
]
5/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: random: Remove preempt disabled region
Date: Fri, 3 Jul 2009 08:29:30 -0500
No need to keep preemption disabled across the whole function.
mix_pool_bytes() uses a spin_lock() to protect the pool and there are
other places like write_pool() which invoke mix_pool_bytes() without
disabling preemption.
credit_entropy_bits() is invoked from other places like
add_hwgenerator_randomness() without disabling preemption.
Before commit 95b709b6be49 ("random: drop trickle mode") the function
used __this_cpu_inc_return() which would require disabled preemption.
The preempt_disable() section was added in commit 43d5d3018c37 ("[PATCH]
random driver preempt robustness", history tree). It was claimed that
the code relied on "vt_ioctl() being called under BKL".
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: enhance the commit message]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/258 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: iommu/amd: Remove redundant WARN_ON()
Date: Fri, 20 Jul 2018 10:45:45 +0200
The WARN_ON() was introduced in commit 272e4f99e966 ("iommu/amd: WARN
when __[attach|detach]_device are called with irqs enabled") to ensure
that the domain->lock is taken in proper irqs disabled context. This
is required, because the domain->lock is taken as well in irq
context.
The proper context check by the WARN_ON() is redundant, because it is
already covered by LOCKDEP. When working with locks and changing
context, a run with LOCKDEP is required anyway and would detect the
wrong lock context.
Furthermore all callers for those functions are within the same file
and all callers acquire another lock which already disables interrupts.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/258 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: drivers/md/raid5: Use irqsave variant of atomic_dec_and_lock()
Date: Fri, 4 May 2018 17:45:32 +0200
The irqsave variant of atomic_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save is no longer required.
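A hedged usage sketch of the variant (generic names, not the raid5 code
itself):
  static void put_ref_sketch(atomic_t *cnt, spinlock_t *lock)
  {
          unsigned long flags;

          /* drop the reference; the lock is taken (with interrupts
           * saved and disabled) only when the count hits zero */
          if (atomic_dec_and_lock_irqsave(cnt, lock, flags)) {
                  /* ... teardown protected by the lock ... */
                  spin_unlock_irqrestore(lock, flags);
          }
  }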
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/258 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: drivers/md/raid5: Do not disable irq on release_inactive_stripe_list() call
Date: Fri, 4 May 2018 17:45:33 +0200
There is no need to invoke release_inactive_stripe_list() with interrupts
disabled. All call sites, except raid5_release_stripe(), unlock
->device_lock and enable interrupts before invoking the function.
Make it consistent.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: bdi: use refcount_t for reference counting instead atomic_t
Date: Mon, 7 May 2018 16:51:09 +0200
refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.
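The conversion follows the usual pattern (a generic sketch, not the bdi
code itself):
  struct obj_sketch {
          refcount_t refcnt;      /* was: atomic_t refcnt */
  };

  static void obj_get(struct obj_sketch *o)
  {
          /* saturates and WARNs on overflow instead of wrapping */
          refcount_inc(&o->refcnt);
  }

  static void obj_put(struct obj_sketch *o)
  {
          if (refcount_dec_and_test(&o->refcnt))
                  kfree(o);
  }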
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: userns: use refcount_t for reference counting instead atomic_t
Date: Mon, 7 May 2018 17:09:42 +0200
refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/258 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: bdi: Use irqsave variant of refcount_dec_and_lock()
Date: Wed, 4 Apr 2018 11:43:56 +0200
The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save/restore is no longer required.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/258 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: userns: Use irqsave variant of refcount_dec_and_lock()
Date: Wed, 4 Apr 2018 11:43:57 +0200
The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save/restore is no longer required.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: libata: remove ata_sff_data_xfer_noirq()
Date: Thu, 19 Apr 2018 12:55:14 +0200
ata_sff_data_xfer_noirq() is invoked via the ->sff_data_xfer hook. The
latter is invoked by ata_pio_sector(), atapi_send_cdb() and
__atapi_pio_bytes() which in turn is invoked by ata_sff_hsm_move().
The latter function requires that the "ap->lock" lock is held which
needs to be taken with disabled interrupts.
There is no need to have ata_sff_data_xfer_noirq(), which invokes
ata_sff_data_xfer32() with disabled interrupts, because at this point
the interrupts are already disabled.
Remove the function and the references to it, and replace all callers
with ata_sff_data_xfer32().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ntfs: don't disable interrupts during kmap_atomic()
Date: Tue, 10 Apr 2018 17:54:32 +0200
ntfs_end_buffer_async_read() disables interrupts around kmap_atomic(). This is
a leftover from the old kmap_atomic() implementation which relied on fixed
mapping slots, so the caller had to make sure that the same slot could not be
reused from an interrupting context.
kmap_atomic() was changed to dynamic slots long ago and commit 1ec9c5ddc17a
("include/linux/highmem.h: remove the second argument of k[un]map_atomic()")
removed the slot assignments, but the callers were not checked for now
redundant interrupt disabling.
Remove the conditional interrupt disable.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: remove local_irq_disable() from count_shadow_nodes()
Date: Fri, 22 Jun 2018 10:48:51 +0200
In commit 0c7c1bed7e13 ("mm: make counting of list_lru_one::nr_items
lockless") the
spin_lock(&nlru->lock);
statement was replaced with
rcu_read_lock();
in __list_lru_count_one(). The comment in count_shadow_nodes() says that
the local_irq_disable() is required because the lock must be acquired
with disabled interrupts and (spin_lock()) does not do so.
Since the lock is replaced with rcu_read_lock() the local_irq_disable()
is no longer needed. The code path is
list_lru_shrink_count()
-> list_lru_count_one()
-> __list_lru_count_one()
-> rcu_read_lock()
-> list_lru_from_memcg_idx()
-> rcu_read_unlock()
Remove the local_irq_disable() statement.
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: make shadow_lru_isolate() use locking suffix
Date: Fri, 22 Jun 2018 11:43:35 +0200
shadow_lru_isolate() disables interrupts and acquires a lock. It could
use spin_lock_irq() instead. It also uses local_irq_enable() while it
could use spin_unlock_irq()/xa_unlock_irq().
Use proper suffix for lock/unlock in order to enable/disable interrupts
during release/acquire of a lock.
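A before/after sketch of the suffix change (generic lock, not the
actual shadow-node code):
  /* before: open-coded IRQ handling around a plain lock */
  local_irq_disable();
  spin_lock(&lock);
  ...
  spin_unlock(&lock);
  local_irq_enable();

  /* after: the _irq suffix does both in one primitive */
  spin_lock_irq(&lock);
  ...
  spin_unlock_irq(&lock);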
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/list_lru: use list_lru_walk_one() in list_lru_walk_node()
Date: Tue, 3 Jul 2018 12:56:19 +0200
list_lru_walk_node() invokes __list_lru_walk_one() with -1 as the
memcg_idx parameter. The same can be achieved by list_lru_walk_one() and
passing NULL as the memcg argument, which then gets converted into -1. This
is a preparation step for when the spin_lock() call is lifted to the
caller of __list_lru_walk_one().
Invoke list_lru_walk_one() instead of __list_lru_walk_one() when possible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/list_lru: Move locking from __list_lru_walk_one() to its caller
Date: Tue, 3 Jul 2018 13:06:07 +0200
Move the locking inside __list_lru_walk_one() to its caller. This is a
preparation step in order to introduce list_lru_walk_one_irq() which
does spin_lock_irq() instead of spin_lock() for the locking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/list_lru: Pass struct list_lru_node as an argument __list_lru_walk_one()
Date: Tue, 3 Jul 2018 13:08:56 +0200
__list_lru_walk_one() is invoked with struct list_lru *lru, int nid as
its first two arguments. Those two are only used to retrieve struct
list_lru_node. Since this is already done by the caller of the function
for the locking, we can pass struct list_lru_node directly and avoid the
dance around it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/list_lru: Introduce list_lru_shrink_walk_irq()
Date: Tue, 3 Jul 2018 13:17:27 +0200
Provide list_lru_shrink_walk_irq() and let it behave like
list_lru_walk_one() except that it locks the spinlock with
spin_lock_irq(). This is used by scan_shadow_nodes() because its lock
nests within the i_pages lock, which is acquired with interrupts
disabled.
This change allows the use of proper locking primitives instead of the
hand-crafted local_irq_disable() plus spin_lock().
There is no EXPORT_SYMBOL provided because the current user is
in-kernel only.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/258 [
Author: Alexandre Belloni
Email: alexandre.belloni@bootlin.com
Subject: ARM: at91: add TCB registers definitions
Date: Wed, 18 Apr 2018 12:51:38 +0200
Add registers and bits definitions for the timer counter blocks found on
Atmel ARM SoCs.
Tested-by: Alexander Dahl <ada@thorsis.com>
Tested-by: Andras Szemzo <szemzo.andras@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/258 [
Author: Alexandre Belloni
Email: alexandre.belloni@bootlin.com
Subject: clocksource/drivers: Add a new driver for the Atmel ARM TC blocks
Date: Wed, 18 Apr 2018 12:51:39 +0200
Add a driver for the Atmel Timer Counter Blocks. This driver provides a
clocksource and two clockevent devices.
One of the clockevent devices is linked to the clocksource counter and so it
will run at the same frequency. This will be used when there is only one TCB
channel available for timers.
The other clockevent device runs on a separate TCB channel when available.
This driver uses regmap and syscon to be able to probe early in the boot
and avoid having to switch on the TCB clocksource later. Using regmap also
means that unused TCB channels may be used by other drivers (PWM for
example). readl/writel are still used to access channel-specific registers
to avoid the performance impact of regmap (mainly locking).
Tested-by: Alexander Dahl <ada@thorsis.com>
Tested-by: Andras Szemzo <szemzo.andras@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/258 [
Author: Alexandre Belloni
Email: alexandre.belloni@bootlin.com
Subject: clocksource/drivers: atmel-pit: make option silent
Date: Wed, 18 Apr 2018 12:51:40 +0200
To conform with the other option, make the ATMEL_PIT option silent so it
can be selected from the platform.
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/258 [
Author: Alexandre Belloni
Email: alexandre.belloni@bootlin.com
Subject: ARM: at91: Implement clocksource selection
Date: Wed, 18 Apr 2018 12:51:41 +0200
Allow selecting and unselecting the PIT clocksource driver so it doesn't
have to be compiled when unused.
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/258 [
Author: Alexandre Belloni
Email: alexandre.belloni@bootlin.com
Subject: ARM: configs: at91: use new TCB timer driver
Date: Wed, 18 Apr 2018 12:51:42 +0200
Unselecting ATMEL_TCLIB switches the TCB timer driver from tcb_clksrc to
timer-atmel-tcb.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/258 [
Author: Alexandre Belloni
Email: alexandre.belloni@bootlin.com
Subject: ARM: configs: at91: unselect PIT
Date: Wed, 18 Apr 2018 12:51:43 +0200
The PIT is not required anymore to successfully boot and may actually be
harmful when preempt-rt is used, because the PIT interrupt is shared.
Disable it so the TCB clocksource is used.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/258 [
Author: Frank Rowand
Email: frank.rowand@am.sony.com
Subject: arm: Convert arm boot_lock to raw
Date: Mon, 19 Sep 2011 14:51:14 -0700
The arm boot_lock is used by the secondary processor startup code. The locking
task is the idle thread, which has idle->sched_class == &idle_sched_class.
idle_sched_class->enqueue_task == NULL, so if the idle task blocks on the
lock, the attempt to wake it when the lock becomes available will fail:
try_to_wake_up()
...
activate_task()
enqueue_task()
p->sched_class->enqueue_task(rq, p, flags)
Fix by converting boot_lock to a raw spin lock.
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4E77B952.3010606@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Krzysztof Kozlowski <krzk@kernel.org> [Exynos5422 Linaro PM-QA]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Don't let setaffinity unmask threaded EOI interrupt too early
Date: Tue, 17 Jul 2018 18:25:31 +0200
There is an issue with threaded interrupts which are marked ONESHOT
and using the fasteoi handler.
if (IS_ONESHOT())
mask_irq();
....
....
cond_unmask_eoi_irq()
chip->irq_eoi();
So if setaffinity is pending then the interrupt will be moved and then
unmasked, which is wrong as it should be kept masked up to the point where
the threaded handler finished. It's not a real problem, the interrupt will
just be able to fire before the threaded handler has finished, though the irq
masked state will be wrong for a bit.
The patch below should cure the issue. It also renames the horribly
misnomed functions so it becomes clear what they are supposed to do.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: add the body of the patch, use the same functions in both
ifdef paths (spotted by Andy Shevchenko)]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/258 [
Author: Yang Shi
Email: yang.shi@linaro.org
Subject: arm: kprobe: replace patch_lock to raw lock
Date: Thu, 10 Nov 2016 16:17:55 -0800
When running kprobe on -rt kernel, the below bug is caught:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0
INFO: lockdep is turned off.
irq event stamp: 238
hardirqs last enabled at (237): [<80b5aecc>] _raw_spin_unlock_irqrestore+0x88/0x90
hardirqs last disabled at (238): [<80b56d88>] __schedule+0xec/0x94c
softirqs last enabled at (0): [<80225584>] copy_process.part.5+0x30c/0x1994
softirqs last disabled at (0): [< (null)>] (null)
Preemption disabled at:[<802f2b98>] cpu_stopper_thread+0xc0/0x140
CPU: 0 PID: 14 Comm: migration/0 Tainted: G O 4.8.3-rt2 #1
Hardware name: Freescale LS1021A
[<80212e7c>] (unwind_backtrace) from [<8020cd2c>] (show_stack+0x20/0x24)
[<8020cd2c>] (show_stack) from [<80689e14>] (dump_stack+0xa0/0xcc)
[<80689e14>] (dump_stack) from [<8025a43c>] (___might_sleep+0x1b8/0x2a4)
[<8025a43c>] (___might_sleep) from [<80b5b324>] (rt_spin_lock+0x34/0x74)
[<80b5b324>] (rt_spin_lock) from [<80b5c31c>] (__patch_text_real+0x70/0xe8)
[<80b5c31c>] (__patch_text_real) from [<80b5c3ac>] (patch_text_stop_machine+0x18/0x20)
[<80b5c3ac>] (patch_text_stop_machine) from [<802f2920>] (multi_cpu_stop+0xfc/0x134)
[<802f2920>] (multi_cpu_stop) from [<802f2ba0>] (cpu_stopper_thread+0xc8/0x140)
[<802f2ba0>] (cpu_stopper_thread) from [<802563a4>] (smpboot_thread_fn+0x1a4/0x354)
[<802563a4>] (smpboot_thread_fn) from [<80251d38>] (kthread+0x104/0x11c)
[<80251d38>] (kthread) from [<80207f70>] (ret_from_fork+0x14/0x24)
Since patch_text_stop_machine() is called in stop_machine(), which disables
IRQs, a sleepable lock should not be used in this atomic context, so replace
patch_lock with a raw lock.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/unwind: use a raw_spin_lock
Date: Fri, 20 Sep 2013 14:31:54 +0200
Mostly unwind is done with irqs enabled; however, SLUB may call it with
irqs disabled while creating a new SLUB cache.
I had system freeze while loading a module which called
kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
interrupts and then
->new_slab_objects()
->new_slab()
->setup_object()
->setup_object_debug()
->init_tracking()
->set_track()
->save_stack_trace()
->save_stack_trace_tsk()
->walk_stackframe()
->unwind_frame()
->unwind_find_idx()
=>spin_lock_irqsave(&unwind_lock);
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: use irqsave in cgroup_rstat_flush_locked()
Date: Tue, 3 Jul 2018 18:19:48 +0200
All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock
either with spin_lock_irq() or spin_lock_irqsave().
cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which
is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated()
in IRQ context and therefore requires _irqsave() locking suffix in
cgroup_rstat_flush_locked().
Since there is no difference between spinlock_t and raw_spin_lock_t
on !RT, lockdep does not complain here. On RT lockdep complains because
the interrupts were not disabled here and a deadlock is possible.
Acquire the raw_spin_lock_t with disabled interrupts.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/258 [
Author: Clark Williams
Email: williams@redhat.com
Subject: fscache: initialize cookie hash table raw spinlocks
Date: Tue, 3 Jul 2018 13:34:30 -0500
The fscache cookie mechanism uses a hash table of hlist_bl_head structures. The
PREEMPT_RT patchset adds a raw spinlock to this structure and so on PREEMPT_RT
the structures get used uninitialized, causing warnings about bad magic numbers
when spinlock debugging is turned on.
Use the init function for fscache cookies.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
Date: Fri, 13 Jul 2018 15:30:52 +0200
The its_lock is held while a new device is added to the list and
during setup while the CPU is booted. Even on -RT the CPU bootup is
performed with disabled interrupts.
Make its_lock a raw_spin_lock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqchip/gic-v3-its: Move ITS' ->pend_page allocation into an early CPU up hook
Date: Fri, 13 Jul 2018 15:45:36 +0200
The AP-GIC-starting hook allocates memory for the ->pend_page while the
CPU is started during boot-up. This callback is invoked on the target
CPU with disabled interrupts.
This does not work on -RT because memory allocations are not possible
with disabled interrupts.
Move the memory allocation to an earlier hotplug step which is invoked
with enabled interrupts on the boot CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
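A sketch of the pattern 34/258 describes: register a prepare-stage CPU-hotplug
callback, which runs on the boot CPU with interrupts enabled, instead of
allocating in the AP-starting hook. The callback and helper names here are
hypothetical:
    static int its_cpu_prepare(unsigned int cpu)
    {
            /* may sleep: a GFP_KERNEL allocation is fine at this stage */
            return its_alloc_pend_page(cpu);    /* hypothetical helper */
    }

    static int __init its_hp_init(void)
    {
            /* returns the dynamic state number on success */
            return cpuhp_setup_state(CPUHP_BP_PREPARE_DYN,
                                     "irqchip/its:prepare",
                                     its_cpu_prepare, NULL);
    }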
35/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the option "efi=noruntime" is the default at build time, the user
could override its state with `efi=runtime' and allow it again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/efi: drop task_lock() from efi_switch_mm()
Date: Tue, 24 Jul 2018 14:48:55 +0200
efi_switch_mm() is a wrapper around switch_mm() which saves current's
->active_mm, sets the requested mm as ->active_mm and invokes
switch_mm().
I don't think that task_lock() is required during that procedure. It
protects ->mm which isn't changed here.
It needs to be mentioned that during the whole procedure (switch to
EFI's mm and back) preemption needs to be disabled. A context switch
at this point would reset the cr3 value based on current->mm. Also, this
function may not be invoked at the same time on a different CPU because
it would overwrite the efi_scratch.prev_mm information.
Remove task_lock() and also update the comment to reflect it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: KVM: compute_layout before alternatives are applied
Date: Thu, 26 Jul 2018 09:13:42 +0200
compute_layout() is invoked as part of an alternative fixup under
stop_machine() and needs a sleeping lock as part of get_random_long().
Invoke compute_layout() before the alternatives are applied.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: NFSv4: replace seqcount_t with a seqlock_t
Date: Fri, 28 Oct 2016 23:05:11 +0200
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() bugs me
because it maps to preempt_disable() in -RT which I can't have at this
point. So I took a look at the code.
The lockdep part was removed in commit abbec2da13f0 ("NFS: Use
raw_write_seqcount_begin/end int nfs4_reclaim_open_state") because
lockdep complained. The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as
being recovered").
The recovery thread runs only once.
write_seqlock() does not work on !RT because it disables preemption while the
writer side is preemptible (it has to remain so despite the fact that it will
block readers).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
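For reference, the general seqlock_t pattern that 38/258 moves to (names
illustrative, not the actual NFSv4 code):
    static DEFINE_SEQLOCK(so_lock);
    unsigned int seq;

    /* writer: serialized by the lock, remains preemptible on RT */
    write_seqlock(&so_lock);
    /* update the open_owner recovery state */
    write_sequnlock(&so_lock);

    /* reader: retries if it raced with a writer */
    do {
            seq = read_seqbegin(&so_lock);
            /* read the state */
    } while (read_seqretry(&so_lock, seq));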
39/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel: sched: Provide a pointer to the valid CPU mask
Date: Tue, 4 Apr 2017 12:50:16 +0200
In commit 4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed()
wrapper") the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to single CPU while the "normal" CPU mask remains untouched.
As an alternative implementation Ingo suggested to use
struct task_struct {
const cpumask_t *cpus_ptr;
cpumask_t cpus_mask;
};
with
t->cpus_allowed_ptr = &t->cpus_allowed;
In -RT we then can switch the cpus_ptr to
t->cpus_allowed_ptr = &cpumask_of(task_cpu(p));
in a migration disabled region. The rules are simple:
- Code that 'uses' ->cpus_allowed would use the pointer.
- Code that 'modifies' ->cpus_allowed would use the direct mask.
While converting the existing users I tried to stick with the rules
above, however… well, mostly CPUFREQ tries to temporarily switch the CPU
mask to do something on a certain CPU and then switches the mask back to
its original value. So in theory `cpus_ptr' could or should be used.
However if this is invoked in a migration disabled region (which is not
the case because it would require something like preempt_disable() and
set_cpus_allowed_ptr() might sleep so it can't be) then the "restore"
part would restore the wrong mask. So it only looks strange and I go for
the pointer…
Some drivers copy the cpumask without cpumask_copy() and others use
cpumask_copy but without alloc_cpumask_var(). I did not fix those as
part of this, could do this as a follow up…
So is this the way we want it?
Is the usage of `cpus_ptr' vs `cpus_mask' for the set + restore part
(see cpufreq users) what we want? At some point it looks like they
should use a different interface for their doing. I am not sure why
switching to a certain CPU is important but maybe it could be done via a
workqueue from the CPUFREQ core (so we have a comment describing why we
are doing this and a get_online_cpus() to ensure that the CPU does not go
offline too early).
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
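The two rules from 39/258, sketched (helper names are illustrative):
    /* code that 'uses' the mask goes through the pointer */
    static bool can_run_on(struct task_struct *p, int cpu)
    {
            return cpumask_test_cpu(cpu, p->cpus_ptr);
    }

    /* code that 'modifies' the mask writes the real mask */
    static void set_allowed(struct task_struct *p, const struct cpumask *new)
    {
            cpumask_copy(&p->cpus_mask, new);
    }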
40/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched/core: add migrate_disable()
Date: Sat, 27 May 2017 19:02:06 +0200
]
41/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: at91: do not disable/enable clocks in a row
Date: Wed, 9 Mar 2016 10:51:06 +0100
Currently the driver will disable the clock and enable it one line later
if it is switching from periodic mode into one-shot mode.
This can be avoided, and doing so removes a needless warning on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/258 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: clocksource: TCLIB: Allow higher clock rates for clock events
Date: Mon, 8 Mar 2010 18:57:04 +0100
By default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile-time selection to allow a higher clock resolution.
(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
43/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timekeeping: Split jiffies seqlock
Date: Thu, 14 Feb 2013 22:36:59 +0100
Replace jiffies_lock seqlock with a simple seqcounter and a rawlock so
it can be taken in atomic context on RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
44/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
45/258 [
Author: Marc Kleine-Budde
Email: mkl@pengutronix.de
Subject: net: sched: Use msleep() instead of yield()
Date: Wed, 5 Mar 2014 00:49:47 +0100
On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50
(by default). If a high priority userspace process tries to shut down a busy
network interface it might spin in a yield loop waiting for the device to
become idle. With the interrupt thread having a lower priority than the
looping process it might never be scheduled, resulting in a deadlock on UP
systems.
With Magic SysRq the following backtrace can be produced:
> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
This patch works around the problem by replacing yield() by msleep(1), giving
the interrupt thread time to finish, similar to other changes contained in the
rt patch set. Using wait_for_completion() instead would probably be a better
solution.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
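The change in 45/258 is essentially a one-liner in dev_deactivate_many();
roughly:
    /* before: can starve the lower-priority irq thread on RT */
    while (some_qdisc_is_busy(dev))
            yield();

    /* after: actually sleep so the irq thread gets to run */
    while (some_qdisc_is_busy(dev))
            msleep(1);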
46/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: dm rq: remove BUG_ON(!irqs_disabled) check
Date: Tue, 27 Mar 2018 16:24:15 +0200
In commit 052189a2ec95 ("dm: remove superfluous irq disablement in
dm_request_fn") the spin_lock_irq() was replaced with spin_lock() + a
check for disabled interrupts. Later the locking part was removed in
commit 2eb6e1e3aa87 ("dm: submit stacked requests in irq enabled
context") but the BUG_ON() check remained.
Since the original purpose for the "are-irqs-off" check is gone (the
->queue_lock has been removed), remove it.
Cc: Keith Busch <keith.busch@intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: usb: do not disable interrupts in giveback
Date: Fri, 8 Nov 2013 17:34:54 +0100
Since commit 94dfd7ed ("USB: HCD: support giveback of URB in tasklet
context") the USB code disables interrupts before invoking the complete
callback.
This should not be required: the HCD completes the URBs either in hard-irq
context or in BH context. Lockdep may report false positives if one has two
HCDs (one completes in IRQ and the other in BH context) and is using the same
USB driver (device) with both HCDs. This is safe since the same URBs are never
mixed with those two HCDs.
Long term we should force all HCDs to complete in the same context.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Provide PREEMPT_RT_BASE config switch
Date: Fri, 17 Jun 2011 12:39:57 +0200
Introduce PREEMPT_RT_BASE which enables parts of
PREEMPT_RT_FULL. Forces interrupt threading and enables some of the RT
substitutions for testing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
49/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Add PREEMPT_RT_FULL
Date: Wed, 29 Jun 2011 14:58:57 +0200
Introduce the final symbol for PREEMPT_RT_FULL.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
50/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT
Date: Wed, 14 Dec 2011 01:03:49 +0100
There are "valid" GFP_ATOMIC allocations such as
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 2130, name: tar
|1 lock held by tar/2130:
| #0: (&mm->mmap_sem){++++++}, at: [<ffffffff811d4e89>] SyS_brk+0x39/0x190
|Preemption disabled at:[<ffffffff81063048>] flush_tlb_mm_range+0x28/0x350
|
|CPU: 1 PID: 2130 Comm: tar Tainted: G W 4.8.2-rt2+ #747
|Call Trace:
| [<ffffffff814d52dc>] dump_stack+0x86/0xca
| [<ffffffff810a26fb>] ___might_sleep+0x14b/0x240
| [<ffffffff819bc1d4>] rt_spin_lock+0x24/0x60
| [<ffffffff81194fba>] get_page_from_freelist+0x83a/0x11b0
| [<ffffffff81195e8b>] __alloc_pages_nodemask+0x15b/0x1190
| [<ffffffff811f0b81>] alloc_pages_current+0xa1/0x1f0
| [<ffffffff811f7df5>] new_slab+0x3e5/0x690
| [<ffffffff811fb0d5>] ___slab_alloc+0x495/0x660
| [<ffffffff811fb311>] __slab_alloc.isra.79+0x71/0xc0
| [<ffffffff811fb447>] __kmalloc_node+0xe7/0x240
| [<ffffffff814d4ee0>] alloc_cpumask_var_node+0x20/0x50
| [<ffffffff814d4f3e>] alloc_cpumask_var+0xe/0x10
| [<ffffffff810430c1>] native_send_call_func_ipi+0x21/0x130
| [<ffffffff8111c13f>] smp_call_function_many+0x22f/0x370
| [<ffffffff81062b64>] native_flush_tlb_others+0x1a4/0x3a0
| [<ffffffff8106309b>] flush_tlb_mm_range+0x7b/0x350
| [<ffffffff811c88e2>] tlb_flush_mmu_tlbonly+0x62/0xd0
| [<ffffffff811c9af4>] tlb_finish_mmu+0x14/0x50
| [<ffffffff811d1c84>] unmap_region+0xe4/0x110
| [<ffffffff811d3db3>] do_munmap+0x293/0x470
| [<ffffffff811d4f8c>] SyS_brk+0x13c/0x190
| [<ffffffff810032e2>] do_fast_syscall_32+0xb2/0x2f0
| [<ffffffff819be181>] entry_SYSENTER_compat+0x51/0x60
which forbid allocations at run-time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
51/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode, which
leads to latency spikes.
The architectures which use stop_machine() at the moment:
- ARM stop machine
- s390 stop machine
The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
53/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only slub on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
55/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking: Disable spin on owner for RT
Date: Sun, 17 Jul 2011 21:51:45 +0200
Drop spin on owner for mutex / rwsem. We are most likely not using it
but…
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
56/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Disable RCU_FAST_NO_HZ on RT
Date: Sun, 28 Oct 2012 13:26:09 +0000
This uses a timer_list timer from the irq disabled guts of the idle
code. Disable it for now to prevent wreckage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
57/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
default on RT should help in those cases, and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on an F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works fine from an ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
59/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL
Date: Sat, 27 May 2017 19:02:06 +0200
sk_busy_loop() does preempt_disable() followed by a few operations which can
take sleeping locks and may get long.
I _think_ that we could use preempt_disable_nort() (in sk_busy_loop()) instead
but after a successful cmpxchg(&napi->state, …) we would gain the resource
and could be scheduled out. At this point nobody knows who (which context) owns
it and so it could take a while until the state is released and napi_poll()
could be invoked again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm*: disable NEON in kernel mode
Date: Fri, 1 Dec 2017 10:42:03 +0100
NEON in kernel mode is used by the crypto algorithms and raid6 code.
While the raid6 code looks okay, the crypto algorithms do not: NEON
is enabled on first invocation and may allocate/free/map memory before
the NEON mode is disabled again.
This needs to be changed until it can be enabled.
On ARM, NEON in kernel mode can be simply disabled. On ARM64 it needs to
stay on due to possible EFI callbacks, so here I disable each algorithm.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Use generic rwsem on RT
Date: Tue, 14 Jul 2015 14:26:34 +0200
Use generic code which uses rtmutex
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
62/258 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time the interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both these numbers
and cause a DoS.
This temporary fix is sent for two reasons. First, so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, and can
make sure that their usage scenario does not involve interrupts sent in
directed delivery mode and that the number of possible pending interrupts is
kept small. Secondly, this should incentivize the development of a proper
openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
64/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
65/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Use generic rwsem_spinlocks on -rt
Date: Sun, 26 Jul 2009 02:21:32 +0200
Simplifies the separation of anon_rw_semaphores and rw_semaphores for
-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
66/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
67/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cpufreq: drop K8's driver from being selected
Date: Thu, 9 Apr 2015 15:23:01 +0200
Ralf posted a picture of a backtrace from
| powernowk8_target_fn() -> transition_frequency_fidvid() and then at the
| end:
| 932 policy = cpufreq_cpu_get(smp_processor_id());
| 933 cpufreq_cpu_put(policy);
crashing the system on -RT. I assumed that policy was a NULL pointer but
that was ruled out. Since Ralf can't do any more investigations on this and
I have no machine with this, I simply switch it off.
Reported-by: Ralf Mardorf <ralf.mardorf@alice-dsl.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: md: disable bcache
Date: Thu, 29 Aug 2013 11:48:57 +0200
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware erases flash blocks on NOR.
The time functions are used by efi-rtc and can be triggered during
runtime (either via explicit read/write or NTP sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: printk: Add a printk kill switch
Date: Fri, 22 Jul 2011 17:58:40 +0200
Add a printk kill switch. This is used from the (NMI) watchdog to ensure that
it does not deadlock with the early printk code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
71/258 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: printk: Add "force_early_printk" boot param to help with debugging
Date: Fri, 2 Sep 2011 14:41:29 +0200
Gives me an option to screw printk and actually see what the machine
says.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314967289.1301.11.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-ykb97nsfmobq44xketrxs977@git.kernel.org
]
72/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
73/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: workaround migrate_disable/enable in different context
Date: Wed, 8 Mar 2017 14:23:35 +0100
migrate_disable()/migrate_enable() take a different path in atomic() vs
!atomic() context. These little hacks ensure that we don't underflow/overflow
the migrate code counts while we lock the hb lock with interrupts
enabled and unlock it with interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks use migrate_disable(),
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
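A usage sketch of the locallock API introduced in 74/258 (API names as they
appear in the RT patch set; the per-CPU variable is illustrative):
    static DEFINE_LOCAL_IRQ_LOCK(stats_lock);
    static DEFINE_PER_CPU(unsigned long, stats_count);

    /* !RT: preempt_disable(); RT: per-CPU spinlock, preemptible */
    local_lock(stats_lock);
    this_cpu_inc(stats_count);
    local_unlock(stats_lock);

    /* !RT: local_irq_save(); RT: per-CPU spinlock */
    local_lock_irqsave(stats_lock, flags);
    this_cpu_inc(stats_count);
    local_unlock_irqrestore(stats_lock, flags);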
75/258 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: locallock: provide {get,put}_locked_ptr() variants
Date: Mon, 7 May 2018 08:58:56 -0500
Provide a set of locallocked accessors for pointers to per-CPU data;
this is useful for dynamically-allocated per-CPU regions, for example.
These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor
variants.
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
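A sketch of the accessors from 75/258, by analogy with get_cpu_ptr() (the
exact signature is assumed from the subject line; names illustrative):
    struct foo *f;

    f = get_locked_ptr(foo_lock, &per_cpu_foo);  /* lock + this_cpu_ptr() */
    f->count++;
    put_locked_ptr(foo_lock, f);                 /* unlock */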
76/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
77/258 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack are possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
78/258 [
Author: Yang Shi
Email: yang.shi@linaro.org
Subject: x86/signal: delay calling signals on 32bit
Date: Thu, 10 Dec 2015 10:58:51 -0800
When running some ptrace single step tests on x86-32 machine, the below problem
is triggered:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 0, pid: 1041, name: dummy2
Preemption disabled at:[<c100326f>] do_debug+0x1f/0x1a0
CPU: 10 PID: 1041 Comm: dummy2 Tainted: G W 4.1.13-rt13 #1
Call Trace:
[<c1aa8306>] dump_stack+0x46/0x5c
[<c1080517>] ___might_sleep+0x137/0x220
[<c1ab0eff>] rt_spin_lock+0x1f/0x80
[<c1064b5a>] do_force_sig_info+0x2a/0xc0
[<c106567d>] force_sig_info+0xd/0x10
[<c1010cff>] send_sigtrap+0x6f/0x80
[<c10033b1>] do_debug+0x161/0x1a0
[<c1ab2921>] debug_stack_correct+0x2e/0x35
This happens since 959274753857 ("x86, traps: Track entry into and exit
from IST context") which was merged in v4.1-rc1.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: buffer_head: Replace bh_uptodate_lock for -rt
Date: Fri, 18 Mar 2011 09:18:52 +0100
Wrap the bit_spin_lock calls into a separate inline and add the RT
replacements with a real spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
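The shape of the wrapper 79/258 describes (the b_uptodate_lock spinlock is
the field the RT patch adds to struct buffer_head; treat as a sketch):
    static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
    {
            unsigned long flags;

    #ifndef CONFIG_PREEMPT_RT_BASE
            local_irq_save(flags);
            bit_spin_lock(BH_Uptodate_Lock, &bh->b_state);
    #else
            spin_lock_irqsave(&bh->b_uptodate_lock, flags);
    #endif
            return flags;
    }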
80/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: jbd/jbd2: Make state lock and journal head lock rt safe
Date: Fri, 18 Mar 2011 10:11:25 +0100
bit_spin_locks break under RT.
Based on a previous patch from Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--
include/linux/buffer_head.h | 8 ++++++++
include/linux/jbd2.h | 24 ++++++++++++++++++++++++
2 files changed, 32 insertions(+)
]
81/258 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: list_bl: Make list head locking RT safe
Date: Fri, 21 Jun 2013 15:07:25 -0400
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non-atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
lets assume for now that it being raw won't be a significant latency
concern.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/258 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: list_bl: fixup bogus lockdep warning
Date: Thu, 31 Mar 2016 00:04:25 -0500
At first glance, the use of 'static inline' seems appropriate for
INIT_HLIST_BL_HEAD().
However, when a 'static inline' function invocation is inlined by gcc,
all callers share any static local data declared within that inline
function.
This presents a problem for how lockdep classes are set up. raw_spinlocks, for
example, are initialized like this when CONFIG_DEBUG_SPINLOCK is enabled:
# define raw_spin_lock_init(lock) \
do { \
static struct lock_class_key __key; \
\
__raw_spin_lock_init((lock), #lock, &__key); \
} while (0)
When this macro is expanded into a 'static inline' caller, like
INIT_HLIST_BL_HEAD():
static inline INIT_HLIST_BL_HEAD(struct hlist_bl_head *h)
{
h->first = NULL;
raw_spin_lock_init(&h->lock);
}
...the static local lock_class_key object is made a function static.
For compilation units which invoke INIT_HLIST_BL_HEAD() more than
once, all of the invocations share this same static local
object.
This can lead to some very confusing lockdep splats (example below).
Solve this problem by forcing the INIT_HLIST_BL_HEAD() to be a macro,
which prevents the lockdep class object sharing.
=============================================
[ INFO: possible recursive locking detected ]
4.4.4-rt11 #4 Not tainted
---------------------------------------------
kswapd0/59 is trying to acquire lock:
(&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
but task is already holding lock:
(&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&h->lock#2);
lock(&h->lock#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by kswapd0/59:
#0: (shrinker_rwsem){+.+...}, at: rt_down_read_trylock
#1: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Tested-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
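The fix 82/258 describes, sketched (the ->lock member exists only with the RT
list_bl changes):
    /* a macro instead of 'static inline': each expansion gets its own
     * static lock_class_key, so every list head is a distinct lockdep class */
    #define INIT_HLIST_BL_HEAD(h)                          \
    do {                                                   \
            (h)->first = NULL;                             \
            raw_spin_lock_init(&(h)->lock);                \
    } while (0)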
83/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
84/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: genirq: Force interrupt thread on RT
Date: Sun, 3 Apr 2011 11:57:29 +0200
Force threaded_irqs and optimize the code (force_irqthreads) in regard
to this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
85/258 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #1
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section for accessing the PCP list from the zone->lock
section while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
86/258 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section for accessing the PCP list from the zone->lock
section while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
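The split described in 85-86/258, sketched with simplified signatures:
    LIST_HEAD(dst);

    local_irq_save(flags);
    isolate_pcp_pages(to_drain, pcp, &dst);  /* list surgery only: cheap */
    local_irq_restore(flags);

    /* the expensive buddy free now runs with IRQs enabled */
    free_pcppages_bulk(zone, &dst);          /* takes zone->lock internally */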
87/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with IRQs off on RT. Make it a raw_spinlock_t,
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
89/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
90/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/swap: Convert to percpu locked
Date: Fri, 3 Jul 2009 08:29:51 -0500
Replace global locks (get_cpu + local_irq_save) with "local_locks()".
Currently there is one for "rotate" and one for "swap".
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
91/258 [
Author: Luiz Capitulino
Email: lcapitulino@redhat.com
Subject: mm: perform lru_add_drain_all() remotely
Date: Fri, 27 May 2016 15:03:28 +0200
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run
on all CPUs that have non-empty LRU pagevecs and then waiting for
the scheduled work to complete. However, workqueue threads may never
have the chance to run on a CPU that's running a SCHED_FIFO task.
This causes lru_add_drain_all() to block forever.
This commit solves this problem by changing lru_add_drain_all()
to drain the LRU pagevecs of remote CPUs. This is done by grabbing
swapvec_lock and calling lru_add_drain_cpu().
PS: This is based on an idea and initial implementation by
Rik van Riel.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. "preempt_disable" ensures that the
same resources are not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
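The guard 92/258 adds, in its simplest form (the counter name is
hypothetical):
    static DEFINE_PER_CPU(long, vm_stat_diff);

    preempt_disable();            /* vanilla relies on IRQs being off here */
    __this_cpu_inc(vm_stat_diff); /* per-CPU update must not be preempted */
    preempt_enable();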
93/258 [
Author: Frank Rowand
Email: frank.rowand@am.sony.com
Subject: ARM: Initialize split page table locks for vector page
Date: Sat, 1 Oct 2011 18:58:13 -0700
Without this patch, ARM can not use SPLIT_PTLOCK_CPUS if
PREEMPT_RT_FULL=y because vectors_user_mapping() creates a
VM_ALWAYSDUMP mapping of the vector page (address 0xffff0000), but no
ptl->lock has been allocated for the page. An attempt to coredump
that page will result in a kernel NULL pointer dereference when
follow_page() attempts to lock the page.
The call tree to the NULL pointer dereference is:
do_notify_resume()
get_signal_to_deliver()
do_coredump()
elf_core_dump()
get_dump_page()
__get_user_pages()
follow_page()
pte_offset_map_lock() <----- a #define
...
rt_spin_lock()
The underlying problem is exposed by mm-shrink-the-page-frame-to-rt-size.patch.
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Frank <Frank_Rowand@sonyusa.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/4E87C535.2030907@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
94/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: Enable SLUB for RT
Date: Thu, 25 Oct 2012 10:32:35 +0100
Avoid the memory allocation in IRQ section
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: factor out everything except the kcalloc() workaround ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
96/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/258 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt-disabled context,
replace the get/put_cpu() pair with get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
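The replacement 97/258 makes, sketched (get_cpu_light() is the RT-patch helper
built on migrate_disable(); the work item is shown for illustration):
    /* before: preemption off, schedule_work_on() may sleep -> splat on RT */
    cpu = get_cpu();
    schedule_work_on(cpu, &stock->work);
    put_cpu();

    /* after: stays on this CPU but remains preemptible */
    cpu = get_cpu_light();
    schedule_work_on(cpu, &stock->work);
    put_cpu_light();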
98/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() users which then take sleeping locks. This
patch converts them to local locks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
99/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bit spinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
100/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: radix-tree: use local locks
Date: Wed, 25 Jan 2017 16:34:27 +0100
The preload functionality uses per-CPU variables and preempt-disable to
ensure that it does not switch CPUs during its usage. This patch adds
local_locks() instead of preempt_disable() for the same purpose and to
remain preemptible on -RT.
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: timers: Prepare for full preemption
Date: Fri, 3 Jul 2009 08:29:34 -0500
When softirqs can be preempted we need to make sure that cancelling
the timer from the active thread can not deadlock vs. a running timer
callback. Add a waitqueue to resolve that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
102/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
103/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/258 [
Author: Daniel Wagner
Email: daniel.wagner@bmw-carit.de
Subject: work-simple: Simple work queue implementation
Date: Fri, 11 Jul 2014 15:26:11 +0200
Provides a framework for enqueuing callbacks from IRQ context in a
PREEMPT_RT_FULL-safe way. The callbacks are executed in kthread context.
Based on wait-simple.
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
]
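A usage sketch of the swork API from 104/258 (names as they appear in the RT
patch set; error handling omitted):
    static void my_cb(struct swork_event *ev)
    {
            /* runs in kthread context, may sleep */
    }

    static struct swork_event my_ev;

    swork_get();                  /* ensure the worker kthread exists */
    INIT_SWORK(&my_ev, my_cb);
    swork_queue(&my_ev);          /* safe to call from IRQ context */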
105/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: completion: Use simple wait queues
Date: Fri, 11 Jan 2013 11:23:51 +0100
Completions have no long lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple waitqueues which reduce the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
106/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/aio: simple simple work
Date: Mon, 16 Feb 2015 18:49:10 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:768
|in_atomic(): 1, irqs_disabled(): 0, pid: 26, name: rcuos/2
|2 locks held by rcuos/2/26:
| #0: (rcu_callback){.+.+..}, at: [<ffffffff810b1a12>] rcu_nocb_kthread+0x1e2/0x380
| #1: (rcu_read_lock_sched){.+.+..}, at: [<ffffffff812acd26>] percpu_ref_kill_rcu+0xa6/0x1c0
|Preemption disabled at:[<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
|Call Trace:
| [<ffffffff81582e9e>] dump_stack+0x4e/0x9c
| [<ffffffff81077aeb>] __might_sleep+0xfb/0x170
| [<ffffffff81589304>] rt_spin_lock+0x24/0x70
| [<ffffffff811c5790>] free_ioctx_users+0x30/0x130
| [<ffffffff812ace34>] percpu_ref_kill_rcu+0x1b4/0x1c0
| [<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
| [<ffffffff8106e046>] kthread+0xd6/0xf0
| [<ffffffff81591eec>] ret_from_fork+0x7c/0xb0
Replace this with the preempt_disable()-friendly swork.
Reported-By: Mike Galbraith <umgwanakikbuti@gmail.com>
Suggested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: genirq: Do not invoke the affinity callback via a workqueue on RT
Date: Wed, 21 Aug 2013 17:48:46 +0200
Joe Korty reported, that __irq_set_affinity_locked() schedules a
workqueue while holding a rawlock which results in a might_sleep()
warning.
This patch uses swork_queue() instead.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: time/hrtimer: avoid schedule_work() with interrupts disabled
Date: Wed, 15 Nov 2017 17:29:51 +0100
The NOHZ code tries to schedule a workqueue with interrupts disabled.
Since this does not work on -RT I am switching it to swork instead.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: consolidate hrtimer_init() + hrtimer_init_sleeper() calls
Date: Mon, 4 Sep 2017 18:31:50 +0200
hrtimer_init_sleeper() calls require a prior initialisation of the
hrtimer object with hrtimer_init(). Let's make the initialisation of
the hrtimer object part of hrtimer_init_sleeper(). To remain
consistent, consider init_on_stack as well.
Beside adapting the hrtimer_init_sleeper[_on_stack]() functions, call
sites need to be updated as well.
[anna-maria: Updating the commit message]
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
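Before/after of the consolidation in 109/258 (argument order per the patch;
treat as a sketch):
    struct hrtimer_sleeper t;

    /* before: two calls */
    hrtimer_init(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
    hrtimer_init_sleeper(&t, current);

    /* after: one call initialises both timer and sleeper */
    hrtimer_init_sleeper(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS, current);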
110/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: hrtimers: Prepare full preemption
Date: Fri, 3 Jul 2009 08:29:34 -0500
Make cancellation of a running callback in softirq context safe
against preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
111/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: move timers by default into the softirq context
Date: Fri, 3 Jul 2009 08:44:31 -0500
We can't have hrtimer callbacks running in hardirq context on RT. Therefore
the timers are deferred to the softirq context by default.
There are a few timers which expect to be run in hardirq context even on RT.
Those are:
- very short running timers where low latency is critical (kvm lapic)
- timers which take raw locks and need to run in hardirq context (perf, sched)
- wake up related timer (kernel side of clock_nanosleep() and so on)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
112/258 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: hrtimer: Move schedule_work call to helper thread
Date: Mon, 16 Sep 2013 14:09:19 -0700
When running the ltp leapsec_timer test, the following call trace is caught:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58
ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0
ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200
Call Trace:
<IRQ> [<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30
[<ffffffff810883be>] do_timer+0x40e/0x660
[<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140
[<ffffffff8108fe42>] tick_check_idle+0x92/0xc0
[<ffffffff81044327>] irq_enter+0x57/0x70
[<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b
[<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0
[<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0
[<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30
[<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310
[<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0
clock_was_set_delayed() is called in the hard IRQ handler (timer interrupt),
which calls schedule_work.
Under PREEMPT_RT_FULL, schedule_work takes spinlocks which could sleep, so it's
not safe to call schedule_work in interrupt context.
Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca
(rt,ntp: Move call to schedule_delayed_work() to helper thread)
from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git, which
makes a similar change.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
[bigeasy: use swork_queue() instead a helper thread]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
113/258 [
Author: John Stultz
Email: johnstul@us.ibm.com
Subject: posix-timers: Thread posix-cpu-timers on -rt
Date: Fri, 3 Jul 2009 08:29:58 -0500
The posix-cpu-timer code takes non-RT-safe locks in hard IRQ
context. Move it to a thread.
[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
114/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move task_struct cleanup to RCU
Date: Tue, 31 May 2011 16:59:16 +0200
__put_task_struct() does quite some expensive work. We don't want to
burden random tasks with that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
115/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
116/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so it is nothing
we want to do during a task switch or in other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
117/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes in handy on -RT because we
can't free memory in a preempt-disabled region.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
118/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock, so we need to remember the state before
the lock contention. If a regular wakeup (not an rtmutex-related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
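A condensed sketch of the idea (not the literal patch):
/* Before blocking on a sleeping spinlock: remember what the task was doing. */
current->saved_state = current->state;
current->state = TASK_UNINTERRUPTIBLE;
schedule();
/* After acquiring the lock: restore the remembered state. */
current->state = current->saved_state;
current->saved_state = TASK_RUNNING;
/* In try_to_wake_up(): a regular wakeup while the task is blocked on the
 * lock only marks the saved state as runnable; the lock sleep continues. */
if (!(p->state & state) && (p->saved_state & state))
	p->saved_state = TASK_RUNNING;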
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
119/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
120/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
Date: Sun, 17 Jul 2011 22:51:33 +0200
RT does not increment preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
121/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
122/258 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock
Date: Mon, 18 Mar 2013 15:12:49 -0400
In -rt, most spin_locks() turn into mutexes. One of these spin_lock
conversions is performed on the workqueue gcwq->lock. When the idle
worker is woken, the first thing it will do is grab that same lock and
it too will block, possibly jumping into the same code; but because
nr_running would already be decremented, an infinite loop is prevented.
Still, this is a waste of CPU cycles, and it doesn't follow the method
of mainline, where new workers are only woken when a worker thread is
truly going to sleep, not when it is just blocked on a spin_lock().
Check the saved_state too before waking up new workers.
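The check boils down to something like this sketch of the __schedule() hunk:
/* A worker blocked on an RT 'sleeping spinlock' keeps its real sleep
 * state parked in saved_state. Only wake another worker when this
 * worker is truly going to sleep, i.e. not in a lock sleep. */
if (prev->flags & PF_WQ_WORKER && prev->saved_state == TASK_RUNNING)
	wq_worker_sleeping(prev);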
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/258 [
Author: Daniel Bristot de Oliveira
Email: bristot@redhat.com
Subject: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration
Date: Mon, 26 Jun 2017 17:07:15 +0200
There is a problem in the migrate_disable()/enable() implementation
regarding the number of migratory tasks in the rt/dl RQs. The problem
is the following:
When a task is attached to the rt runqueue, it is checked whether it
can run on more than one CPU, or whether it has migration disabled. If
either check is true, the rt_rq->rt_nr_migratory counter is not
increased; otherwise, the counter is increased.
When the task is detached, the same check is done: if either check is
true, the rt_rq->rt_nr_migratory counter is not decreased; otherwise,
the counter is decreased. The same check is done in the dl scheduler.
One important thing is that, migrate disable/enable does not touch this
counter for tasks attached to the rt rq. So suppose the following chain
of events.
Assumptions:
Task A is the only runnable task on CPU A; Task B runs on CPU B.
Task A runs on CFS (non-rt); Task B has RT priority.
Thus, rt_nr_migratory is 0 and B is running.
Task A can run on all CPUs.
Timeline:
CPU A/TASK A CPU B/TASK B
A takes the rt mutex X .
A disables migration .
. B tries to take the rt mutex X
. As it is held by A {
. A inherits the rt priority of B
. A is dequeued from CFS RQ of CPU A
. A is enqueued in the RT RQ of CPU A
. As migration is disabled
. rt_nr_migratory in A is not increased
.
A enables migration
A releases the rt mutex X {
A returns to its original priority
A asks to be dequeued from the RT RQ {
As migration is now enabled and it can run on all CPUS {
rt_nr_migratory should be decreased
As rt_nr_migratory is 0, rt_nr_migratory underflows
}
}
This variable is important because it indicates whether there is more than
one runnable & migratory task in the runqueue. If there is, the rt_rq is
set as overloaded, and the scheduler then tries to migrate some tasks. This
rule is important to keep the scheduler work-conserving, that is, in a
system with M CPUs, the M highest-priority tasks should be running.
As rt_nr_migratory is unsigned, after the underflow it becomes > 0,
signalling that the RQ is overloaded and activating the push mechanism
without need.
This patch fixes this problem by decreasing/increasing the
rt/dl_nr_migratory in the migrate disable/enable operations.
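A sketch of the fix's shape (update_nr_migratory() is an illustrative
helper name, not necessarily the patch's):
void migrate_disable(void)
{
	/* ... */
	if (task_on_rq_queued(p))
		update_nr_migratory(p, -1);	/* now non-migratory */
	p->migrate_disable = 1;
	/* ... */
}
void migrate_enable(void)
{
	/* ... */
	p->migrate_disable = 0;
	if (task_on_rq_queued(p))
		update_nr_migratory(p, +1);	/* migratory again */
	/* ... */
}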
Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
124/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: hotplug: Lightweight get online cpus
Date: Wed, 15 Jun 2011 12:36:06 +0200
get_online_cpus() is a heavyweight function which involves a global
mutex. migrate_disable() wants a simpler construct which prevents only
the CPU from going down while a task is in a migrate-disabled section.
Implement a per cpu lockless mechanism, which serializes only in the
real unplug case on a global mutex. That serialization affects only
tasks on the cpu which should be brought down.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
125/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
126/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
127/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: tasklet: Prevent tasklets from going into infinite spin in RT
Date: Tue, 29 Nov 2011 20:18:22 -0500
When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads,
and spinlocks turn into mutexes. But this can cause issues with
tasks disabling tasklets. A tasklet runs under ksoftirqd, and
if a tasklet is disabled with tasklet_disable(), the tasklet
count is increased. When a tasklet runs, it checks this counter
and, if it is set, it adds itself back on the softirq queue and
returns.
The problem arises in RT because ksoftirq will see that a softirq
is ready to run (the tasklet softirq just re-armed itself), and will
not sleep, but instead run the softirqs again. The tasklet softirq
will still see that the count is non-zero and will not execute
the tasklet and requeue itself on the softirq again, which will
cause ksoftirqd to run it again and again and again.
It gets worse because ksoftirqd runs as a real-time thread.
If it preempted the task that disabled tasklets, and that task
has migration disabled, or can't run for other reasons, the tasklet
softirq will never run because the count will never be zero, and
ksoftirqd will go into an infinite loop. Since ksoftirqd is an RT
task, this becomes a big problem.
This is a hack solution: have tasklet_disable() stop tasklets, and
when a tasklet runs, instead of requeueing the tasklet on the softirq,
defer it. When tasklet_enable() is called and tasklets are waiting,
tasklet_enable() kicks the tasklets to continue.
This prevents the lockup from ksoftirqd going into an infinite loop.
[ rostedt@goodmis.org: ported to 3.0-rt ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
128/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
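The call sites then follow this pattern (sketch; preempt_check_resched_rt()
is the helper the RT series adds for this purpose):
raise_softirq_irqoff(NET_RX_SOFTIRQ);
local_irq_restore(flags);
/* Without this, the woken softirq thread may not run until the next
 * scheduling point. */
preempt_check_resched_rt();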
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
129/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Split softirq locks
Date: Thu, 4 Oct 2012 14:20:47 +0100
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. However, this prevents us from handling the various softirqs
at different priorities. Now, instead of reintroducing the split
softirq threads, we split the locks which serialize the softirq
processing.
If a softirq is raised in context of a thread, then the softirq is
noted on a per thread field, if the thread is in a bh disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh-disabled region, it tries to execute the
softirqs which have been raised in its own context. It acquires the
per-softirq / per-CPU lock for the softirq and then checks whether
the softirq is still pending in the per-CPU
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task executed it already. This allows for zero config
softirq elevation in the context of user space tasks or interrupt
threads.
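A sketch of the leave-bh-disabled path described above (lock/handler
names illustrative):
static void do_current_softirqs(void)
{
	while (current->softirqs_raised) {
		int nr = __ffs(current->softirqs_raised);
		/* The per-softirq/per-CPU lock serializes this softirq. */
		lock_softirq(nr);
		current->softirqs_raised &= ~(1UL << nr);
		if (local_softirq_pending() & (1UL << nr))
			handle_softirq(nr);	/* still pending: run it */
		/* else some other task executed it already */
		unlock_softirq(nr);
	}
}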
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
131/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
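In code the change looks roughly like this (sketch):
/* Before: */
preempt_disable();
err = netif_rx(skb);
if (local_softirq_pending())
	do_softirq();
preempt_enable();
/* After: local_bh_disable() keeps us on this CPU, and local_bh_enable()
 * invokes do_softirq() if netif_rx() raised one. */
local_bh_disable();
err = netif_rx(skb);
local_bh_enable();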
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
132/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: genirq: Allow disabling of softirq processing in irq thread context
Date: Tue, 31 Jan 2012 13:01:27 +0100
The processing of softirqs in irq thread context is a performance gain
for the non-rt workloads of a system, but it's counterproductive for
interrupts which are explicitly related to the realtime
workload. Allow such interrupts to prevent softirq processing in their
thread context.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: split timer softirqs out of ksoftirqd
Date: Wed, 20 Jan 2016 16:34:17 +0100
On -RT the softirqd runs with SCHED_FIFO (prio 1) and deals mostly with
timer wakeups which cannot happen in hardirq context. The prio has been
raised from the normal SCHED_OTHER so the timer wakeup does not happen
too late.
With enough networking load it is possible that the system never goes
idle and schedules ksoftirqd and everything else with a higher priority.
One of the tasks left behind is one of RCU's threads and so we see stalls
and eventually run out of memory.
This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd`
thread into its own `ktimersoftd`. The former can now run SCHED_OTHER
(same as mainline) and the latter at SCHED_FIFO due to the wakeups.
From a networking point of view: the NAPI callback runs after the network
interrupt thread completes. If its runtime takes too long, the NAPI code
itself schedules the `ksoftirqd`. Here in the thread it can run at
SCHED_OTHER priority and it won't defer RCU anymore.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
134/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: trylock is okay on -RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context, so it is
not an issue here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/nfs: turn rmdir_sem into a semaphore
Date: Thu, 15 Sep 2016 10:51:27 +0200
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores, on the other hand, are "plain simple" and should work as
expected. We can't have multiple readers, but on -RT we don't allow
multiple readers anyway, so that is not a loss.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
136/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to the futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/258 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT_FULL.
The bug comes from a timed out condition.
TASK 1 TASK 2
------ ------
futex_wait_requeue_pi()
futex_wait_queue_me()
<timed out>
double_lock_hb();
raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
current->pi_blocked_on = PI_WAKE_INPROGRESS;
raw_spin_unlock(pi_lock);
spin_lock(hb->lock); <-- blocked!
plist_for_each_entry_safe(this) {
rt_mutex_start_proxy_lock();
task_blocks_on_rt_mutex();
BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock(), a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and it will handle things
appropriately.
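A sketch of the added check in rt_mutex_start_proxy_lock() (simplified):
raw_spin_lock(&task->pi_lock);
if (task->pi_blocked_on) {
	/* Already blocked (e.g. on hb->lock after the timeout): back out
	 * and let the caller retry instead of corrupting the PI state. */
	raw_spin_unlock(&task->pi_lock);
	return -EAGAIN;
}
task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
raw_spin_unlock(&task->pi_lock);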
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
138/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock
Date: Fri, 1 Mar 2013 11:17:42 +0100
In exit_pi_state_list() we have the following locking construct:
spin_lock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
...
spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reacquiring
pi_lock after that. This is safe as the loop code reevaluates head
again under the pi_lock.
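The reordered sequence, as a sketch:
raw_spin_unlock_irq(&curr->pi_lock);	/* drop pi_lock first, so that ... */
spin_unlock(&hb->lock);			/* ... migrate_enable() runs outside
					   of atomic context on RT */
raw_spin_lock_irq(&curr->pi_lock);	/* retake it; the loop re-evaluates
					   head under pi_lock anyway */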
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/258 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
140/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: include definition for cpumask_t
Date: Thu, 22 Dec 2016 17:28:33 +0100
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
141/258 [
Author: "Wolfgang M. Reimer"
Email: linuxball@gmail.com
Subject: locking: locktorture: Do NOT include rwlock.h directly
Date: Tue, 21 Jul 2015 16:20:07 +0200
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT_FULL is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
142/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Add rtmutex_lock_killable()
Date: Thu, 9 Jun 2011 11:43:52 +0200
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
143/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionally.
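The resulting wait-loop check, sketched:
/* Honour the sleeper's own state instead of hard-coding
 * TASK_INTERRUPTIBLE, so TASK_KILLABLE waiters react to fatal signals. */
if (signal_pending_state(state, current)) {
	ret = -EINTR;
	break;
}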
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
145/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
146/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: don't include the rcu header
Date: Wed, 22 Aug 2018 09:24:48 -0400
The RCU header pulls in spinlock.h and fails due to not-yet-defined types:
|In file included from include/linux/spinlock.h:275:0,
| from include/linux/rcupdate.h:38,
| from include/linux/rbtree.h:34,
| from include/linux/rtmutex.h:17,
| from include/linux/spinlock_types.h:18,
| from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
| ^
This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
147/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
148/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementations on top of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
149/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
150/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single-reader restriction is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code paths shows that properly written RT tasks
should not take them. Syscalls like mmap() and file access which take the
mmap sem write-locked have unbounded latencies which are completely
unrelated to the mmap sem. Other R/W sem users like graphics drivers are
not suitable for RT tasks either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme: R/W semaphores become writer-unfair,
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on a RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to
rethink the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
152/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
153/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/258 [
Author: Mikulas Patocka
Email: mpatocka@redhat.com
Subject: locking/rt-mutex: fix deadlock in device mapper / block-IO
Date: Mon, 13 Nov 2017 12:56:53 -0500
When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).
However, this queuing of bios can cause deadlocks; in order to avoid them,
device mapper registers a function flush_current_bio_list. This function
is called when the device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.
The problem with CONFIG_PREEMPT_RT_FULL is that when the device mapper
driver blocks, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in
block device stack can happen.
Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.
So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock,
when fast acquire failed and when the task is about to block.
CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific, it can also occur
in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: re-init the wait_lock in rt_mutex_init_proxy_locked()
Date: Thu, 16 Nov 2017 16:48:48 +0100
We could provide a key-class for lockdep (and fix up all callers) or
move the init to all callers (like it was) in order to avoid lockdep
seeing a double-lock of the wait_lock.
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
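A sketch of the added check (helper name hypothetical, simplified):
static bool task_is_traced_rt(struct task_struct *task)
{
	unsigned long flags;
	bool traced;
	raw_spin_lock_irqsave(&task->pi_lock, flags);
	/* __TASK_TRACED may have moved to ->saved_state while the child
	 * blocks on the RT-converted tasklist_lock. */
	traced = (task->state & __TASK_TRACED) ||
		 (task->saved_state & __TASK_TRACED);
	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
	return traced;
}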
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: annotate sleeping lock context
Date: Thu, 21 Sep 2017 14:25:13 +0200
The RCU code complains about schedule() within a rcu_read_lock() section.
The valid scenario on -RT is if a sleeping lock is held. In order to
suppress the warning the migrate_disable counter was used to identify the
invocation of schedule() due to lock contention.
Grygorii Strashko reported that during CPU hotplug we might see the
warning via
rt_spin_lock() -> migrate_disable() -> pin_current_cpu() -> __read_rt_lock()
because the counter is not yet set.
It is also possible to trigger the warning from cpu_chill()
(seen on a kblockd_mod_delayed_work_on() caller).
To address this RCU warning I annotate the sleeping lock context. The
counter is incremented before migrate_disable() so the warning Grygorii
reported should not trigger anymore. Additionally I use that counter in
cpu_chill() to avoid the RCU warning from there.
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/migrate_disable: fallback to preempt_disable() instead barrier()
Date: Thu, 5 Jul 2018 14:44:51 +0200
On SMP + !RT migrate_disable() is still around. It is not part of spin_lock()
anymore so it has almost no users. However the futex code has a workaround
for the !in_atomic() part of migrate_disable which fails because the matching
migrate_disable() is no longer part of spin_lock().
On !SMP + !RT migrate_disable() is reduced to barrier(). This is not optimal
because we have a few spots where a "preempt_disable()" statement was
replaced with "migrate_disable()".
We also used the migration_disable counter to figure out if a sleeping lock is
acquired so RCU does not complain about schedule() during rcu_read_lock() while
a sleeping lock is held. This changed, we no longer use it, we have now a
sleeping_lock counter for the RCU purpose.
This means we can now:
- for SMP + RT_BASE
the full migration machinery is used; nothing changes here
- for !SMP + RT_BASE
the migration counting is no longer required. It used to ensure that the task
is not migrated to another CPU and that this CPU remains online. !SMP ensures
that already.
Move it to CONFIG_SCHED_DEBUG so the counting is done for debugging purpose
only.
- for all other cases including !RT
fallback to preempt_disable(). The only remaining users of migrate_disable()
are those which were converted from preempt_disable() and the futex
workaround which is already in the preempt_disable() section due to the
spin_lock that is held.
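The resulting mapping, sketched:
#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
/* the full migrate_disable()/migrate_enable() implementation */
#elif defined(CONFIG_SCHED_DEBUG)
/* !SMP + RT_BASE: counting only, kept for debug checks */
#else
#define migrate_disable()	preempt_disable()
#define migrate_enable()	preempt_enable()
#endif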
Cc: stable-rt@vger.kernel.org
Reported-by: joe.korty@concurrent-rt.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that the
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock, which needs the
architecture's spinlock_types.h header file for its definition. However,
we need rt_mutex first because it is used to build spinlock_t, so that
check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/258 [
Author: Peter Zijlstra
Email: a.p.zijlstra@chello.nl
Subject: rcu: Frob softirq test
Date: Sat, 13 Aug 2011 00:23:17 +0200
With RT_FULL we get the below wreckage:
[ 126.060484] =======================================================
[ 126.060486] [ INFO: possible circular locking dependency detected ]
[ 126.060489] 3.0.1-rt10+ #30
[ 126.060490] -------------------------------------------------------
[ 126.060492] irq/24-eth0/1235 is trying to acquire lock:
[ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060503]
[ 126.060504] but task is already holding lock:
[ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060511]
[ 126.060511] which lock already depends on the new lock.
[ 126.060513]
[ 126.060514]
[ 126.060514] the existing dependency chain (in reverse order) is:
[ 126.060516]
[ 126.060516] -> #1 (&p->pi_lock){-...-.}:
[ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85
[ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f
[ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a
[ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f
[ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde
[ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b
[ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060551]
[ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
[ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060611]
[ 126.060612] other info that might help us debug this:
[ 126.060614]
[ 126.060615] Possible unsafe locking scenario:
[ 126.060616]
[ 126.060617] CPU0 CPU1
[ 126.060619] ---- ----
[ 126.060620] lock(&p->pi_lock);
[ 126.060623] lock(&(lock)->wait_lock);
[ 126.060625] lock(&p->pi_lock);
[ 126.060627] lock(&(lock)->wait_lock);
[ 126.060629]
[ 126.060629] *** DEADLOCK ***
[ 126.060630]
[ 126.060632] 1 lock held by irq/24-eth0/1235:
[ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060638]
[ 126.060638] stack backtrace:
[ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
[ 126.060643] Call Trace:
[ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a
[ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99
[ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c
[ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060680] [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4
[ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5
[ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90
[ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21
[ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d
[ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f
[ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f
[ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59
[ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a
[ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a
[ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe
[ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c
[ 126.060774] [<ffffffff81509b10>] ? gs_change+0xb/0xb
Because irq_exit() does:
void irq_exit(void)
{
account_system_vtime(current);
trace_hardirq_exit();
sub_preempt_count(IRQ_EXIT_OFFSET);
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
...
}
Which triggers a wakeup, which uses RCU, now if the interrupted task has
t->rcu_read_unlock_special set, the rcu usage from the wakeup will end
up in rcu_read_unlock_special(). rcu_read_unlock_special() will test
for in_irq(), which will fail as we just decremented preempt_count
with IRQ_EXIT_OFFSET, and in_serving_softirq(), which for
PREEMPT_RT_FULL reads:
int in_serving_softirq(void)
{
int res;
preempt_disable();
res = __get_cpu_var(local_softirq_runner) == current;
preempt_enable();
return res;
}
Which will thus also fail, resulting in the above wreckage.
The 'somewhat' ugly solution is to open-code the preempt_count() test
in rcu_read_unlock_special().
Also, we're not at all sure how ->rcu_read_unlock_special gets set
here... so this is very likely a bandaid and more thought is required.
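The open-coded test amounts to something like this sketch:
/* Replaces in_irq() || in_serving_softirq(): look at the raw
 * preempt_count() bits, which are still meaningful here even though
 * IRQ_EXIT_OFFSET has already been subtracted. */
if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
	/* defer the special (boost/quiescent-state) handling */
}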
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
162/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Merge RCU-bh into RCU-preempt
Date: Wed, 5 Oct 2011 11:59:38 -0700
The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing. This issue will be fixed by a later commit.
The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.
[ paulmck: Added a useful changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
163/258 [
Author: "Paul E. McKenney"
Email: paulmck@linux.vnet.ibm.com
Subject: rcu: Make ksoftirqd do RCU quiescent states
Date: Wed, 5 Oct 2011 11:45:18 -0700
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks. This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context. A wrapper layer is interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying
function __do_softirq_common() does the actual work.
The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section. This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section. Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.
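The wrapper layering, sketched:
static void __do_softirq_common(int need_rcu_bh_qs)
{
	/* ... softirq processing loop ... */
	if (need_rcu_bh_qs)
		rcu_bh_qs();	/* quiescent state: ksoftirqd only */
}
asmlinkage void __do_softirq(void)	/* e.g. via local_bh_enable() */
{
	__do_softirq_common(0);		/* never a quiescent state here */
}
/* ksoftirqd's own loop calls __do_softirq_common(1) instead. */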
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
164/258 [
Author: "Paul E. McKenney"
Email: paulmck@linux.vnet.ibm.com
Subject: rcu: Eliminate softirq processing from rcutree
Date: Mon, 4 Nov 2013 13:21:10 -0800
Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq work,
for example, setting kthread priority. This commit therefore moves the
RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
named rcuc. The SCHED_OTHER approach avoids the scalability problems
that appeared with the earlier attempt to move RCU core processing
from softirq to kthreads. That said, kernels built with RCU_BOOST=y
will run the rcuc kthreads at the RCU-boosting priority.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: srcu: use cpu_online() instead custom check
Date: Wed, 13 Sep 2017 14:43:41 +0200
The current check via srcu_online is slightly racy: after looking at
srcu_online, an interrupt could delay us long enough for the CPU we
checked against to go offline.
An alternative would be to hold the hotplug rwsem (so the CPUs don't
change their state) and then check based on cpu_online() if we queue it
on a specific CPU or not. queue_work_on() itself can handle if something
is enqueued on an offline CPU but a timer which is enqueued on an offline
CPU won't fire until the CPU is back online.
I am not sure if the removal in rcu_init() is okay or not. I assume that
SRCU won't enqueue a work item before SRCU is up and ready.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
166/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: srcu: replace local_irqsave() with a locallock
Date: Thu, 12 Oct 2017 18:37:12 +0200
There are two instances which disable interrupts in order to obtain a
stable this_cpu_ptr() pointer. The restore part is coupled with
spin_unlock_irqrestore(), which does not work on RT.
Replace the local_irq_save() call with the appropriate local_lock()
version of it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/258 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT_FULL.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
168/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
169/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
170/258 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: rt: Improve the serial console PASS_LIMIT
Date: Wed, 14 Dec 2011 13:05:54 +0100
Beyond the warning:
drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
171/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tty: serial: 8250: don't take the trylock during oops
Date: Mon, 11 Apr 2016 16:55:02 +0200
An oops with irqs off (panic() from an irqsafe hrtimer like the watchdog
timer) will lead to a lockdep warning on each invocation, and as such the
oops output never completes.
Therefore we skip the trylock in the oops case.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/258 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: locking/percpu-rwsem: Remove preempt_disable variants
Date: Wed, 23 Nov 2016 16:29:32 +0100
Effectively revert commit:
87709e28dc7c ("fs/locks: Use percpu_down_read_preempt_disable()")
This is causing major pain for PREEMPT_RT and is only a very small
performance issue for PREEMPT=y.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
173/258 [
Author: Yong Zhang
Email: yong.zhang0@gmail.com
Subject: mm: Protect activate_mm() by preempt_[disable&enable]_rt()
Date: Tue, 15 May 2012 13:53:56 +0800
Use preempt_*_rt instead of local_irq_*_rt, otherwise there will be a
warning on ARM like below:
WARNING: at build/linux/kernel/smp.c:459 smp_call_function_many+0x98/0x264()
Modules linked in:
[<c0013bb4>] (unwind_backtrace+0x0/0xe4) from [<c001be94>] (warn_slowpath_common+0x4c/0x64)
[<c001be94>] (warn_slowpath_common+0x4c/0x64) from [<c001bec4>] (warn_slowpath_null+0x18/0x1c)
[<c001bec4>] (warn_slowpath_null+0x18/0x1c) from [<c0053ff8>](smp_call_function_many+0x98/0x264)
[<c0053ff8>] (smp_call_function_many+0x98/0x264) from [<c0054364>] (smp_call_function+0x44/0x6c)
[<c0054364>] (smp_call_function+0x44/0x6c) from [<c0017d50>] (__new_context+0xbc/0x124)
[<c0017d50>] (__new_context+0xbc/0x124) from [<c009e49c>] (flush_old_exec+0x460/0x5e4)
[<c009e49c>] (flush_old_exec+0x460/0x5e4) from [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac)
[<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) from [<c009d060>] (search_binary_handler+0x94/0x2a4)
[<c009d060>] (search_binary_handler+0x94/0x2a4) from [<c009e8fc>] (do_execve+0x254/0x364)
[<c009e8fc>] (do_execve+0x254/0x364) from [<c0010e84>] (sys_execve+0x34/0x54)
[<c0010e84>] (sys_execve+0x34/0x54) from [<c000da00>] (ret_fast_syscall+0x0/0x30)
---[ end trace 0000000000000002 ]---
The reason is that ARM needs irqs enabled when doing activate_mm().
According to mm-protect-activate-switch-mm.patch,
preempt_[disable|enable]_rt() is actually sufficient.
Inspired-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337061236-1766-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
174/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: bring back explicit INIT_HLIST_BL_HEAD init
Date: Wed, 13 Sep 2017 12:32:34 +0200
Commit 3d375d78593c ("mm: update callers to use HASH_ZERO flag") removed
INIT_HLIST_BL_HEAD and uses the ZERO flag instead for the init. However
on RT we also have a spinlock which needs an init call, so we can't use
that.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that on RT preemption is still enabled during the
write process, and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lockup I am disabling preemption during the update.
The rename of i_dir_seq is here to ensure that new write sides are
caught in the future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/258 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: squashfs: make use of local lock in multi_cpu decompressor
Date: Mon, 7 May 2018 08:58:57 -0500
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.
Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.
Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
that concurrent accesses to the per-CPU decompressor data on the local
CPU will be serialized.
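The conversion, sketched with the RT tree's local-lock API (lock name
illustrative):
static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
	local_lock(stream_lock);
	stream = this_cpu_ptr(msblk->stream);	/* stable: local lock held */
	/* ... decompress; fully preemptible on RT ... */
	local_unlock(stream_lock);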
Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
177/258 [
Author: Daniel Wagner
Email: wagi@monom.org
Subject: thermal: Defer thermal wakups to threads
Date: Tue, 17 Feb 2015 09:37:44 +0100
On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
call schedule while we run in irq context.
[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
Let's defer the work to a kthread.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
[bigeasy: reorder init/deinit position. TODO: flush swork on exit]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
178/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/epoll: Do not disable preemption on RT
Date: Fri, 8 Jul 2011 16:35:35 +0200
ep_call_nested() takes a sleeping lock so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind being
invoked twice on the same CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
179/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
180/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: mq: use cpu_light()
Date: Wed, 9 Apr 2014 10:37:23 +0200
There is a might-sleep splat because get_cpu() disables preemption and
later we grab a lock. As a workaround for this we use get_cpu_light().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks that are acquired later.
It seems to be enough to replace it with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
182/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: don't complete requests via IPI
Date: Thu, 29 Jan 2015 15:10:08 +0100
The IPI runs in hardirq context and there are sleeping locks. This patch
moves the completion into a workqueue.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
184/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
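Putting the pieces above together, cpu_chill() on RT ends up roughly as
this sketch:
void cpu_chill(void)
{
	ktime_t chill_time = ktime_set(0, NSEC_PER_MSEC);
	unsigned int freeze_flag = current->flags & PF_NOFREEZE;
	current->flags |= PF_NOFREEZE;	/* see the udevd splat above */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL);
	if (!freeze_flag)
		current->flags &= ~PF_NOFREEZE;
}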
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
185/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: blk-mq: move blk_queue_usage_counter_release() into process context
Date: Tue, 13 Mar 2018 13:49:16 +0100
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
| 5 locks held by kworker/u257:6/255:
| #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
| #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
| #2: (&shost->scan_mutex){+.+.+.}, at: [<ffffffffa000faa3>] __scsi_add_device+0xa3/0x130 [scsi_mod]
| #3: (&set->tag_list_lock){+.+...}, at: [<ffffffff812f09fa>] blk_mq_init_queue+0x96a/0xa50
| #4: (rcu_read_lock_sched){......}, at: [<ffffffff8132887d>] percpu_ref_kill_and_confirm+0x1d/0x120
| Preemption disabled at:[<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
|
| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
| Workqueue: events_unbound async_run_entry_fn
| 0000000000000003 ffff8800bc29f998 ffffffff815b3a12 0000000000000000
| 0000000000000000 ffff8800bc29f9b8 ffffffff8109aa16 ffff8800bc29fa28
| ffff8800bc5d1bc8 ffff8800bc29f9e8 ffffffff815b8dd4 ffff880000000000
| Call Trace:
| [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
| [<ffffffff8109aa16>] __might_sleep+0x116/0x190
| [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
| [<ffffffff810b6089>] __wake_up+0x29/0x60
| [<ffffffff812ee06e>] blk_mq_usage_counter_release+0x1e/0x20
| [<ffffffff81328966>] percpu_ref_kill_and_confirm+0x106/0x120
| [<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
| [<ffffffff812f0000>] blk_mq_update_tag_set_depth+0x40/0xd0
| [<ffffffff812f0a1c>] blk_mq_init_queue+0x98c/0xa50
| [<ffffffffa000dcf0>] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
| [<ffffffffa000ea35>] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
| [<ffffffffa000f494>] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
| [<ffffffffa000fb26>] __scsi_add_device+0x126/0x130 [scsi_mod]
| [<ffffffffa013033f>] ata_scsi_scan_host+0xaf/0x200 [libata]
| [<ffffffffa012b5b6>] async_port_probe+0x46/0x60 [libata]
| [<ffffffff810978fb>] async_run_entry_fn+0x3b/0xf0
| [<ffffffff8108ee81>] process_one_work+0x201/0x5e0
percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in
an rcu-sched region. A swait-based wake queue can't be used due to the
wake_up_all() usage and disabled interrupts in !RT configs (as reported
by Corey Minyard).
The wq_has_sleeper() check was suggested by Peter Zijlstra.
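A sketch of the resulting shape, assuming the RT tree's swork (simple
work) facility and an mq_pcpu_wake event added to struct request_queue;
names are approximate:

static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
{
	struct request_queue *q =
		container_of(sev, struct request_queue, mq_pcpu_wake);

	wake_up_all(&q->mq_freeze_wq);	/* now runs in process context */
}

static void blk_queue_usage_counter_release(struct percpu_ref *ref)
{
	struct request_queue *q =
		container_of(ref, struct request_queue, q_usage_counter);

	/* runs in an rcu-sched region: defer, and only if someone waits */
	if (wq_has_sleeper(&q->mq_freeze_wq))
		swork_queue(&q->mq_pcpu_wake);
}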
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
186/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Use cpu_chill() for retry loops
Date: Thu, 20 Dec 2012 18:28:26 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a livelock when concurrent priority
boosting was going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
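The conversion is mechanical; an illustrative (not verbatim) loop:

	while (!trylock_the_resource(obj)) {	/* hypothetical helper */
		/* was: cpu_relax() -- spins, can livelock on RT */
		cpu_chill();	/* sleeps a tick; the preempted side runs */
	}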
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
187/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: dcache: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use cpu_chill() instead of cpu_relax()
Date: Wed, 7 Mar 2012 21:10:04 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
189/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding hlist_bl_lock(),
which disables preemption. As a workaround, convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
190/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Use normal rcu
Date: Wed, 24 Jul 2013 15:26:54 +0200
There is no need for sched_rcu. The undocumented reason why sched_rcu
is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
abusing the fact that sched_rcu reader side critical sections are also
protected by preempt or irq disabled regions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
191/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Use local irq lock instead of irq disable regions
Date: Sun, 17 Jul 2011 21:42:26 +0200
Use a local_irq_lock as a replacement for irq off regions. We keep the
semantic of irq-off in regard to the pool->lock and remain preemptible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
192/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Prevent workqueue versus ata-piix livelock
Date: Mon, 1 Jul 2013 11:02:42 +0200
An Intel i7 system regularly detected rcu_preempt stalls after the kernel
was upgraded from 3.6-rt to 3.8-rt. When the stall happened, disk I/O was no
longer possible, unless the system was restarted.
The kernel message was:
INFO: rcu_preempt self-detected stall on CPU { 6}
[..]
NMI backtrace for cpu 6
CPU 6
Pid: 119, comm: irq/19-ata_piix Not tainted 3.8.13-rt13 #11 Shuttle Inc. SX58/SX58
RIP: 0010:[<ffffffff8124ca60>] [<ffffffff8124ca60>] ip_compute_csum+0x30/0x30
RSP: 0018:ffff880333303cb0 EFLAGS: 00000002
RAX: 0000000000000006 RBX: 00000000000003e9 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffffffff81aa16d0 RDI: 0000000000000001
RBP: ffff880333303ce8 R08: ffffffff81aa16d0 R09: ffffffff81c1b8cc
R10: 0000000000000000 R11: 0000000000000000 R12: 000000000005161f
R13: 0000000000000006 R14: ffffffff81aa16d0 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff880333300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003c1b2bb420 CR3: 0000000001a0f000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process irq/19-ata_piix (pid: 119, threadinfo ffff88032d88a000, task ffff88032df80000)
Stack:
ffffffff8124cb32 000000000005161e 00000000000003e9 0000000000001000
0000000000009022 ffffffff81aa16d0 0000000000000002 ffff880333303cf8
ffffffff8124caa9 ffff880333303d08 ffffffff8124cad2 ffff880333303d28
Call Trace:
<IRQ>
[<ffffffff8124cb32>] ? delay_tsc+0x33/0xe3
[<ffffffff8124caa9>] __delay+0xf/0x11
[<ffffffff8124cad2>] __const_udelay+0x27/0x29
[<ffffffff8102d1fa>] native_safe_apic_wait_icr_idle+0x39/0x45
[<ffffffff8102dc9b>] __default_send_IPI_dest_field.constprop.0+0x1e/0x58
[<ffffffff8102dd1e>] default_send_IPI_mask_sequence_phys+0x49/0x7d
[<ffffffff81030326>] physflat_send_IPI_all+0x17/0x19
[<ffffffff8102de53>] arch_trigger_all_cpu_backtrace+0x50/0x79
[<ffffffff810b21d0>] rcu_check_callbacks+0x1cb/0x568
[<ffffffff81048c9c>] ? raise_softirq+0x2e/0x35
[<ffffffff81086be0>] ? tick_sched_do_timer+0x38/0x38
[<ffffffff8104f653>] update_process_times+0x44/0x55
[<ffffffff81086866>] tick_sched_handle+0x4a/0x59
[<ffffffff81086c1c>] tick_sched_timer+0x3c/0x5b
[<ffffffff81062845>] __run_hrtimer+0x9b/0x158
[<ffffffff810631d8>] hrtimer_interrupt+0x172/0x2aa
[<ffffffff8102d498>] smp_apic_timer_interrupt+0x76/0x89
[<ffffffff814d881d>] apic_timer_interrupt+0x6d/0x80
<EOI>
[<ffffffff81057cd2>] ? __local_lock_irqsave+0x17/0x4a
[<ffffffff81059336>] try_to_grab_pending+0x42/0x17e
[<ffffffff8105a699>] mod_delayed_work_on+0x32/0x88
[<ffffffff8105a70b>] mod_delayed_work+0x1c/0x1e
[<ffffffff8122ae84>] blk_run_queue_async+0x37/0x39
[<ffffffff81230985>] flush_end_io+0xf1/0x107
[<ffffffff8122e0da>] blk_finish_request+0x21e/0x264
[<ffffffff8122e162>] blk_end_bidi_request+0x42/0x60
[<ffffffff8122e1ba>] blk_end_request+0x10/0x12
[<ffffffff8132de46>] scsi_io_completion+0x1bf/0x492
[<ffffffff81335cec>] ? sd_done+0x298/0x2ef
[<ffffffff81325a02>] scsi_finish_command+0xe9/0xf2
[<ffffffff8132dbcb>] scsi_softirq_done+0x106/0x10f
[<ffffffff812333d3>] blk_done_softirq+0x77/0x87
[<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1
[<ffffffff810aa820>] ? irq_thread_fn+0x3a/0x3a
[<ffffffff81048466>] local_bh_enable+0x43/0x72
[<ffffffff810aa866>] irq_forced_thread_fn+0x46/0x52
[<ffffffff810ab089>] irq_thread+0x8c/0x17c
[<ffffffff810ab179>] ? irq_thread+0x17c/0x17c
[<ffffffff810aaffd>] ? wake_threads_waitq+0x44/0x44
[<ffffffff8105eb18>] kthread+0x8d/0x95
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
[<ffffffff814d7b7c>] ret_from_fork+0x7c/0xb0
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
The state of softirqd of this CPU at the time of the crash was:
ksoftirqd/6 R running task 0 53 2 0x00000000
ffff88032fc39d18 0000000000000046 ffff88033330c4c0 ffff8803303f4710
ffff88032fc39fd8 ffff88032fc39fd8 0000000000000000 0000000000062500
ffff88032df88000 ffff8803303f4710 0000000000000000 ffff88032fc38000
Call Trace:
[<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c
[<ffffffff814d178c>] preempt_schedule+0x61/0x76
[<ffffffff8106cccf>] migrate_enable+0xe5/0x1df
[<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c
[<ffffffff8104ef52>] run_timer_softirq+0x161/0x1d6
[<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1
[<ffffffff8104840b>] run_ksoftirqd+0x2d/0x45
[<ffffffff8106658a>] smpboot_thread_fn+0x2ea/0x308
[<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc
[<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc
[<ffffffff8105eb18>] kthread+0x8d/0x95
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
[<ffffffff814d7afc>] ret_from_fork+0x7c/0xb0
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
Apparently, the softirq daemon and the ata_piix IRQ handler were waiting
for each other to finish, ending up in a livelock. After the below patch
was applied, the system no longer crashes.
Reported-by: Carsten Emde <C.Emde@osadl.org>
Proposed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
193/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Distangle worker accounting from rqlock
Date: Wed, 22 Jun 2011 19:47:03 +0200
The worker accounting for cpu bound workers is plugged into the core
scheduler code and the wakeup code. This is not a hard requirement and
can be avoided by keeping track of the state in the workqueue code
itself.
Keep track of the sleeping state in the worker itself and call the
notifier before entering the core scheduler. There might be false
positives when the task is woken between that call and actually
scheduling, but that's not really different from scheduling and being
woken immediately after switching away. There is also no harm from
updating nr_running when the task returns from scheduling instead of
accounting it in the wakeup code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de
Oliveira]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
194/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
195/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: seqlock: Prevent rt starvation
Date: Wed, 22 Feb 2012 12:03:30 +0100
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.
Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
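A sketch of the RT reader side this describes, with Nicholas'
spin_unlock_wait() substitution (later trees went back to a plain
lock/unlock pair); approximate, not the exact patch:

static inline unsigned read_seqbegin(seqlock_t *sl)
{
	unsigned ret;

repeat:
	ret = READ_ONCE(sl->seqcount.sequence);
	if (unlikely(ret & 1)) {
		/*
		 * Wait on the rt_mutex-backed spinlock so a preempted
		 * low-prio writer gets boosted, then retry the read.
		 */
		spin_unlock_wait(&sl->lock);
		goto repeat;
	}
	return ret;
}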
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
196/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
197/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep the irq-off regions. It looks
low-latency. However, we can't kfree() from this context, therefore we defer
this to the softirq and use the tofree_queue list for it (similar to
process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
198/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: move xmit_recursion to per-task variable on -RT
Date: Wed, 13 Jan 2016 15:55:02 +0100
A softirq on -RT can be preempted. That means one task is in
__dev_queue_xmit(), gets preempted and another task may enter
__dev_queue_xmit() as well. netperf together with a bridge device
will then trigger the `recursion alert` because each task increments
the xmit_recursion variable which is per-CPU.
A virtual device like br0 is required to trigger this warning.
This patch moves the lock owner and counter to be per-task instead of
per-CPU so it counts the recursion properly on -RT. The owner is also a
task now and not a CPU number.
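A sketch of the per-task accounting (the xmit_recursion field in
task_struct is the RT-tree addition; helper names are approximate):

#ifdef CONFIG_PREEMPT_RT_FULL
static inline int dev_recursion_level(void)
{
	return current->xmit_recursion;	/* follows the task, not the CPU */
}

static inline void xmit_rec_inc(void)
{
	current->xmit_recursion++;
}

static inline void xmit_rec_dec(void)
{
	current->xmit_recursion--;
}
#else
static inline int dev_recursion_level(void)
{
	return this_cpu_read(xmit_recursion);
}
#endif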
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
199/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: provide a way to delegate processing a softirq to ksoftirqd
Date: Wed, 20 Jan 2016 15:39:05 +0100
If NET_RX uses up all of its budget, it moves the following NAPI
invocations into `ksoftirqd`. On -RT it does not do so. Instead it
raises the NET_RX softirq in its current context again.
In order to get closer to mainline's behaviour, this patch provides
__raise_softirq_irqoff_ksoft(), which raises the softirq in ksoftirqd.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
200/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, the higher-priority task won't be able
to submit packets to the NIC directly; instead they will be enqueued into
the Qdisc. The NIC will remain idle until the task(s) with higher priority
leave the CPU and the task with lower priority gets back and finishes the
job.
If we always take the busylock, we ensure that the RT task can boost the
low-prio task and submit the packet.
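The change boils down to treating the qdisc as always contended on RT;
a sketch of the __dev_xmit_skb() hunk:

#ifdef CONFIG_PREEMPT_RT_FULL
	/* always queue on busylock so a boosted waiter can make the
	 * current owner progress */
	bool contended = true;
#else
	bool contended = qdisc_is_running(q);
#endif
	if (unlikely(contended))
		spin_lock(&q->busylock);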
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
201/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also, we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock, on the other hand, will serialize / sleep on
the lock while the writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
202/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: add back the missing serialization in ip_send_unicast_reply()
Date: Wed, 31 Aug 2016 17:21:56 +0200
Some time ago Sami Pietikäinen reported a crash on -RT in
ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire
(v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the
patch. As it turns out, that was a mistake.
I have reports that the same crash is possible with a similar backtrace.
It seems that vanilla protects access to this_cpu_ptr() via
local_bh_disable(). This does not work on -RT since we can have
NET_RX and NET_TX running in parallel on the same CPU.
This brings back the old locks.
|Unable to handle kernel NULL pointer dereference at virtual address 00000010
|PC is at __ip_make_skb+0x198/0x3e8
|[<c04e39d8>] (__ip_make_skb) from [<c04e3ca8>] (ip_push_pending_frames+0x20/0x40)
|[<c04e3ca8>] (ip_push_pending_frames) from [<c04e3ff0>] (ip_send_unicast_reply+0x210/0x22c)
|[<c04e3ff0>] (ip_send_unicast_reply) from [<c04fbb54>] (tcp_v4_send_reset+0x190/0x1c0)
|[<c04fbb54>] (tcp_v4_send_reset) from [<c04fcc1c>] (tcp_v4_do_rcv+0x22c/0x288)
|[<c04fcc1c>] (tcp_v4_do_rcv) from [<c0474364>] (release_sock+0xb4/0x150)
|[<c0474364>] (release_sock) from [<c04ed904>] (tcp_close+0x240/0x454)
|[<c04ed904>] (tcp_close) from [<c0511408>] (inet_release+0x74/0x7c)
|[<c0511408>] (inet_release) from [<c0470728>] (sock_release+0x30/0xb0)
|[<c0470728>] (sock_release) from [<c0470abc>] (sock_close+0x1c/0x24)
|[<c0470abc>] (sock_close) from [<c0115ec4>] (__fput+0xe8/0x20c)
|[<c0115ec4>] (__fput) from [<c0116050>] (____fput+0x18/0x1c)
|[<c0116050>] (____fput) from [<c0058138>] (task_work_run+0xa4/0xb8)
|[<c0058138>] (task_work_run) from [<c0011478>] (do_work_pending+0xd0/0xe4)
|[<c0011478>] (do_work_pending) from [<c000e740>] (work_pending+0xc/0x20)
|Code: e3530001 8a000001 e3a00040 ea000011 (e5973010)
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
203/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: add a lock around icmp_sk()
Date: Wed, 31 Aug 2016 17:54:09 +0200
It looks like the this_cpu_ptr() access in icmp_sk() is protected with
local_bh_disable(). To avoid missing serialization on -RT, I am adding a
local lock here. No crash has been observed; this is just a precaution.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
204/258 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: net: Have __napi_schedule_irqoff() disable interrupts on RT
Date: Tue, 6 Dec 2016 17:50:30 -0500
A customer hit a crash where the napi sd->poll_list became corrupted.
The customer had the bnx2x driver, which does a
__napi_schedule_irqoff() in its interrupt handler. Unfortunately, when
running with CONFIG_PREEMPT_RT_FULL, this interrupt handler is run as a
thread and is preemptable. The call to ____napi_schedule() must be done
with interrupts disabled to protect the per cpu softnet_data's
"poll_list, which is protected by disabling interrupts (disabling
preemption is enough when all interrupts are threaded and
local_bh_disable() can't preempt)."
As bnx2x isn't the only driver that does this, the safest thing to do
is to make __napi_schedule_irqoff() call __napi_schedule() instead when
CONFIG_PREEMPT_RT_FULL is enabled, which will call local_irq_save()
before calling ____napi_schedule().
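A sketch of what that amounts to (the mainline body shown for contrast):

void __napi_schedule_irqoff(struct napi_struct *n)
{
#ifdef CONFIG_PREEMPT_RT_FULL
	/* threaded irqs: interrupts may be enabled, take the safe path */
	__napi_schedule(n);	/* does local_irq_save() internally */
#else
	____napi_schedule(this_cpu_ptr(&softnet_data), n);
#endif
}
EXPORT_SYMBOL(__napi_schedule_irqoff);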
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
205/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context, so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds IRQ_WORK_HARD_IRQ, which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we have the original behavior.
This patch incorporates tglx's original work, reworked a little to bring back
arch_irq_work_raise() if possible, and a few fixes from Steven Rostedt and
Mike Galbraith.
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
206/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: Make rt aware
Date: Wed, 19 Sep 2012 14:50:37 +0200
Drop the lock before calling the console driver and do not disable
interrupts while printing to a serial console.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
207/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/printk: Don't try to print from IRQ/NMI region
Date: Thu, 19 May 2016 17:45:27 +0200
On -RT we try to acquire sleeping locks, which might lead to warnings
from lockdep or a WARN_ON() from spin_trylock() (which is an rtmutex on
RT).
We generally don't print from an IRQ-off region, so we should not try to
do so via console_unblank() / bust_spinlocks() either.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
208/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: Drop the logbuf_lock more often
Date: Thu, 21 Mar 2013 19:01:05 +0100
The lock is held with irqs off. The latency drops by 500us+ on my arm box
with a "full" buffer after executing "dmesg" on the shell.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
209/258 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: powerpc: ps3/device-init.c - adapt to completions using swait vs wait
Date: Sun, 31 May 2015 14:44:42 -0400
To fix:
cc1: warnings being treated as errors
arch/powerpc/platforms/ps3/device-init.c: In function 'ps3_notification_read_write':
arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'prepare_to_wait_event' from incompatible pointer type
arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'abort_exclusive_wait' from incompatible pointer type
arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'finish_wait' from incompatible pointer type
make[3]: *** [arch/powerpc/platforms/ps3/device-init.o] Error 1
make[3]: *** Waiting for unfinished jobs....
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
210/258 [
Author: "Yadi.hu"
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT_FULL
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
*((char*)0xc0001000) = 0;
};
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. Under the above conditions, the correct result is
that the test case receives a segmentation fault and exits, rather than
producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode; if the kernel doesn't enable them, they
remain disabled during translation/section permission faults.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've not seen any need for
other architectures to have the siglock converted to a raw lock, we can
conclude that we should enable irqs for the ARM translation/section
permission exceptions.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
211/258 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
212/258 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
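A sketch of the substitution (function names as of that era's arm/arm64
KVM code; the surrounding guest-entry logic is elided):

	/* was: preempt_disable(); */
	migrate_disable();	/* pinned to this CPU, yet preemptible */
	kvm_vgic_flush_hwstate(vcpu);
	kvm_timer_flush_hwstate(vcpu);
	/* ... run the guest ... */
	kvm_timer_sync_hwstate(vcpu);
	kvm_vgic_sync_hwstate(vcpu);
	migrate_enable();	/* was: preempt_enable(); */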
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
213/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: use preemp_disable in addition to local_bh_disable()
Date: Wed, 25 Jul 2018 14:02:38 +0200
In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
code disables BH and expects that it is not preemptible. On -RT the
task remains preemptible but stays on the same CPU. This may corrupt the
content of the SIMD registers if the task is preempted during
saving/restoring those registers.
Add preempt_disable()/enable() to enforce the required semantics on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
214/258 [
Author: Jason Wessel
Email: jason.wessel@windriver.com
Subject: kgdb/serial: Short term workaround
Date: Thu, 28 Jul 2011 12:42:23 -0500
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
> to missing hacks in the console locking which I dropped on purpose.
>
To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.
Comments are welcome of course. Ultimately the right solution is to
change separation between the console and the HW to have a polled mode
+ work queue so as not to introduce any kind of latency.
Thanks,
Jason.
]
215/258 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow, or so.
Are there better solutions? Should it exist and return 0 on !-rt?
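A sketch of the whole addition (kernel/ksysfs.c); on this reading it
simply reports a constant, and udev can test for the file's existence
instead of parsing uname:

#ifdef CONFIG_PREEMPT_RT_FULL
static ssize_t realtime_show(struct kobject *kobj,
			     struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", 1);
}
KERNEL_ATTR_RO(realtime);	/* plus an entry in the kernel_attrs[] group */
#endif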
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
216/258 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use, of course); this should be especially
easy now that we have a kmap_atomic stack.
Something like the below... it wants all the preempt_disable() stuff
replaced with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
217/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
218/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The TLB should be flushed on unmap, thus making the mapping entry
invalid. This is only done in the non-debug case, which does not look
right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
219/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
Fix up highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
220/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
221/258 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt-disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid kfree and other operations being
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
222/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counters get out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end(), and the latency grows. To fix this,
I shorten the FPU-off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt performance because we now
enable/disable the FPU state more often, but we gain lower latency and
the bug is gone.
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
223/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work, and in order to
do so in the kernel they need to enable the "FPU" in kernel mode, which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if an RT task is pending.
Probably we should measure the performance of those ciphers in pure SW
mode and with these optimisations to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible, which might
hurt performance.
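A sketch of kernel_fpu_resched() as described (arch/x86/kernel/fpu/);
treat it as approximate:

void kernel_fpu_resched(void)
{
	if (should_resched(PREEMPT_OFFSET)) {
		kernel_fpu_end();	/* drops fpregs + preemption */
		cond_resched();		/* let the pending (RT) task run */
		kernel_fpu_begin();	/* re-acquire and continue */
	}
}
EXPORT_SYMBOL_GPL(kernel_fpu_resched);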
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
224/258 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: crypto: scompress - serialize RT percpu scratch buffer access with a local lock
Date: Wed, 11 Jul 2018 17:14:47 +0200
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974
| in_atomic(): 1, irqs_disabled(): 0, pid: 1401, name: cryptomgr_test
| Preemption disabled at:
| [<ffff00000849941c>] scomp_acomp_comp_decomp+0x34/0x1a0
| CPU: 21 PID: 1401 Comm: cryptomgr_test Tainted: G W 4.16.18-rt9-rt #1
| Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Apr 25 2017
| Call trace:
| dump_backtrace+0x0/0x1c8
| show_stack+0x24/0x30
| dump_stack+0xac/0xe8
| ___might_sleep+0x124/0x188
| rt_spin_lock+0x40/0x88
| zip_load_instr+0x44/0x170 [thunderx_zip]
| zip_deflate+0x184/0x378 [thunderx_zip]
| zip_compress+0xb0/0x130 [thunderx_zip]
| zip_scomp_compress+0x48/0x60 [thunderx_zip]
| scomp_acomp_comp_decomp+0xd8/0x1a0
| scomp_acomp_compress+0x24/0x30
| test_acomp+0x15c/0x558
| alg_test_comp+0xc0/0x128
| alg_test.part.6+0x120/0x2c0
| alg_test+0x6c/0xa0
| cryptomgr_test+0x50/0x58
| kthread+0x134/0x138
| ret_from_fork+0x10/0x18
Mainline disables preemption to serialize percpu scratch buffer access,
causing the splat above. Serialize with a local lock for RT instead.
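A sketch with the RT local-lock API of that era; the lock name and the
scratch variable are assumed:

static DEFINE_LOCAL_IRQ_LOCK(scomp_scratch_lock);

	/* was: preemption disabled while the scratch buffer is in use */
	local_lock(scomp_scratch_lock);		/* sleeping lock on RT */
	scratch = *this_cpu_ptr(&scomp_scratch);	/* illustrative */
	/* ... run the (possibly sleeping) compressor on the scratch ... */
	local_unlock(scomp_scratch_lock);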
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
225/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU queue which is protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.
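A sketch of the enqueue side under the new lock; qlock is an assumed
field name:

struct cryptd_cpu_queue {
	struct crypto_queue queue;
	struct work_struct work;
	spinlock_t qlock;		/* assumed new member */
};

	cpu_queue = raw_cpu_ptr(queue->cpu_queue);
	spin_lock_bh(&cpu_queue->qlock);	/* explicit, lockdep-visible */
	err = crypto_enqueue_request(&cpu_queue->queue, request);
	spin_unlock_bh(&cpu_queue->qlock);
	/* a late migration is harmless: qlock protects the queue */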
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
226/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq context, we will have problems
acquiring the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
227/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT, as the might-sleep
checks are disabled. During CPU hotplug the might-sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness we get during boot, where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
228/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
229/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: random: avoid preempt_disable()ed section
Date: Fri, 12 May 2017 15:46:17 +0200
extract_crng() will use sleeping locks while in a preempt_disable()
section due to get_cpu_var().
Work around it with local_locks.
Cc: stable-rt@vger.kernel.org # where it applies to
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
230/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: cpu/hotplug: Implement CPU pinning
Date: Wed, 19 Jul 2017 17:31:20 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
231/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hotplug: duct-tape RT-rwlock usage for non-RT
Date: Fri, 4 Aug 2017 18:31:00 +0200
This type is only available on -RT. We need to craft something for
non-RT. Since the only migrate_disable() user is -RT only, there is no
damage.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
232/258 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be bound to a
   particular CPU. This can be achieved by disabling migration; there
   is no need to disable preemption.
2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in case
   of RT. If preemption is disabled, enqueue_to_backlog() is called in
   atomic context, and if the backlog exceeds its count, kfree_skb()
   is called. But in RT, kfree_skb() might get scheduled out, so it
   expects a non-atomic context.
3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and
   migrate_disable() map to preempt_enable() and preempt_disable(), so
   there is no change in functionality in the non-RT case.
- Replace preempt_enable()/preempt_disable() with
  migrate_enable()/migrate_disable() respectively
- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light()
  respectively (see the sketch below)
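A trimmed sketch of the result (the RPS path omitted; get_cpu_light()
is the RT-tree helper, i.e. migrate_disable() + smp_processor_id()):

static int netif_rx_internal(struct sk_buff *skb)
{
	unsigned int qtail;
	int ret, cpu;

	cpu = get_cpu_light();		/* was: get_cpu() */
	ret = enqueue_to_backlog(skb, cpu, &qtail);
	put_cpu_light();		/* was: put_cpu() */

	return ret;
}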
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
233/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Another local_irq_disable/kmalloc headache
Date: Wed, 26 Sep 2012 16:21:08 +0200
Replace it with a local lock. Though that's pretty inefficient :(
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
234/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: protect users of napi_alloc_cache against reentrance
Date: Fri, 15 Jan 2016 16:33:34 +0100
On -RT, code running in BH cannot be moved to another CPU, so CPU-local
variables remain local. However, the code can be preempted, and another
task may enter BH on the same CPU and access the same napi_alloc_cache
variable.
This patch ensures that each user of napi_alloc_cache uses a local lock.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
235/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: netfilter: Serialize xt_write_recseq sections on RT
Date: Sun, 28 Oct 2012 11:18:08 +0100
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing xt_write_recseq sections. RT breaks
that and needs explicit serialization here.
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
236/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Add a mutex around devnet_rename_seq
Date: Wed, 20 Mar 2013 18:06:20 +0100
On RT, write_seqcount_begin() disables preemption, and device_rename()
allocates memory with GFP_KERNEL and later grabs the sysfs_mutex
mutex. Serialize with a mutex and use the non-preemption-disabling
__write_seqcount_begin().
To avoid writer starvation, let the reader grab the mutex and release
it when it detects a writer in progress. This keeps the normal case
(no reader on the fly) fast.
[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
237/258 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
238/258 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT_FULL, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
239/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away
preempt the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the waker. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as, for laziness reasons, I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now, on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups,
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
240/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: ftrace: Fix trace header alignment
Date: Sun, 16 Oct 2016 05:08:30 +0200
Line up helper arrows to the right column.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: fixup function tracer header]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
241/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
242/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
243/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
244/258 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
245/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: connector/cn_proc: Protect send_msg() with a local lock on RT
Date: Sun, 16 Oct 2016 05:11:54 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Since ab8ed951080e ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
246/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
247/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/zram: Don't disable preemption in zcomp_stream_get/put()
Date: Thu, 20 Oct 2016 11:15:22 +0200
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
248/258 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: drivers/zram: fix zcomp_stream_get() smp_processor_id() use in preemptible code
Date: Wed, 23 Aug 2017 11:57:29 +0200
Use get_local_ptr() instead of this_cpu_ptr() to avoid a warning regarding
smp_processor_id() in preemptible code.
raw_cpu_ptr() would be fine, too, because the per-CPU data structure is
protected with a spinlock, so it does not matter much if we take the
other one.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
249/258 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT_FULL kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
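A sketch of that change; the register offset is per the TIS spec, the
wrapper names approximate:

static inline void tpm_tis_flush(void __iomem *iobase)
{
	/* a cheap read forces any buffered writes out to the chip */
	ioread8(iobase + TPM_ACCESS(0));
}

static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
{
	iowrite8(b, iobase + addr);
	tpm_tis_flush(iobase);
}

static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr)
{
	iowrite32(b, iobase + addr);
	tpm_tis_flush(iobase);
}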
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
250/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: pci/switchtec: Don't use completion's wait queue
Date: Wed, 4 Oct 2017 10:24:23 +0200
The poll callback is using completion's wait_queue_head_t member and
puts it in poll_wait() so the poll() caller gets a wakeup after command
completed. This does not work on RT because we don't have a
wait_queue_head_t in our completion implementation. Nobody else in tree
does it like that, so this is the only driver that breaks.
Instead of using the completion, use a waitqueue with a status flag, as
suggested by Logan.
I don't have the HW, so I have no idea if it works as expected; please
test it.
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
251/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
252/258 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
Date: Sat, 27 Feb 2016 09:01:42 +0100
[ 8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[ 8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[ 8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[ 8.014055] Workqueue: events_unbound async_run_entry_fn
[ 8.014059] 0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[ 8.014063] ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[ 8.014066] ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[ 8.014067] Call Trace:
[ 8.014074] [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[ 8.014078] [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[ 8.014082] [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[ 8.014086] [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[ 8.014090] [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[ 8.014152] [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[ 8.014156] [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[ 8.014201] [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[ 8.014215] [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[ 8.014260] [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[ 8.014288] [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[ 8.014298] [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[ 8.014301] [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[ 8.014319] [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[ 8.014328] [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[ 8.014337] [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[ 8.014346] [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 8.014390] [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[ 8.014394] [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[ 8.014398] [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[ 8.014401] [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[ 8.014405] [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[ 8.014408] [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[ 8.014410] [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[ 8.014413] [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[ 8.014416] [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[ 8.014419] [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[ 8.014422] [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[ 8.014425] [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[ 8.014428] [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[ 8.014439] [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[ 8.014483] [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 8.014486] [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[ 8.014490] [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[ 8.014493] [<ffffffff81071489>] worker_thread+0x169/0x4c0
[ 8.014496] [<ffffffff81071320>] ? process_one_work+0x470/0x470
[ 8.014499] [<ffffffff81076606>] kthread+0xc6/0xe0
[ 8.014502] [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[ 8.014506] [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
253/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroups: use simple wait in css_release()
Date: Fri, 13 Feb 2015 15:52:24 +0100
To avoid:
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
|in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11
|2 locks held by rcuc/11/92:
| #0: (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940
| #1: (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0
|Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
|CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1
| ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000
| 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000
| ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80
|Call Trace:
| [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
| [<ffffffff8109aa16>] __might_sleep+0x116/0x190
| [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
| [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0
| [<ffffffff8110c881>] css_release+0x81/0x90
| [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0
| [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
| [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940
| [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0
| [<ffffffff810948d8>] kthread+0xe8/0x100
| [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
254/258 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
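As a rough sketch of the conversion (the surrounding function is
illustrative, only the lock type and accessors are the point; a
raw_spinlock_t remains a truly spinning lock on RT):
    static DEFINE_RAW_SPINLOCK(callback_lock);      /* was: DEFINE_SPINLOCK */

    static bool cpuset_node_allowed_sketch(int node)
    {
            unsigned long flags;
            bool allowed;

            /* safe even with IRQs already disabled, e.g. in the
             * ___slab_alloc() path from the backtrace above */
            raw_spin_lock_irqsave(&callback_lock, flags);
            allowed = true;         /* read cpuset flags / node masks here */
            raw_spin_unlock_irqrestore(&callback_lock, flags);
            return allowed;
    }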
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
255/258 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: apparmor: use a locallock instead preempt_disable()
Date: Wed, 11 Oct 2017 17:43:49 +0200
get_buffers() disables preemption which acts as a lock for the per-CPU
variable. Since we can't disable preemption here on RT, a local_lock
is used instead, in order to remain on the same CPU and to not have more
than one user within the critical section.
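A sketch of the pattern, using the locallock API of this patch set (the
per-CPU buffer name and helpers are illustrative):
    static DEFINE_LOCAL_IRQ_LOCK(buffers_lock);
    static DEFINE_PER_CPU(char *, aa_buffer);

    char *aa_get_buffer_sketch(void)
    {
            /* on !RT this maps to preempt_disable(); on RT it is a
             * per-CPU spinlock that also pins the task to this CPU */
            local_lock(buffers_lock);
            return this_cpu_read(aa_buffer);
    }

    void aa_put_buffer_sketch(void)
    {
            local_unlock(buffers_lock);
    }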
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
256/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Prevent deadlock/stall on RT
Date: Fri, 27 Jun 2014 16:24:52 +0200
Austin reported an XFS deadlock/stall on RT where scheduled work never
gets executed and tasks wait for each other forever.
The underlying problem is the modification of the RT code to the
handling of workers which are about to go to sleep. In mainline a
worker thread which goes to sleep wakes an idle worker if there is
more work to do. This happens from the guts of the schedule()
function. On RT this must be outside and the accessed data structures
are not protected against scheduling due to the spinlock to rtmutex
conversion. So the naive solution to this was to move the code outside
of the scheduler and protect the data structures by the pool
lock. That approach turned out to be a little naive as we cannot call
into that code when the thread blocks on a lock, as it is not allowed
to block on two locks in parallel. So we don't call into the worker
wakeup magic when the worker is blocked on a lock, which causes the
deadlock/stall observed by Austin and Mike.
Looking deeper into that worker code it turns out that the only
relevant data structure which needs to be protected is the list of
idle workers which can be woken up.
So the solution is to protect the list manipulation operations with
preempt_disable/enable pairs on RT and call unconditionally into the
worker code even when the worker is blocked on a lock. The preemption
protection is safe as there is nothing which can fiddle with the list
outside of thread context.
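A sketch of the resulting shape (structure and helper names are
illustrative, not the literal patch):
    struct worker_pool_sketch { struct list_head idle_list; };
    struct task_struct *first_idle_worker_sketch(struct worker_pool_sketch *pool);

    /* called unconditionally when a worker goes to sleep,
     * even if it merely blocks on a lock */
    static void wq_worker_sleeping_sketch(struct worker_pool_sketch *pool)
    {
            /* list access is safe with only preemption disabled:
             * nothing touches it outside of thread context */
            preempt_disable();
            if (!list_empty(&pool->idle_list))
                    wake_up_process(first_idle_worker_sketch(pool));
            preempt_enable();
    }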
Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com>
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
]
257/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation, allow rt tasks to cache one sigqueue struct in
the task struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
258/258 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
|
|
The upstream -rt project uses tools which don't default to the "normal"
file names created by "git send-email". Re-exporting the patches as
was done here previously does solve that but it also means that we
need to manually track mappings between the old names and the names
used here when handling updates. So just re-import the patches as-is
from 4.9-rt1 and then we'll go from there with individual refreshes,
additions and drops all as separate commits.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
1/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: timer: make the base lock raw
Date: Wed, 13 Jul 2016 18:22:23 +0200
The part where the base lock is held became more predictable and shorter
after the timer rework. One reason is the lack of re-cascading.
That means the lock can be made raw and held in IRQ context.
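Sketched, the change is just the lock type in the per-CPU base (struct
layout illustrative):
    struct timer_base_sketch {
            raw_spinlock_t lock;            /* was: spinlock_t */
            struct hlist_head vectors[1];   /* wheel details elided */
    };

    static void expire_timers_sketch(struct timer_base_sketch *base)
    {
            /* may now be taken in hard IRQ context on RT, too */
            raw_spin_lock_irq(&base->lock);
            /* ... collect and expire timers, bounded work ... */
            raw_spin_unlock_irq(&base->lock);
    }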
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/286 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: sc16is7xx: Drop bogus use of IRQF_ONESHOT
Date: Thu, 18 Feb 2016 11:26:12 -0600
The use of IRQF_ONESHOT when registering an interrupt handler with
request_irq() is nonsensical.
Not only that, it also prevents the handler from being threaded when it
otherwise should be, with IRQ_FORCED_THREADING enabled. This causes the
following deadlock observed by Sean Nyekjaer on -rt:
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[..]
rt_spin_lock_slowlock from (queue_kthread_work+0x18/0x74)
queue_kthread_work) from (sc16is7xx_irq+0x10/0x18 [sc16is7xx])
sc16is7xx_irq [sc16is7xx]) from (handle_irq_event_percpu+0x70/0x158)
handle_irq_event_percpu) from (handle_irq_event+0x68/0xa8)
handle_irq_event) from (handle_level_irq+0x10c/0x184)
handle_level_irq) from (generic_handle_irq+0x2c/0x3c)
generic_handle_irq) from (mxc_gpio_irq_handler+0x3c/0x108)
mxc_gpio_irq_handler) from (mx3_gpio_irq_handler+0x80/0xcc)
mx3_gpio_irq_handler) from (generic_handle_irq+0x2c/0x3c)
generic_handle_irq) from (__handle_domain_irq+0x7c/0xe8)
__handle_domain_irq) from (gic_handle_irq+0x24/0x5c)
gic_handle_irq) from (__irq_svc+0x40/0x88)
(__irq_svc) from (rt_spin_unlock+0x1c/0x68)
(rt_spin_unlock) from (kthread_worker_fn+0x104/0x17c)
(kthread_worker_fn) from (kthread+0xd0/0xe8)
(kthread) from (ret_from_fork+0x14/0x2c)
Fixes: 9e6f4ca3e567 ("sc16is7xx: use kthread_worker for tx_work and irq")
Reported-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Cc: linux-rt-users@vger.kernel.org
Cc: Jakub Kicinski <moorray3@wp.pl>
Cc: stable@vger.kernel.org
Cc: linux-serial@vger.kernel.org
Link: http://lkml.kernel.org/r/1455816372-13989-1-git-send-email-joshc@ni.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
3/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: at91: do not disable/enable clocks in a row
Date: Wed, 9 Mar 2016 10:51:06 +0100
Currently the driver will disable the clock and enable it one line later
when switching from periodic mode into one-shot.
This double switch can be avoided; it causes a needless warning on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/286 [
Author: Grygorii Strashko
Email: grygorii.strashko@ti.com
Subject: ARM: smp: Move clear_tasks_mm_cpumask() call to __cpu_die()
Date: Fri, 11 Sep 2015 21:21:23 +0300
When running with the RT-kernel (4.1.5-rt5) on TI OMAP dra7-evm and trying
to do Suspend to RAM, the following backtrace occurs:
Disabling non-boot CPUs ...
PM: noirq suspend of devices complete after 7.295 msecs
Disabling non-boot CPUs ...
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
INFO: lockdep is turned off.
irq event stamp: 122
hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
softirqs last disabled at (0): [< (null)>] (null)
Preemption disabled at:[< (null)>] (null)
CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
Hardware name: Generic DRA74X (Flattened Device Tree)
[<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
[<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
[<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
[<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
[<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
[<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
[<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
[<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
[<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
[<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
[<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
CPU1: shutdown
PM: Calling sched_clock_suspend+0x0/0x40
PM: Calling timekeeping_suspend+0x0/0x2e0
PM: Calling irq_gc_suspend+0x0/0x68
PM: Calling fw_suspend+0x0/0x2c
PM: Calling cpu_pm_suspend+0x0/0x28
Also, sometimes the system gets stuck right after displaying "Disabling
non-boot CPUs ...". The root cause of the above backtrace is task_lock() which takes
a sleeping lock on -RT.
To fix the issue, move clear_tasks_mm_cpumask() call from __cpu_disable()
to __cpu_die() which is called from the thread which is asking for a target
CPU to be shut down. In addition, this change restores CPU hotplug functionality
on TI OMAP dra7-evm and CPU1 can be unplugged/plugged many times.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Austin Schuh <austin@peloton-tech.com>
Cc: <philipp@peloton-tech.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1441995683-30817-1-git-send-email-grygorii.strashko@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
5/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle non enqueued waiters gracefully
Date: Fri, 6 Nov 2015 18:51:03 +0100
Yimin debugged that in case of a PI wakeup in progress when
rt_mutex_start_proxy_lock() calls task_blocks_on_rt_mutex() the latter
returns -EAGAIN and in consequence the remove_waiter() call runs into
a BUG_ON() because there is nothing to remove.
Guard it with rt_mutex_has_waiters(). This is a quick fix which is
easy to backport. The proper fix is to have a central check in
remove_waiter() so we can call it unconditionally.
Reported-and-debugged-by: Yimin Deng <yimin11.deng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
6/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: include wait.h
Date: Wed, 14 Sep 2016 11:55:23 +0200
Since commit d9171b934526 ("parallel lookups machinery, part 4 (and
last)") dcache.h is using but does not include wait.h. It works as long
as it is included somehow earlier and fails otherwise.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: include rcu.h because we use it
Date: Wed, 14 Sep 2016 11:52:17 +0200
Since commit c1adf20052d8 ("Introduce rb_replace_node_rcu()")
rbtree_augmented.h uses RCU related data structures but does not include
them. It works as long as gets somehow included before that and fails
otherwise.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: init in_lookup_hashtable
Date: Wed, 14 Sep 2016 17:57:03 +0200
in_lookup_hashtable was introduced in commit 94bdd655caba ("parallel
lookups machinery, part 3") and never initialized, but since it is in
the data segment it is all zeros anyway. But we need an explicit init for -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: iommu/iova: don't disable preempt around this_cpu_ptr()
Date: Thu, 15 Sep 2016 16:58:19 +0200
Commit 583248e6620a ("iommu/iova: Disable preemption around use of
this_cpu_ptr()") disables preemption while accessing a per-CPU variable.
This does keep lockdep quiet. However I don't see why it would be
bad if we get migrated to another CPU after the access.
__iova_rcache_insert() and __iova_rcache_get() immediately locks the
variable after obtaining it - before accessing its members.
_If_ we get migrated away after retrieving the address of cpu_rcache
before taking the lock then the *other* task on the same CPU will
retrieve the same address of cpu_rcache and will spin on the lock.
alloc_iova_fast() disables preemption while invoking
free_cpu_cached_iovas() on each CPU. The function itself uses
per_cpu_ptr() which does not trigger a warning (like this_cpu_ptr()
does) because it assumes the caller knows what they are doing, since they
might access the data structure from a different CPU (which means they need
protection against concurrent access).
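A sketch of the resulting access pattern (structure name follows
drivers/iommu/iova.c, the body is illustrative):
    static bool iova_rcache_insert_sketch(struct iova_cpu_rcache __percpu *pcp)
    {
            /* no get_cpu_ptr(): a migration between the lookup and
             * the lock at worst means contending on that CPU's lock */
            struct iova_cpu_rcache *cpu_rcache = raw_cpu_ptr(pcp);

            spin_lock(&cpu_rcache->lock);
            /* ... insert into the per-CPU magazine ... */
            spin_unlock(&cpu_rcache->lock);
            return true;
    }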
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: iommu/vt-d: don't disable preemption while accessing deferred_flush()
Date: Thu, 15 Sep 2016 17:16:44 +0200
get_cpu() disables preemption and returns the current CPU number. The
CPU number is later used only once, while retrieving the address of the
local CPU's deferred_flush pointer.
We can instead use raw_cpu_ptr() while we remain preemptible. The worst
thing that can happen is that flush_unmaps_timeout() is invoked multiple
times: once by taskA after seeing HIGH_WATER_MARK and then preempted to
another CPU and then by taskB which saw HIGH_WATER_MARK on the same CPU
as taskA. It is also likely that ->size went from HIGH_WATER_MARK to 0
right after its read because another CPU invoked flush_unmaps_timeout()
for this CPU.
The access to flush_data is protected by a spinlock so even if we get
migrated to another CPU or preempted - the data structure is protected.
While at it, I marked deferred_flush static since I can't find a
reference to it outside of this file.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/apic: get rid of "warning: 'acpi_ioapic_lock' defined but not used"
Date: Fri, 21 Oct 2016 10:29:11 +0200
kbuild test robot reported this against the -RT tree:
| In file included from include/linux/mutex.h:30:0,
| from include/linux/notifier.h:13,
| from include/linux/memory_hotplug.h:6,
| from include/linux/mmzone.h:777,
| from include/linux/gfp.h:5,
| from include/linux/slab.h:14,
| from include/linux/resource_ext.h:19,
| from include/linux/acpi.h:26,
| from arch/x86/kernel/acpi/boot.c:27:
|>> arch/x86/kernel/acpi/boot.c:90:21: warning: 'acpi_ioapic_lock' defined but not used [-Wunused-variable]
| static DEFINE_MUTEX(acpi_ioapic_lock);
| ^
| include/linux/mutex_rt.h:27:15: note: in definition of macro 'DEFINE_MUTEX'
| struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
^~~~~~~~~
which is also true (as in unused) for !RT, but there the compiler does not
emit a warning.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rxrpc: remove unused static variables
Date: Fri, 21 Oct 2016 10:54:50 +0200
The rxrpc_security_methods and rxrpc_security_sem user has been removed
in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was
noticed by kbuild test robot for the -RT tree but is also true for !RT.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: update: make RCU_EXPEDITE_BOOT default
Date: Wed, 2 Nov 2016 16:45:58 +0100
RCU_EXPEDITE_BOOT should speed up the boot process by enforcing
synchronize_rcu_expedited() instead of synchronize_rcu() during the boot
process. There should be no reason why one would not want this, and there
is no need to worry about real-time latency at this point.
Therefore make it the default.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/percpu-rwsem: use swait for the waiting writer
Date: Mon, 21 Nov 2016 19:26:15 +0100
Use struct swait_queue_head instead of wait_queue_head_t for the waiting
writer. The swait implementation is smaller and more lightweight compared
wait_queue_head_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: btrfs: drop trace_btrfs_all_work_done() from normal_work_helper()
Date: Wed, 14 Dec 2016 14:44:18 +0100
For btrfs_scrubparity_helper() the ->func() is set to
scrub_parity_bio_endio_worker(). This function invokes
scrub_free_parity() which kfree()s the work object. All is good as
long as trace events are not enabled, because otherwise we boom with a backtrace
like this:
| Workqueue: btrfs-endio btrfs_endio_helper
| RIP: 0010:[<ffffffff812f81ae>] [<ffffffff812f81ae>] trace_event_raw_event_btrfs__work__done+0x4e/0xa0
| Call Trace:
| [<ffffffff8136497d>] btrfs_scrubparity_helper+0x59d/0x780
| [<ffffffff81364c49>] btrfs_endio_helper+0x9/0x10
| [<ffffffff8108af8e>] process_one_work+0x26e/0x7b0
| [<ffffffff8108b516>] worker_thread+0x46/0x560
| [<ffffffff81091c4e>] kthread+0xee/0x110
| [<ffffffff818e166a>] ret_from_fork+0x2a/0x40
So in order to avoid this, I remove the trace point.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: btrfs: swap free() and trace point in run_ordered_work()
Date: Wed, 14 Dec 2016 12:28:52 +0100
The previous patch removed a trace point due to a use after free problem
with tracing enabled. While looking at the backtrace it took me a while
to find the right spot. While doing so I noticed that this trace point
could be used with two clean-up functions in run_ordered_work():
- run_one_async_free()
- async_cow_free()
Both of them free the `work' item so a later use in the tracepoint is
not possible.
This patch swaps the order so we first have the trace point and then
free the struct.
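A sketch of the fixed ordering (struct shape illustrative):
    struct btrfs_work_sketch {
            void (*ordered_free)(struct btrfs_work_sketch *work);
    };

    static void run_ordered_work_sketch(struct btrfs_work_sketch *work)
    {
            trace_btrfs_all_work_done(work);  /* while `work' is alive */
            work->ordered_free(work);         /* may kfree(work) */
    }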
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: NFSv4: replace seqcount_t with a seqlock_t
Date: Fri, 28 Oct 2016 23:05:11 +0200
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() bugs me
because it maps to preempt_disable() in -RT which I can't have at this
point. So I took a look at the code.
The lockdep part was removed in commit abbec2da13f0 ("NFS: Use
raw_write_seqcount_begin/end int nfs4_reclaim_open_state") because
lockdep complained. The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as
being recovered").
The recovery thread runs only once.
write_seqlock() disables preemption on !RT, but the writer side here is
preemptible (and has to remain so despite the fact that it will block
readers).
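A sketch of both sides after the switch (field name is illustrative,
per the intent of this patch):
    /* in the state owner: */
    seqlock_t so_reclaim_seqlock;   /* was: seqcount_t so_reclaim_seqcount */

    /* writer (state manager), preemptible on RT because the embedded
     * spinlock_t is a sleeping lock there: */
    write_seqlock(&sp->so_reclaim_seqlock);
    /* ... recover open state ... */
    write_sequnlock(&sp->so_reclaim_seqlock);

    /* readers retry as before: */
    do {
            seq = read_seqbegin(&sp->so_reclaim_seqlock);
            /* ... */
    } while (read_seqretry(&sp->so_reclaim_seqlock, seq));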
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/286 [
Author: Allen Pais
Email: allen.pais@oracle.com
Subject: sparc64: use generic rwsem spinlocks rt
Date: Fri, 13 Dec 2013 09:44:41 +0530
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/SRCU: provide a static initializer
Date: Tue, 19 Mar 2013 14:44:30 +0100
There are static initializer macros for three out of the four
possible notifier types, namely:
ATOMIC_NOTIFIER_HEAD()
BLOCKING_NOTIFIER_HEAD()
RAW_NOTIFIER_HEAD()
This patch provides a static initializer for the fourth type to make
the set complete.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Shorten interrupt disabled regions
Date: Wed, 22 Jun 2011 19:47:02 +0200
Moving the blk_sched_flush_plug() call out of the interrupt/preempt
disabled region in the scheduler allows us to replace
local_irq_save/restore(flags) by local_irq_disable/enable() in
blk_flush_plug().
Now instead of doing this we disable interrupts explicitly when we
lock the request_queue and reenable them when we drop the lock. That
allows interrupts to be handled when the plug list contains requests
for more than one queue.
Aside from that, this change makes the scope of the irq disabled region
more obvious. The current code confused the hell out of me when
looking at:
local_irq_save(flags);
spin_lock(q->queue_lock);
...
queue_unplugged(q...);
scsi_request_fn();
spin_unlock(q->queue_lock);
spin_lock(shost->host_lock);
spin_unlock_irq(shost->host_lock);
-------------------^^^ ????
spin_lock_irq(q->queue_lock);
spin_unlock(q->lock);
local_irq_restore(flags);
Also add a comment to __blk_run_queue() documenting that
q->request_fn() can drop q->queue_lock and reenable interrupts, but
must return with q->queue_lock held and interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de
]
21/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timekeeping: Split jiffies seqlock
Date: Thu, 14 Feb 2013 22:36:59 +0100
Replace the jiffies_lock seqlock with a plain seqcount and a raw spinlock
so it can be taken in atomic context on RT.
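A sketch of the split (the write side pairs the raw lock with the
seqcount; the body is illustrative):
    static DEFINE_RAW_SPINLOCK(jiffies_lock);  /* was: DEFINE_SEQLOCK */
    static seqcount_t jiffies_seq;

    static void tick_do_update_jiffies64_sketch(u64 ticks)
    {
            raw_spin_lock(&jiffies_lock);      /* fine with IRQs off on RT */
            write_seqcount_begin(&jiffies_seq);
            /* jiffies_64 += ticks; update wall time ... */
            write_seqcount_end(&jiffies_seq);
            raw_spin_unlock(&jiffies_lock);
    }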
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
22/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: tracing: Account for preempt off in preempt_schedule()
Date: Thu, 29 Sep 2011 12:24:30 -0500
The preempt_schedule() uses the preempt_disable_notrace() version
because it can cause infinite recursion by the function tracer as
the function tracer uses preempt_enable_notrace() which may call
back into the preempt_schedule() code as the NEED_RESCHED is still
set and the PREEMPT_ACTIVE has not been set yet.
See commit: d1f74e20b5b064a130cd0743a256c2d3cfe84010 that made this
change.
The preemptoff and preemptirqsoff latency tracers require the first
and last preempt count modifiers to enable tracing. But this skips
the checks. Since we cannot convert them back to the non-notrace
version, we can use the idle() hooks for the latency tracers here.
That is, the start/stop_critical_timings() works well to manually
start and stop the latency tracer for preempt off timings.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
23/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
24/286 [
Author: Frank Rowand
Email: frank.rowand@am.sony.com
Subject: arm: Convert arm boot_lock to raw
Date: Mon, 19 Sep 2011 14:51:14 -0700
The arm boot_lock is used by the secondary processor startup code. The locking
task is the idle thread, which has idle->sched_class == &idle_sched_class.
idle_sched_class->enqueue_task == NULL, so if the idle task blocks on the
lock, the attempt to wake it when the lock becomes available will fail:
try_to_wake_up()
...
activate_task()
enqueue_task()
p->sched_class->enqueue_task(rq, p, flags)
Fix by converting boot_lock to a raw spin lock.
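Sketched (the secondary-init body is illustrative), the point is that
the idle thread now spins instead of trying to block:
    static DEFINE_RAW_SPINLOCK(boot_lock);  /* was: DEFINE_SPINLOCK */

    void secondary_init_sketch(unsigned int cpu)
    {
            /* spinning avoids any enqueue/wakeup through
             * idle_sched_class, which cannot enqueue tasks */
            raw_spin_lock(&boot_lock);
            raw_spin_unlock(&boot_lock);
    }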
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4E77B952.3010606@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
25/286 [
Author: Yang Shi
Email: yang.shi@linaro.org
Subject: arm: kprobe: replace patch_lock to raw lock
Date: Thu, 10 Nov 2016 16:17:55 -0800
When running kprobe on -rt kernel, the below bug is caught:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0
INFO: lockdep is turned off.
irq event stamp: 238
hardirqs last enabled at (237): [<80b5aecc>] _raw_spin_unlock_irqrestore+0x88/0x90
hardirqs last disabled at (238): [<80b56d88>] __schedule+0xec/0x94c
softirqs last enabled at (0): [<80225584>] copy_process.part.5+0x30c/0x1994
softirqs last disabled at (0): [< (null)>] (null)
Preemption disabled at:[<802f2b98>] cpu_stopper_thread+0xc0/0x140
CPU: 0 PID: 14 Comm: migration/0 Tainted: G O 4.8.3-rt2 #1
Hardware name: Freescale LS1021A
[<80212e7c>] (unwind_backtrace) from [<8020cd2c>] (show_stack+0x20/0x24)
[<8020cd2c>] (show_stack) from [<80689e14>] (dump_stack+0xa0/0xcc)
[<80689e14>] (dump_stack) from [<8025a43c>] (___might_sleep+0x1b8/0x2a4)
[<8025a43c>] (___might_sleep) from [<80b5b324>] (rt_spin_lock+0x34/0x74)
[<80b5b324>] (rt_spin_lock) from [<80b5c31c>] (__patch_text_real+0x70/0xe8)
[<80b5c31c>] (__patch_text_real) from [<80b5c3ac>] (patch_text_stop_machine+0x18/0x20)
[<80b5c3ac>] (patch_text_stop_machine) from [<802f2920>] (multi_cpu_stop+0xfc/0x134)
[<802f2920>] (multi_cpu_stop) from [<802f2ba0>] (cpu_stopper_thread+0xc8/0x140)
[<802f2ba0>] (cpu_stopper_thread) from [<802563a4>] (smpboot_thread_fn+0x1a4/0x354)
[<802563a4>] (smpboot_thread_fn) from [<80251d38>] (kthread+0x104/0x11c)
[<80251d38>] (kthread) from [<80207f70>] (ret_from_fork+0x14/0x24)
Since patch_text_stop_machine() is called in stop_machine() which disables IRQ,
a sleepable lock must not be used in this atomic context, so replace
patch_lock with a raw lock.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: posix-timers: Prevent broadcast signals
Date: Fri, 3 Jul 2009 08:29:20 -0500
POSIX timers should not send broadcast signals or kernel-only
signals. Prevent it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
27/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation, allow rt tasks to cache one sigqueue struct in
the task struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
28/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: drivers: random: Reduce preempt disabled region
Date: Fri, 3 Jul 2009 08:29:30 -0500
No need to keep preemption disabled across the whole function.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
29/286 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: ARM: AT91: PIT: Remove irq handler when clock event is unused
Date: Sat, 6 Mar 2010 17:47:10 +0100
Setup and remove the interrupt handler in clock event mode selection.
This avoids calling the (shared) interrupt handler when the device is
not used.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: redo the patch with NR_IRQS_LEGACY which is probably required since
commit 8fe82a55 ("ARM: at91: sparse irq support") which is included since v3.6.
Patch based on what Sami Pietikäinen <Sami.Pietikainen@wapice.com> suggested].
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/286 [
Author: Alexandre Belloni
Email: alexandre.belloni@free-electrons.com
Subject: clockevents/drivers/timer-atmel-pit: fix double free_irq
Date: Thu, 17 Mar 2016 21:09:43 +0100
clockevents_exchange_device() changes the state from detached to shutdown
and so at that point the IRQ has not yet been requested.
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/286 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: clocksource: TCLIB: Allow higher clock rates for clock events
Date: Mon, 8 Mar 2010 18:57:04 +0100
By default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile-time selection to allow higher clock resolution.
(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
32/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: drivers/net: Use disable_irq_nosync() in 8139too
Date: Fri, 3 Jul 2009 08:29:24 -0500
Use disable_irq_nosync() instead of disable_irq() as this might be
called in atomic context with netpoll.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
33/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: suspend: Prevent might sleep splats
Date: Thu, 15 Jul 2010 10:29:00 +0200
timekeeping suspend/resume calls read_persistent_clock() which takes
rtc_lock. That results in might sleep warnings because at that point
we run with interrupts disabled.
We cannot convert rtc_lock to a raw spinlock as that would trigger
other might sleep warnings.
As a temporary workaround we disable the might sleep warnings by
setting system_state to SYSTEM_SUSPEND before calling sysdev_suspend()
and restoring it to SYSTEM_RUNNING after sysdev_resume().
Needs to be revisited.
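A sketch of the workaround described above (control flow illustrative):
    static int suspend_enter_sketch(void)
    {
            int error;

            system_state = SYSTEM_SUSPEND;  /* mutes the might-sleep splat */
            error = sysdev_suspend(PMSG_SUSPEND);
            if (!error) {
                    /* ... enter the suspend state ... */
                    sysdev_resume();
            }
            system_state = SYSTEM_RUNNING;
            return error;
    }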
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
34/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net-flip-lock-dep-thingy.patch
Date: Tue, 28 Jun 2011 10:59:58 +0200
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc3+ #26
-------------------------------------------------------
ip/1104 is trying to acquire lock:
(local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68
but task is already holding lock:
(sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_INET){+.+...}:
[<ffffffff810836e5>] lock_acquire+0x103/0x12e
[<ffffffff813e2781>] lock_sock_nested+0x82/0x92
[<ffffffff81433308>] lock_sock+0x10/0x12
[<ffffffff81433afa>] tcp_close+0x1b/0x355
[<ffffffff81453c99>] inet_release+0xc3/0xcd
[<ffffffff813dff3f>] sock_release+0x1f/0x74
[<ffffffff813dffbb>] sock_close+0x27/0x2b
[<ffffffff81129c63>] fput+0x11d/0x1e3
[<ffffffff81126577>] filp_close+0x70/0x7b
[<ffffffff8112667a>] sys_close+0xf8/0x13d
[<ffffffff814ae882>] system_call_fastpath+0x16/0x1b
-> #0 (local_softirq_lock){+.+...}:
[<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
[<ffffffff810836e5>] lock_acquire+0x103/0x12e
[<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
[<ffffffff81056d12>] __local_lock+0x25/0x68
[<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
[<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
[<ffffffff81433c38>] tcp_close+0x159/0x355
[<ffffffff81453c99>] inet_release+0xc3/0xcd
[<ffffffff813dff3f>] sock_release+0x1f/0x74
[<ffffffff813dffbb>] sock_close+0x27/0x2b
[<ffffffff81129c63>] fput+0x11d/0x1e3
[<ffffffff81126577>] filp_close+0x70/0x7b
[<ffffffff8112667a>] sys_close+0xf8/0x13d
[<ffffffff814ae882>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET);
lock(local_softirq_lock);
lock(sk_lock-AF_INET);
lock(local_softirq_lock);
*** DEADLOCK ***
1 lock held by ip/1104:
#0: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12
stack backtrace:
Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26
Call Trace:
[<ffffffff81081649>] print_circular_bug+0x1f8/0x209
[<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8
[<ffffffff81056d12>] ? __local_lock+0x25/0x68
[<ffffffff810836e5>] lock_acquire+0x103/0x12e
[<ffffffff81056d12>] ? __local_lock+0x25/0x68
[<ffffffff81046c75>] ? get_parent_ip+0x11/0x41
[<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a
[<ffffffff81056d12>] ? __local_lock+0x25/0x68
[<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41
[<ffffffff81056d12>] __local_lock+0x25/0x68
[<ffffffff81056d8b>] local_bh_disable+0x36/0x3b
[<ffffffff81433308>] ? lock_sock+0x10/0x12
[<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f
[<ffffffff81433c38>] tcp_close+0x159/0x355
[<ffffffff81453c99>] inet_release+0xc3/0xcd
[<ffffffff813dff3f>] sock_release+0x1f/0x74
[<ffffffff813dffbb>] sock_close+0x27/0x2b
[<ffffffff81129c63>] fput+0x11d/0x1e3
[<ffffffff81126577>] filp_close+0x70/0x7b
[<ffffffff8112667a>] sys_close+0xf8/0x13d
[<ffffffff814ae882>] system_call_fastpath+0x16/0x1b
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
35/286 [
Author: Marc Kleine-Budde
Email: mkl@pengutronix.de
Subject: net: sched: Use msleep() instead of yield()
Date: Wed, 5 Mar 2014 00:49:47 +0100
On PREEMPT_RT enabled systems the interrupt handlers run as threads at prio 50
(by default). If a high priority userspace process tries to shut down a busy
network interface it might spin in a yield loop waiting for the device to
become idle. With the interrupt thread having a lower priority than the
looping process it might never be scheduled and so result in a deadlock on UP
systems.
With Magic SysRq the following backtrace can be produced:
> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
This patch works around the problem by replacing yield() by msleep(1), giving
the interrupt thread time to finish, similar to other changes contained in the
rt patch set. Using wait_for_completion() instead would probably be a better
solution.
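Sketched, the change is a one-liner in the deactivate wait loop (the
busy-check helper name follows net/sched/sch_generic.c):
    static void dev_deactivate_wait_sketch(struct net_device *dev)
    {
            /* sleeping lets the threaded IRQ handler run even when
             * the caller has a higher RT priority; yield() never would */
            while (some_qdisc_is_busy(dev))
                    msleep(1);      /* was: yield() */
    }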
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: x86/ioapic: Do not unmask io_apic when interrupt is in progress
Date: Fri, 3 Jul 2009 08:29:27 -0500
With threaded interrupts we might see an interrupt in progress on
migration. Do not unmask it when this is the case.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
37/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: pci: Use __wake_up_all_locked in pci_unblock_user_cfg_access()
Date: Thu, 1 Dec 2011 00:07:16 +0100
The waitqueue is protected by the pci_lock, so we can just avoid to
lock the waitqueue lock itself. That prevents the
might_sleep()/scheduling while atomic problem on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
38/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: latencyhist: disable jump-labels
Date: Thu, 4 Feb 2016 14:08:06 +0100
At least on x86 we die a recursive death
|CPU: 3 PID: 585 Comm: bash Not tainted 4.4.1-rt4+ #198
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
|task: ffff88007ab4cd00 ti: ffff88007ab94000 task.ti: ffff88007ab94000
|RIP: 0010:[<ffffffff81684870>] [<ffffffff81684870>] int3+0x0/0x10
|RSP: 0018:ffff88013c107fd8 EFLAGS: 00010082
|RAX: ffff88007ab4cd00 RBX: ffffffff8100ceab RCX: 0000000080202001
|RDX: 0000000000000000 RSI: ffffffff8100ceab RDI: ffffffff810c78b2
|RBP: ffff88007ab97c10 R08: ffffffffff57b000 R09: 0000000000000000
|R10: ffff88013bb64790 R11: ffff88007ab4cd68 R12: ffffffff8100ceab
|R13: ffffffff810c78b2 R14: ffffffff810f8158 R15: ffffffff810f9120
|FS: 0000000000000000(0000) GS:ffff88013c100000(0063) knlGS:00000000f74e3940
|CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
|CR2: 0000000008cf6008 CR3: 000000013b169000 CR4: 00000000000006e0
|Call Trace:
| <#DB>
| [<ffffffff810f8158>] ? trace_preempt_off+0x18/0x170
| <<EOE>>
| [<ffffffff81077745>] preempt_count_add+0xa5/0xc0
| [<ffffffff810c78b2>] on_each_cpu+0x22/0x90
| [<ffffffff8100ceab>] text_poke_bp+0x5b/0xc0
| [<ffffffff8100a29c>] arch_jump_label_transform+0x8c/0xf0
| [<ffffffff8111c77c>] __jump_label_update+0x6c/0x80
| [<ffffffff8111c83a>] jump_label_update+0xaa/0xc0
| [<ffffffff8111ca54>] static_key_slow_inc+0x94/0xa0
| [<ffffffff810e0d8d>] tracepoint_probe_register_prio+0x26d/0x2c0
| [<ffffffff810e0df3>] tracepoint_probe_register+0x13/0x20
| [<ffffffff810fca78>] trace_event_reg+0x98/0xd0
| [<ffffffff810fcc8b>] __ftrace_event_enable_disable+0x6b/0x180
| [<ffffffff810fd5b8>] event_enable_write+0x78/0xc0
| [<ffffffff8117a768>] __vfs_write+0x28/0xe0
| [<ffffffff8117b025>] vfs_write+0xa5/0x180
| [<ffffffff8117bb76>] SyS_write+0x46/0xa0
| [<ffffffff81002c91>] do_fast_syscall_32+0xa1/0x1d0
| [<ffffffff81684d57>] sysenter_flags_fixed+0xd/0x17
during
echo 1 > /sys/kernel/debug/tracing/events/hist/preemptirqsoff_hist/enable
Reported-By: Christoph Mathys <eraserix@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/286 [
Author: Carsten Emde
Email: C.Emde@osadl.org
Subject: tracing: Add latency histograms
Date: Tue, 19 Jul 2011 14:03:41 +0100
This patch provides a recording mechanism to store data of potential
sources of system latencies. The recordings separately determine the
latency caused by a delayed timer expiration, by a delayed wakeup of the
related user space program and by the sum of both. The histograms can be
enabled and reset individually. The data are accessible via the debug
filesystem. For details please consult Documentation/trace/histograms.txt.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
40/286 [
Author: Mathieu Desnoyers
Email: mathieu.desnoyers@efficios.com
Subject: latency_hist: Update sched_wakeup probe
Date: Sun, 25 Oct 2015 18:06:05 -0400
"sched: Introduce the 'trace_sched_waking' tracepoint" introduces a
prototype change for the sched_wakeup probe: the "success" argument is
removed. Update the latency_hist probe following this change.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Julien Desfossez <jdesfossez@efficios.com>
Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1445810765-18732-1-git-send-email-mathieu.desnoyers@efficios.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
41/286 [
Author: Carsten Emde
Email: C.Emde@osadl.org
Subject: trace/latency-hist: Consider new argument when probing the sched_switch tracer
Date: Tue, 5 Jan 2016 10:21:59 +0100
The sched_switch tracer has got a new argument. Fix the latency tracer
accordingly.
Recently: c73464b1c843 ("sched/core: Fix trace_sched_switch()") since
v4.4-rc1.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/286 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: trace: Use rcuidle version for preemptoff_hist trace point
Date: Tue, 23 Feb 2016 13:23:23 -0800
When running -rt kernel with both PREEMPT_OFF_HIST and LOCKDEP enabled,
the below error is reported:
[ INFO: suspicious RCU usage. ]
4.4.1-rt6 #1 Not tainted
include/trace/events/hist.h:31 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.1-rt6-WR8.0.0.0_standard #1
Stack : 0000000000000006 0000000000000000 ffffffff81ca8c38 ffffffff81c8fc80
ffffffff811bdd68 ffffffff81cb0000 0000000000000000 ffffffff81cb0000
0000000000000000 0000000000000000 0000000000000004 0000000000000000
0000000000000004 ffffffff811bdf50 0000000000000000 ffffffff82b60000
0000000000000000 ffffffff812897ac ffffffff819f0000 000000000000000b
ffffffff811be460 ffffffff81b7c588 ffffffff81c8fc80 0000000000000000
0000000000000000 ffffffff81ec7f88 ffffffff81d70000 ffffffff81b70000
ffffffff81c90000 ffffffff81c3fb00 ffffffff81c3fc28 ffffffff815e6f98
0000000000000000 ffffffff81c8fa87 ffffffff81b70958 ffffffff811bf2c4
0707fe32e8d60ca5 ffffffff81126d60 0000000000000000 0000000000000000
...
Call Trace:
[<ffffffff81126d60>] show_stack+0xe8/0x108
[<ffffffff815e6f98>] dump_stack+0x88/0xb0
[<ffffffff8124b88c>] time_hardirqs_off+0x204/0x300
[<ffffffff811aa5dc>] trace_hardirqs_off_caller+0x24/0xe8
[<ffffffff811a4ec4>] cpu_startup_entry+0x39c/0x508
[<ffffffff81d7dc68>] start_kernel+0x584/0x5a0
Replace the regular trace_preemptoff_hist with the rcuidle version to avoid the error.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Cc: bigeasy@linutronix.de
Cc: rostedt@goodmis.org
Cc: linux-rt-users@vger.kernel.org
Link: http://lkml.kernel.org/r/1456262603-10075-1-git-send-email-yang.shi@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
43/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: printk: Add a printk kill switch
Date: Fri, 22 Jul 2011 17:58:40 +0200
Add a printk kill switch. This is used from the (NMI) watchdog to ensure
that it does not deadlock with the early printk code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
44/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: printk: Add "force_early_printk" boot param to help with debugging
Date: Fri, 2 Sep 2011 14:41:29 +0200
Gives me an option to screw printk and actually see what the machine
says.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314967289.1301.11.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-ykb97nsfmobq44xketrxs977@git.kernel.org
]
45/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Provide PREEMPT_RT_BASE config switch
Date: Fri, 17 Jun 2011 12:39:57 +0200
Introduce PREEMPT_RT_BASE which enables parts of
PREEMPT_RT_FULL. Forces interrupt threading and enables some of the RT
substitutions for testing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
46/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
47/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Add PREEMPT_RT_FULL
Date: Wed, 29 Jun 2011 14:58:57 +0200
Introduce the final symbol for PREEMPT_RT_FULL.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
48/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: bug: BUG_ON/WARN_ON variants dependent on RT/!RT
Date: Fri, 3 Jul 2009 08:29:58 -0500
Introduce RT/NON-RT WARN/BUG statements to avoid ifdefs in the code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
49/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: iommu/amd: Use WARN_ON_NORT in __attach_device()
Date: Sat, 27 Feb 2016 10:22:23 +0100
RT does not disable interrupts here, but the protection is still
correct. Fixup the WARN_ON so it won't yell on RT.
Note: This WARN_ON is bogus anyway. The real thing this needs to check is that
amd_iommu_devtable_lock is held.
Reported-by: DIXLOR <dixlor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
50/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: local_irq_* variants depending on RT/!RT
Date: Tue, 21 Jul 2009 22:34:14 +0200
Add local_irq_*_(no)rt variants which are mainly used to break
interrupt disabled sections on PREEMPT_RT or to explicitly disable
interrupts on PREEMPT_RT.
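The scheme looks roughly like this (a sketch of the _nort half; the
patch also carries the complementary _rt variants):
    #ifdef CONFIG_PREEMPT_RT_FULL
    # define local_irq_disable_nort()       barrier()
    # define local_irq_enable_nort()        barrier()
    # define local_irq_save_nort(flags)     local_save_flags(flags)
    # define local_irq_restore_nort(flags)  (void)(flags)
    #else
    # define local_irq_disable_nort()       local_irq_disable()
    # define local_irq_enable_nort()        local_irq_enable()
    # define local_irq_save_nort(flags)     local_irq_save(flags)
    # define local_irq_restore_nort(flags)  local_irq_restore(flags)
    #endif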
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
51/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
52/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Introduce migrate_disable() + cpu_light()
Date: Fri, 17 Jun 2011 15:42:38 +0200
Introduce migrate_disable(). The task can't be pushed to another CPU but can
be preempted.
From: Peter Zijlstra <a.p.zijlstra@chello.nl>:
|Make migrate_disable() be a preempt_disable() for !rt kernels. This
|allows generic code to use it but still enforces that these code
|sections stay relatively small.
|
|A preemptible migrate_disable() accessible for general use would allow
|people growing arbitrary per-cpu crap instead of clean these things
|up.
From: Steven Rostedt <rostedt@goodmis.org>
| The migrate_disable() can cause a bit of a overhead to the RT kernel,
| as changing the affinity is expensive to do at every lock encountered.
| As a running task can not migrate, the actual disabling of migration
| does not need to occur until the task is about to schedule out.
|
| In most cases, a task that disables migration will enable it before
| it schedules making this change improve performance tremendously.
On top of this, build get/put_cpu_light(). It is similar to get_cpu():
it uses migrate_disable() instead of preempt_disable(). That means the user
remains on the same CPU but the function using it may be preempted and
invoked again from another caller on the same CPU.
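Sketched against get_cpu()/put_cpu(), the light variants are simply:
    /* pins the CPU without disabling preemption */
    #define get_cpu_light() ({ migrate_disable(); smp_processor_id(); })
    #define put_cpu_light() migrate_enable()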
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
53/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks use migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
54/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locallock: add local_lock_on()
Date: Fri, 27 May 2016 15:11:51 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/286 [
Author: Steven Rostedt
Email: srostedt@redhat.com
Subject: ata: Do not disable interrupts in ide code for preempt-rt
Date: Fri, 3 Jul 2009 08:44:29 -0500
Use the local_irq_*_nort variants.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
56/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: ide: Do not disable interrupts for PREEMPT-RT
Date: Fri, 3 Jul 2009 08:30:16 -0500
Use the local_irq_*_nort variants.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
57/286 [
Author: Sven-Thorsten Dietrich
Email: sdietrich@novell.com
Subject: infiniband: Mellanox IB driver patch use _nort() primitives
Date: Fri, 3 Jul 2009 08:30:35 -0500
Fixes an in_atomic stack dump when the Mellanox module is loaded into
the RT kernel.
Michael S. Tsirkin <mst@dev.mellanox.co.il> sayeth:
"Basically, if you just make spin_lock_irqsave (and spin_lock_irq) not disable
interrupts for non-raw spinlocks, I think all of infiniband will be fine without
changes."
Signed-off-by: Sven-Thorsten Dietrich <sven@thebigcorporation.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
58/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: input: gameport: Do not disable interrupts on PREEMPT_RT
Date: Fri, 3 Jul 2009 08:30:16 -0500
Use the _nort() primitives.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
59/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: core: Do not disable interrupts on RT in kernel/users.c
Date: Tue, 21 Jul 2009 23:06:05 +0200
Use the local_irq_*_nort variants to reduce latencies in RT. The code
is serialized by the locks. No need to disable interrupts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
60/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: usb: Use _nort in giveback function
Date: Fri, 8 Nov 2013 17:34:54 +0100
Since commit 94dfd7ed ("USB: HCD: support giveback of URB in tasklet
context") I see
|BUG: sleeping function called from invalid context at kernel/rtmutex.c:673
|in_atomic(): 0, irqs_disabled(): 1, pid: 109, name: irq/11-uhci_hcd
|no locks held by irq/11-uhci_hcd/109.
|irq event stamp: 440
|hardirqs last enabled at (439): [<ffffffff816a7555>] _raw_spin_unlock_irqrestore+0x75/0x90
|hardirqs last disabled at (440): [<ffffffff81514906>] __usb_hcd_giveback_urb+0x46/0xc0
|softirqs last enabled at (0): [<ffffffff81081821>] copy_process.part.52+0x511/0x1510
|softirqs last disabled at (0): [< (null)>] (null)
|CPU: 3 PID: 109 Comm: irq/11-uhci_hcd Not tainted 3.12.0-rt0-rc1+ #13
|Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
| 0000000000000000 ffff8800db9ffbe0 ffffffff8169f064 0000000000000000
| ffff8800db9ffbf8 ffffffff810b2122 ffff88020f03e888 ffff8800db9ffc18
| ffffffff816a6944 ffffffff810b5748 ffff88020f03c000 ffff8800db9ffc50
|Call Trace:
| [<ffffffff8169f064>] dump_stack+0x4e/0x8f
| [<ffffffff810b2122>] __might_sleep+0x112/0x190
| [<ffffffff816a6944>] rt_spin_lock+0x24/0x60
| [<ffffffff8158435b>] hid_ctrl+0x3b/0x190
| [<ffffffff8151490f>] __usb_hcd_giveback_urb+0x4f/0xc0
| [<ffffffff81514aaf>] usb_hcd_giveback_urb+0x3f/0x140
| [<ffffffff815346af>] uhci_giveback_urb+0xaf/0x280
| [<ffffffff8153666a>] uhci_scan_schedule+0x47a/0xb10
| [<ffffffff81537336>] uhci_irq+0xa6/0x1a0
| [<ffffffff81513c48>] usb_hcd_irq+0x28/0x40
| [<ffffffff810c8ba3>] irq_forced_thread_fn+0x23/0x70
| [<ffffffff810c918f>] irq_thread+0x10f/0x150
| [<ffffffff810a6fad>] kthread+0xcd/0xe0
| [<ffffffff816a842c>] ret_from_fork+0x7c/0xb0
On -RT we run threaded, so there is no need to disable interrupts.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
The local_irq_save() is not only used to get things done "fast" but
also to ensure that in case of SG_MITER_ATOMIC we are in "atomic"
context for kmap_atomic(). For -RT it is enough to keep pagefault
disabled (which is currently handled by kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
62/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/workingset: Do not protect workingset_shadow_nodes with irq off
Date: Thu, 29 Jan 2015 17:19:44 +0100
workingset_shadow_nodes is protected by local_irq_disable(). Some users
use spin_lock_irq().
Replace the irq on/off with a local_lock(). Rename workingset_shadow_nodes
so I catch users of it which will be introduced later.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Make __lock_task_sighand() RT aware
Date: Fri, 22 Jul 2011 08:07:08 +0200
local_irq_save() + spin_lock(&sighand->siglock) does not work on
-RT. Use the nort variants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
64/286 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spin lock that has been converted to a mutex
and thus has the possibility to sleep. If this happens, the above issue
with the corrupted stack is possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled (sketched below).
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
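A rough sketch of the delayed-send mechanism, assuming a forced_info siginfo field added to task_struct (field name illustrative, not confirmed by the message):

int force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
	if (in_atomic()) {
		/* sighand->siglock is a sleeping lock on RT and may not
		 * be taken from the atomic trap context. Park the signal
		 * in the task and let the TIF_NOTIFY_RESUME path deliver
		 * it once preemption is enabled again. */
		t->forced_info = *info;
		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
		return 0;
	}
#endif
	/* ... normal path: take siglock and queue the signal ... */
}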
65/286 [
Author: Yang Shi
Email: yang.shi@linaro.org
Subject: x86/signal: delay calling signals on 32bit
Date: Thu, 10 Dec 2015 10:58:51 -0800
When running some ptrace single step tests on x86-32 machine, the below problem
is triggered:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 0, pid: 1041, name: dummy2
Preemption disabled at:[<c100326f>] do_debug+0x1f/0x1a0
CPU: 10 PID: 1041 Comm: dummy2 Tainted: G W 4.1.13-rt13 #1
Call Trace:
[<c1aa8306>] dump_stack+0x46/0x5c
[<c1080517>] ___might_sleep+0x137/0x220
[<c1ab0eff>] rt_spin_lock+0x1f/0x80
[<c1064b5a>] do_force_sig_info+0x2a/0xc0
[<c106567d>] force_sig_info+0xd/0x10
[<c1010cff>] send_sigtrap+0x6f/0x80
[<c10033b1>] do_debug+0x161/0x1a0
[<c1ab2921>] debug_stack_correct+0x2e/0x35
This happens since 959274753857 ("x86, traps: Track entry into and exit
from IST context") which was merged in v4.1-rc1.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net/wireless: Use WARN_ON_NORT()
Date: Thu, 21 Jul 2011 21:05:33 +0200
The softirq counter is meaningless on RT, so the check triggers a
false positive.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
67/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: buffer_head: Replace bh_uptodate_lock for -rt
Date: Fri, 18 Mar 2011 09:18:52 +0100
Wrap the bit_spin_lock calls into a separate inline and add the RT
replacements with a real spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
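A sketch of the wrapper pair the message describes, assuming a b_uptodate_lock spinlock_t member added to struct buffer_head for the RT case:

static inline unsigned long bh_uptodate_lock_irqsave(struct buffer_head *bh)
{
	unsigned long flags;

#ifndef CONFIG_PREEMPT_RT_FULL
	local_irq_save(flags);
	bit_spin_lock(BH_Uptodate_Lock, &bh->b_state);
#else
	spin_lock_irqsave(&bh->b_uptodate_lock, flags);
#endif
	return flags;
}

static inline void bh_uptodate_unlock_irqrestore(struct buffer_head *bh,
						 unsigned long flags)
{
#ifndef CONFIG_PREEMPT_RT_FULL
	bit_spin_unlock(BH_Uptodate_Lock, &bh->b_state);
	local_irq_restore(flags);
#else
	spin_unlock_irqrestore(&bh->b_uptodate_lock, flags);
#endif
}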
68/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: jbd/jbd2: Make state lock and journal head lock rt safe
Date: Fri, 18 Mar 2011 10:11:25 +0100
bit_spin_locks break under RT.
Based on a previous patch from Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--
include/linux/buffer_head.h | 8 ++++++++
include/linux/jbd2.h | 24 ++++++++++++++++++++++++
2 files changed, 32 insertions(+)
]
69/286 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: list_bl: Make list head locking RT safe
Date: Fri, 21 Jun 2013 15:07:25 -0400
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short-lived list operations,
let's assume for now that it being raw won't be a significant latency
concern.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
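A sketch of the locking helpers described above, assuming the raw spinlock added to struct hlist_bl_head under RT; the non-atomic bit ops keep bit 0 usable for the LIST_DEBUG sanity checks:

static inline void hlist_bl_lock(struct hlist_bl_head *b)
{
#ifndef CONFIG_PREEMPT_RT_BASE
	bit_spin_lock(0, (unsigned long *)b);
#else
	raw_spin_lock(&b->lock);
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	__set_bit(0, (unsigned long *)b);	/* keep bit 0 for LIST_DEBUG */
#endif
#endif
}

static inline void hlist_bl_unlock(struct hlist_bl_head *b)
{
#ifndef CONFIG_PREEMPT_RT_BASE
	__bit_spin_unlock(0, (unsigned long *)b);
#else
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	__clear_bit(0, (unsigned long *)b);
#endif
	raw_spin_unlock(&b->lock);
#endif
}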
70/286 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: list_bl: fixup bogus lockdep warning
Date: Thu, 31 Mar 2016 00:04:25 -0500
At first glance, the use of 'static inline' seems appropriate for
INIT_HLIST_BL_HEAD().
However, when a 'static inline' function invocation is inlined by gcc,
all callers share any static local data declared within that inline
function.
This presents a problem for how lockdep classes are set up. With
CONFIG_DEBUG_SPINLOCK, for example, raw_spinlocks are initialized via:
# define raw_spin_lock_init(lock)			\
do {							\
	static struct lock_class_key __key;		\
							\
	__raw_spin_lock_init((lock), #lock, &__key);	\
} while (0)
When this macro is expanded into a 'static inline' caller, like
INIT_HLIST_BL_HEAD():
static inline void INIT_HLIST_BL_HEAD(struct hlist_bl_head *h)
{
	h->first = NULL;
	raw_spin_lock_init(&h->lock);
}
...the static local lock_class_key object becomes static to that single
function. In compilation units which invoke INIT_HLIST_BL_HEAD() more
than once, all of the invocations then share this same static local
object.
This can lead to some very confusing lockdep splats (example below).
Solve this problem by forcing the INIT_HLIST_BL_HEAD() to be a macro,
which prevents the lockdep class object sharing.
=============================================
[ INFO: possible recursive locking detected ]
4.4.4-rt11 #4 Not tainted
---------------------------------------------
kswapd0/59 is trying to acquire lock:
(&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
but task is already holding lock:
(&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&h->lock#2);
lock(&h->lock#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by kswapd0/59:
#0: (shrinker_rwsem){+.+...}, at: rt_down_read_trylock
#1: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Tested-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
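The fix in sketch form: as a macro, every init site expands in its caller, so the static lock_class_key inside raw_spin_lock_init() is distinct per call site instead of being shared through one inlined function body:

#define INIT_HLIST_BL_HEAD(h)			\
do {						\
	(h)->first = NULL;			\
	raw_spin_lock_init(&(h)->lock);		\
} while (0)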
71/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
72/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: genirq: Force interrupt thread on RT
Date: Sun, 3 Apr 2011 11:57:29 +0200
Force threaded_irqs and optimize the code (force_irqthreads) in regard
to this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
73/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: drivers/net: vortex fix locking issues
Date: Fri, 3 Jul 2009 08:30:00 -0500
Argh, cut and paste wasn't enough...
Use this patch instead. It needs an irq disable. But, believe it or not,
on SMP this is actually better. If the irq is shared (as it is in Mark's
case), we don't stop the irq of other devices from being handled on
another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
drivers/net/ethernet/3com/3c59x.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
]
74/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
75/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm: page_alloc: Reduce lock sections further
Date: Fri, 3 Jul 2009 08:44:37 -0500
Split out the pages which are to be freed into a separate list and
call free_pages_bulk() outside of the percpu page allocator locks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/swap: Convert to percpu locked
Date: Fri, 3 Jul 2009 08:29:51 -0500
Replace global locks (get_cpu + local_irq_save) with "local_locks()".
Currently there is one of for "rotate" and one for "swap".
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
77/286 [
Author: Luiz Capitulino
Email: lcapitulino@redhat.com
Subject: mm: perform lru_add_drain_all() remotely
Date: Fri, 27 May 2016 15:03:28 +0200
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run
on all CPUs that have non-empty LRU pagevecs and then waiting for
the scheduled work to complete. However, workqueue threads may never
have the chance to run on a CPU that's running a SCHED_FIFO task.
This causes lru_add_drain_all() to block forever.
This commit solves this problem by changing lru_add_drain_all()
to drain the LRU pagevecs of remote CPUs. This is done by grabbing
swapvec_lock and calling lru_add_drain_cpu().
PS: This is based on an idea and initial implementation by
Rik van Riel.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
78/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs
in IRQ-off regions while on -RT it does not. "preempt_disable" ensures that
the same resources are not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
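A sketch of the pattern, assuming preempt_disable_rt()/preempt_enable_rt() helpers that compile away on !RT (where the callers already run with interrupts off):

void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
			   long delta)
{
	preempt_disable_rt();	/* no-op on !RT, preempt-off on -RT */
	/* ... unchanged per-cpu counter update ... */
	preempt_enable_rt();
}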
79/286 [
Author: Frank Rowand
Email: frank.rowand@am.sony.com
Subject: ARM: Initialize split page table locks for vector page
Date: Sat, 1 Oct 2011 18:58:13 -0700
Without this patch, ARM can not use SPLIT_PTLOCK_CPUS if
PREEMPT_RT_FULL=y because vectors_user_mapping() creates a
VM_ALWAYSDUMP mapping of the vector page (address 0xffff0000), but no
ptl->lock has been allocated for the page. An attempt to coredump
that page will result in a kernel NULL pointer dereference when
follow_page() attempts to lock the page.
The call tree to the NULL pointer dereference is:
do_notify_resume()
get_signal_to_deliver()
do_coredump()
elf_core_dump()
get_dump_page()
__get_user_pages()
follow_page()
pte_offset_map_lock() <----- a #define
...
rt_spin_lock()
The underlying problem is exposed by mm-shrink-the-page-frame-to-rt-size.patch.
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Frank <Frank_Rowand@sonyusa.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/4E87C535.2030907@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
80/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: bounce: Use local_irq_save_nort
Date: Wed, 9 Jan 2013 10:33:09 +0100
kmap_atomic() is preemptible on RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
81/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only slub on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
82/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: Enable SLUB for RT
Date: Thu, 25 Oct 2012 10:32:35 +0100
Make SLUB RT aware by converting locks to raw and using free lists to
move the freeing out of the lock held region.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
83/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
84/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: page_alloc: Use local_lock_on() instead of plain spinlock
Date: Thu, 27 Sep 2012 11:11:46 +0200
The plain spinlock, while sufficient, does not update the local_lock
internals. Use a proper local_lock function instead to ease debugging.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
86/286 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() with get/put_cpu_light(), as sketched below.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
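A sketch of the substitution in drain_all_stock(), assuming the get_cpu_light()/put_cpu_light() helpers from this series (migrate-disable instead of preempt-disable):

	/* before: curcpu = get_cpu();  -- preemption off, so the
	 * sleeping wq lock inside schedule_work_on() splats on RT */
	curcpu = get_cpu_light();	/* pins the task to this CPU only */
	for_each_online_cpu(cpu) {
		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
		/* ... */
		schedule_work_on(cpu, &stock->work);
	}
	put_cpu_light();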
87/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() sections which then take sleeping locks.
This patch converts them to local locks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/memcontrol: mem_cgroup_migrate() - replace another local_irq_disable() w. local_lock_irq()
Date: Sun, 5 Jun 2016 08:11:13 +0200
v4.6 grew a local_irq_disable() in mm/memcontrol.c::mem_cgroup_migrate().
Convert it to use the existing local lock (event_lock) like the others.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
89/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: backing-dev: don't disable IRQs in wb_congested_put()
Date: Fri, 5 Feb 2016 12:17:14 +0100
it triggers:
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930
|in_atomic(): 0, irqs_disabled(): 1, pid: 12, name: rcuc/0
|1 lock held by rcuc/0/12:
| #0: (rcu_callback){......}, at: [<ffffffff810ce1a6>] rcu_cpu_kthread+0x376/0xb10
|irq event stamp: 23636
|hardirqs last enabled at (23635): [<ffffffff8173524c>] _raw_spin_unlock_irqrestore+0x6c/0x80
|hardirqs last disabled at (23636): [<ffffffff81173918>] wb_congested_put+0x18/0x90
| [<ffffffff81735434>] rt_spin_lock+0x24/0x60
| [<ffffffff810afed2>] atomic_dec_and_spin_lock+0x52/0x90
| [<ffffffff81173928>] wb_congested_put+0x28/0x90
| [<ffffffff813b833e>] __blkg_release_rcu+0x5e/0x1e0
| [<ffffffff813b8367>] ? __blkg_release_rcu+0x87/0x1e0
| [<ffffffff813b82e0>] ? blkg_conf_finish+0x90/0x90
| [<ffffffff810ce1e7>] rcu_cpu_kthread+0x3b7/0xb10
due to cgwb_lock usually being taken with spin_lock_irqsave().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bitspinlocks are replaced with a proper mutex, which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: radix-tree: Make RT aware
Date: Sun, 17 Jul 2011 21:33:18 +0200
Disable radix_tree_preload() on -RT. This function returns with
preemption disabled which may cause high latencies and breaks if the
user tries to grab any locks after invoking it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
92/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have problems
acquiring the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
93/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: timers: Prepare for full preemption
Date: Fri, 3 Jul 2009 08:29:34 -0500
When softirqs can be preempted we need to make sure that cancelling
the timer from the active thread can not deadlock vs. a running timer
callback. Add a waitqueue to resolve that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
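A sketch of the wait side, under the assumption that the patch adds a wait_for_running_timer waitqueue to the timer base (names illustrative, not the exact code):

/* del_timer_sync() on RT: wait for a running callback instead of
 * spinning on it, since the softirq that runs it is preemptible. */
static void wait_for_running_timer(struct timer_list *timer)
{
	struct tvec_base *base = timer->base;

	if (base->running_timer == timer)
		wait_event(base->wait_for_running_timer,
			   base->running_timer != timer);
}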
94/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: timer: delay waking softirqs from the jiffy tick
Date: Fri, 21 Aug 2009 11:56:45 +0200
People were complaining about broken balancing with the recent -rt
series.
A look at /proc/sched_debug yielded:
cpu#0, 2393.874 MHz
.nr_running : 0
.load : 0
.cpu_load[0] : 177522
.cpu_load[1] : 177522
.cpu_load[2] : 177522
.cpu_load[3] : 177522
.cpu_load[4] : 177522
cpu#1, 2393.874 MHz
.nr_running : 4
.load : 4096
.cpu_load[0] : 181618
.cpu_load[1] : 180850
.cpu_load[2] : 180274
.cpu_load[3] : 179938
.cpu_load[4] : 179758
Which indicated the cpu_load computation was hosed, the 177522 value
indicates that there is one RT task runnable. Initially I thought the
old problem of calculating the cpu_load from a softirq had re-surfaced,
however looking at the code shows it's being done from scheduler_tick().
[ we really should fix this RT/cfs interaction some day... ]
A few trace_printk()s later:
sirq-timer/1-19 [001] 174.289744: 19: 50:S ==> [001] 0:140:R <idle>
<idle>-0 [001] 174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522
<idle>-0 [001] 174.290725: 0:140:R + [001] 19: 50:S sirq-timer/1
<idle>-0 [001] 174.290730: scheduler_tick: current load: 177522
<idle>-0 [001] 174.290732: scheduler_tick: current: 0/swapper
<idle>-0 [001] 174.290736: 0:140:R ==> [001] 19: 50:R sirq-timer/1
sirq-timer/1-19 [001] 174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522
sirq-timer/1-19 [001] 174.290743: 19: 50:S ==> [001] 0:140:R <idle>
We see that we always raise the timer softirq before doing the load
calculation. Avoid this by re-ordering the scheduler_tick() call in
update_process_times() to occur before we deal with timers.
This lowers the load back to sanity and restores regular load-balancing
behaviour.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
95/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: hrtimers: Prepare full preemption
Date: Fri, 3 Jul 2009 08:29:34 -0500
Make cancellation of a running callback in softirq context safe
against preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
96/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: enforce 64byte alignment
Date: Wed, 23 Dec 2015 20:57:41 +0100
The patch "hrtimer: Fixup hrtimer callback changes for preempt-rt" adds
a list_head expired to struct hrtimer_clock_base and with it we run into
BUILD_BUG_ON(sizeof(struct hrtimer_clock_base) > HRTIMER_CLOCK_BASE_ALIGN);
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: hrtimer: Fixup hrtimer callback changes for preempt-rt
Date: Fri, 3 Jul 2009 08:44:31 -0500
In preempt-rt we can not call the callbacks which take sleeping locks
from the timer interrupt context.
Bring back the softirq split for now, until we fixed the signal
delivery problem for real.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
]
98/286 [
Author: Juri Lelli
Email: juri.lelli@gmail.com
Subject: sched/deadline: dl_task_timer has to be irqsafe
Date: Tue, 13 May 2014 15:30:20 +0200
As for rt_period_timer, dl_task_timer has to be irqsafe.
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
99/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timer-fd: Prevent live lock
Date: Wed, 25 Jan 2012 11:08:40 +0100
If hrtimer_try_to_cancel() requires a retry, then depending on the
priority setting the retry loop might prevent timer callback completion
on RT. Prevent that by waiting for completion on RT, no change for a
non RT kernel.
Reported-by: Sankara Muthukrishnan <sankara.m@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
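A sketch of the RT retry loop in timerfd, using the hrtimer_wait_for_timer() helper added by the hrtimer patches in this series (field names approximate):

	for (;;) {
		spin_lock_irq(&ctx->wqh.lock);
		if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
			break;		/* cancelled, or was not running */
		spin_unlock_irq(&ctx->wqh.lock);
		/* sleep until the callback finishes instead of busy
		 * looping against a preempted softirq thread */
		hrtimer_wait_for_timer(&ctx->tmr);
	}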
100/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tick/broadcast: Make broadcast hrtimer irqsafe
Date: Sat, 27 Feb 2016 10:47:10 +0100
Otherwise we end up with the following:
|=================================
|[ INFO: inconsistent lock state ]
|4.4.2-rt7+ #5 Not tainted
|---------------------------------
|inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
|ktimersoftd/0/4 [HC0[0]:SC0[0]:HE1:SE1] takes:
| (tick_broadcast_lock){?.....}, at: [<ffffffc000150db4>] tick_handle_oneshot_broadcast+0x58/0x27c
|{IN-HARDIRQ-W} state was registered at:
| [<ffffffc000118198>] mark_lock+0x19c/0x6a0
| [<ffffffc000119728>] __lock_acquire+0xb1c/0x2100
| [<ffffffc00011b560>] lock_acquire+0xf8/0x230
| [<ffffffc00061bf08>] _raw_spin_lock_irqsave+0x50/0x68
| [<ffffffc000152188>] tick_broadcast_switch_to_oneshot+0x20/0x60
| [<ffffffc0001529f4>] tick_switch_to_oneshot+0x64/0xd8
| [<ffffffc000152b00>] tick_init_highres+0x1c/0x24
| [<ffffffc000141e58>] hrtimer_run_queues+0x78/0x100
| [<ffffffc00013f804>] update_process_times+0x38/0x74
| [<ffffffc00014fc5c>] tick_periodic+0x60/0x140
| [<ffffffc00014fd68>] tick_handle_periodic+0x2c/0x94
| [<ffffffc00052b878>] arch_timer_handler_phys+0x3c/0x48
| [<ffffffc00012d078>] handle_percpu_devid_irq+0x100/0x390
| [<ffffffc000127f34>] generic_handle_irq+0x34/0x4c
| [<ffffffc000128300>] __handle_domain_irq+0x90/0xf8
| [<ffffffc000082554>] gic_handle_irq+0x5c/0xa4
| [<ffffffc0000855ac>] el1_irq+0x6c/0xec
| [<ffffffc000112bec>] default_idle_call+0x2c/0x44
| [<ffffffc000113058>] cpu_startup_entry+0x3cc/0x410
| [<ffffffc0006169f8>] rest_init+0x158/0x168
| [<ffffffc000888954>] start_kernel+0x3a0/0x3b4
| [<0000000080621000>] 0x80621000
|irq event stamp: 18723
|hardirqs last enabled at (18723): [<ffffffc00061c188>] _raw_spin_unlock_irq+0x38/0x80
|hardirqs last disabled at (18722): [<ffffffc000140a4c>] run_hrtimer_softirq+0x2c/0x2f4
|softirqs last enabled at (0): [<ffffffc0000c4744>] copy_process.isra.50+0x300/0x16d4
|softirqs last disabled at (0): [< (null)>] (null)
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
101/286 [
Author: John Stultz
Email: johnstul@us.ibm.com
Subject: posix-timers: Thread posix-cpu-timers on -rt
Date: Fri, 3 Jul 2009 08:29:58 -0500
posix-cpu-timer code takes non -rt safe locks in hard irq
context. Move it to a thread.
[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
102/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move task_struct cleanup to RCU
Date: Tue, 31 May 2011 16:59:16 +0200
__put_task_struct() does quite some expensive work. We don't want to
burden random tasks with that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
103/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
104/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so nothing
we want to do in task switch and other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
105/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct. This also
comes handy on -RT because we can't free memory in preempt disabled
region.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not a RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
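A sketch of the saved_state handling in try_to_wake_up(), per the description above:

	raw_spin_lock_irqsave(&p->pi_lock, flags);
	if (!(p->state & state)) {
		/* Task is blocked on a sleeping lock; its real state was
		 * stashed in saved_state. A regular wakeup only records
		 * TASK_RUNNING there, to be restored once the lock sleep
		 * is done. */
		if (p->saved_state & state)
			p->saved_state = TASK_RUNNING;
		goto out;
	}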
107/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
108/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Take RT softirq semantics into account in cond_resched()
Date: Thu, 14 Jul 2011 09:56:44 +0200
The softirq semantics work different on -RT. There is no SOFTIRQ_MASK in
the preemption counter which leads to the BUG_ON() statement in
__cond_resched_softirq(). As for -RT it is enough to perform a "normal"
schedule.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
109/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
Date: Sun, 17 Jul 2011 22:51:33 +0200
RT does not increment preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
110/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
111/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
112/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: ttwu: Return success when only changing the saved_state value
Date: Tue, 13 Dec 2011 21:42:19 +0100
When a task blocks on a rt lock, it saves the current state in
p->saved_state, so a lock related wake up will not destroy the
original state.
When a real wakeup happens while the task is already running due to a
lock wakeup, we update p->saved_state to TASK_RUNNING, but we do not
return success, which might cause another wakeup in the waitqueue code
and the task remains in the waitqueue list. Return success in that case
as well, as sketched below.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
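In sketch form, the change is to report such a wakeup as successful so waitqueue users dequeue the task:

	if (!(p->state & state)) {
		if (p->saved_state & state) {
			p->saved_state = TASK_RUNNING;
			success = 1;	/* wakeup consumed, tell the caller */
		}
		goto out;
	}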
113/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock
Date: Mon, 18 Mar 2013 15:12:49 -0400
In -rt, most spin_locks() turn into mutexes. One of these spin_lock
conversions is performed on the workqueue gcwq->lock. When the idle
worker is worken, the first thing it will do is grab that same lock and
it too will block, possibly jumping into the same code, but because
nr_running would already be decremented it prevents an infinite loop.
But this is still a waste of CPU cycles, and it doesn't follow the method
of mainline, as new workers should only be woken when a worker thread is
truly going to sleep, and not just blocked on a spin_lock().
Check the saved_state too before waking up new workers.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
114/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: stop_machine: convert stop_machine_run() to PREEMPT_RT
Date: Fri, 3 Jul 2009 08:30:27 -0500
Instead of playing with non-preemption, introduce explicit
startup serialization. This is more robust and cleaner as
well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: XXX: stopper_lock -> stop_cpus_lock]
]
115/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: stop_machine: Use raw spinlocks
Date: Wed, 29 Jun 2011 11:01:51 +0200
Use raw-locks in stop_machine() to allow locking in irq-off regions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
116/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: hotplug: Lightweight get online cpus
Date: Wed, 15 Jun 2011 12:36:06 +0200
get_online_cpus() is a heavyweight function which involves a global
mutex. migrate_disable() wants a simpler construct which prevents only
a CPU from going down while a task is in a migrate disabled section.
Implement a per cpu lockless mechanism, which serializes only in the
real unplug case on a global mutex. That serialization affects only
tasks on the cpu which should be brought down.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
117/286 [
Author: Yong Zhang
Email: yong.zhang0@gmail.com
Subject: hotplug: sync_unplug: No "\n" in task name
Date: Sun, 16 Oct 2011 18:56:43 +0800
Otherwise the output will look a little odd.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/1318762607-2261-2-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
118/286 [
Author: Yong Zhang
Email: yong.zhang0@gmail.com
Subject: hotplug: Reread hotplug_pcp on pin_current_cpu() retry
Date: Thu, 28 Jul 2011 11:16:00 +0800
When retry happens, it's likely that the task has been migrated to
another cpu (except when unplug failed), but it still dereferences the
original hotplug_pcp per cpu data.
Update the pointer to hotplug_pcp in the retry path, so it points to
the current cpu.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110728031600.GA338@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
119/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
120/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: hotplug: Use migrate disable on unplug
Date: Sun, 17 Jul 2011 19:35:29 +0200
Migration needs to be disabled across the unplug handling to make
sure that the unplug thread is off the unplugged cpu.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
121/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
122/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking: Disable spin on owner for RT
Date: Sun, 17 Jul 2011 21:51:45 +0200
Drop spin on owner for mutex / rwsem. We are most likely not using it
but…
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
123/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: tasklet: Prevent tasklets from going into infinite spin in RT
Date: Tue, 29 Nov 2011 20:18:22 -0500
When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads,
and spinlocks turn into mutexes. But this can cause issues with
tasks disabling tasklets. A tasklet runs under ksoftirqd, and
if a tasklet is disabled with tasklet_disable(), the tasklet
count is increased. When a tasklet runs, it checks this counter
and if it is set, it adds itself back on the softirq queue and
returns.
The problem arises in RT because ksoftirqd will see that a softirq
is ready to run (the tasklet softirq just re-armed itself), and will
not sleep, but instead run the softirqs again. The tasklet softirq
will still see that the count is non-zero and will not execute
the tasklet and requeue itself on the softirq again, which will
cause ksoftirqd to run it again and again and again.
It gets worse because ksoftirqd runs as a real-time thread.
If it preempted the task that disabled tasklets, and that task
has migration disabled, or can't run for other reasons, the tasklet
softirq will never run because the count will never be zero, and
ksoftirqd will go into an infinite loop. As ksoftirqd is an RT task, this
becomes a big problem.
This is a hack solution to have tasklet_disable stop tasklets, and
when a tasklet runs, instead of requeueing the tasklet to softirqd,
it delays it. When tasklet_enable() is called, and tasklets are
waiting, then the tasklet_enable() will kick the tasklets to continue.
This prevents the lock up from ksoftirq going into an infinite loop.
[ rostedt@goodmis.org: ported to 3.0-rt ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
124/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
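A sketch of the pattern at the affected call sites, assuming the preempt_check_resched_rt() helper this series adds (a no-op on !RT):

	raise_softirq_irqoff(NET_RX_SOFTIRQ);	/* may wake ksoftirqd */
	local_irq_restore(flags);
	preempt_check_resched_rt();	/* fold the missed preemption so
					 * the softirq thread runs promptly */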
125/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
126/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Split softirq locks
Date: Thu, 4 Oct 2012 14:20:47 +0100
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. Though this prevents us from handling the various softirqs
at different priorities. Now instead of reintroducing the split
softirq threads we split the locks which serialize the softirq
processing.
If a softirq is raised in context of a thread, then the softirq is
noted in a per-thread field if the thread is in a bh disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh disabled region, then it tries to execute
the softirqs which have been raised in its own context. It acquires
the per softirq / per cpu lock for the softirq and then checks,
whether the softirq is still pending in the per cpu
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task executed it already. This allows for zero config
softirq elevation in the context of user space tasks or interrupt
threads.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
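A conceptual sketch of the per-softirq serialization described above (helper names lock_softirq()/handle_softirq() are illustrative, not necessarily the exact code):

static void do_single_softirq(int which)
{
	/* one serializing lock per softirq vector, per CPU */
	lock_softirq(which);
	if (local_softirq_pending() & (1 << which))
		handle_softirq(which);	/* still pending: run it here */
	/* else: some other task executed it already */
	unlock_softirq(which);
}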
127/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel: softirq: unlock with irqs on
Date: Tue, 9 Feb 2016 18:17:18 +0100
We unlock the lock while the interrupts are off. This isn't a problem
now but will become one because the migrate_disable() + enable are not
symmetrical in regard to the status of interrupts.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel: migrate_disable() do fastpath in atomic & irqs-off
Date: Tue, 9 Feb 2016 18:18:01 +0100
With interrupts off it makes no sense to do the long path since we can't
leave the CPU anyway. Also we might end up in a recursion with lockdep.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: genirq: Allow disabling of softirq processing in irq thread context
Date: Tue, 31 Jan 2012 13:01:27 +0100
The processing of softirqs in irq thread context is a performance gain
for the non-rt workloads of a system, but it's counterproductive for
interrupts which are explicitly related to the realtime
workload. Allow such interrupts to prevent softirq processing in their
thread context.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: split timer softirqs out of ksoftirqd
Date: Wed, 20 Jan 2016 16:34:17 +0100
The softirqd runs in -RT with SCHED_FIFO (prio 1) and deals mostly with
timer wakeup which can not happen in hardirq context. The prio has been
raised from the normal SCHED_OTHER so the timer wakeup does not happen
too late.
With enough networking load it is possible that the system never goes
idle and schedules ksoftirqd and everything else with a higher priority.
One of the tasks left behind is one of RCU's threads and so we see stalls
and eventually run out of memory.
This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd`
thread into its own `ktimersoftd`. The former can now run SCHED_OTHER
(same as mainline) and the latter at SCHED_FIFO due to the wakeups.
From the networking point of view: The NAPI callback runs after the network
interrupt thread completes. If its run time takes too long the NAPI code
itself schedules the `ksoftirqd`. Here in the thread it can run at
SCHED_OTHER priority and it won't defer RCU anymore.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: trylock is okay on -RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context, so it is
not an issue here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
132/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: gpu: don't check for the lock owner.
Date: Tue, 14 Jul 2015 14:26:34 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
133/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/nfs: turn rmdir_sem into a semaphore
Date: Thu, 15 Sep 2016 10:51:27 +0200
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores on the other hand are "plain simple" and should work as
expected. We can't have multiple readers but on -RT we don't allow
multiple readers anyway so that is not a loss.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
134/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to futex hash bucket lock being a 'sleeping' spinlock and
therefor not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
135/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT_FULL.
The bug comes from a timed out condition.
TASK 1                                  TASK 2
------                                  ------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>
                                        double_lock_hb();
    raw_spin_lock(pi_lock);
    if (current->pi_blocked_on) {
    } else {
        current->pi_blocked_on = PI_WAKE_INPROGRESS;
        raw_spin_unlock(pi_lock);
        spin_lock(hb->lock); <-- blocked!
                                        plist_for_each_entry_safe(this) {
                                            rt_mutex_start_proxy_lock();
                                            task_blocks_on_rt_mutex();
                                            BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead,
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately (see the sketch after this entry).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
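A sketch of the early-exit check added to rt_mutex_start_proxy_lock(), per the fix description:

	raw_spin_lock(&task->pi_lock);
	if (task->pi_blocked_on) {
		/* Task already blocks on another lock (e.g. hb->lock
		 * after a timeout); requeueing now would corrupt its
		 * PI state, so back out and let the caller retry. */
		raw_spin_unlock(&task->pi_lock);
		raw_spin_unlock(&lock->wait_lock);
		return -EAGAIN;
	}
	task->pi_blocked_on = PI_REQUEUE_INPROGRESS;	/* flag the requeue */
	raw_spin_unlock(&task->pi_lock);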
136/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: Ensure lock/unlock symmetry versus pi_lock and hash bucket lock
Date: Fri, 1 Mar 2013 11:17:42 +0100
In exit_pi_state_list() we have the following locking construct:
spin_lock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
...
spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reacquire
pi_lock after that again. This is safe as the loop code reevaluates
head again under the pi_lock.
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
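The reordering in sketch form:

	/* exit_pi_state_list(), RT-safe unlock order */
	raw_spin_unlock_irq(&curr->pi_lock);	/* drop pi_lock first... */
	spin_unlock(&hb->lock);		/* ...so migrate_enable() inside the
					 * rt spin_unlock() does not see
					 * atomic context */
	raw_spin_lock_irq(&curr->pi_lock);	/* retake; the loop re-reads
						 * the list head under it */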
137/286 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
138/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: include definition for cpumask_t
Date: Thu, 22 Dec 2016 17:28:33 +0100
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/286 [
Author: "Wolfgang M. Reimer"
Email: linuxball@gmail.com
Subject: locking: locktorture: Do NOT include rwlock.h directly
Date: Tue, 21 Jul 2015 16:20:07 +0200
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT_FULL is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
140/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Add rtmutex_lock_killable()
Date: Thu, 9 Jun 2011 11:43:52 +0200
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
141/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
142/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
143/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: don't include the rcu header
Date: Sun, 25 Dec 2016 21:28:33 -0500
The RCU header pulls in spinlock.h and fails due to not yet defined types:
|In file included from include/linux/spinlock.h:275:0,
| from include/linux/rcupdate.h:38,
| from include/linux/rbtree.h:34,
| from include/linux/rtmutex.h:17,
| from include/linux/spinlock_types.h:18,
| from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
| ^
This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add the preempt-rt lock replacement APIs
Date: Sun, 26 Jul 2009 19:39:56 +0200
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex
based locking functions for preempt-rt.
This also introduces RT's sleeping locks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
145/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/futex: don't deboost too early
Date: Thu, 29 Sep 2016 18:49:22 +0200
The sequence:
T1 holds futex
T2 blocks on futex and boosts T1
T1 unlocks futex and holds hb->lock
T1 unlocks rt mutex, so T1 has no more pi waiters
T3 blocks on hb->lock and adds itself to the pi waiters list of T1
T1 unlocks hb->lock and deboosts itself
T4 preempts T1 so the wakeup of T2 gets delayed
As a workaround I attempt here to unlock the hb->lock without a deboost
and perform the deboost after the wake up of the waiter.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
146/286 [
Author: Sebastian Andrzej Siewior
Email: sebastian@breakpoint.cc
Subject: rtmutex: Add RT aware ww locks
Date: Mon, 28 Oct 2013 09:36:37 +0100
lockdep says:
| --------------------------------------------------------------------------
| | Wound/wait tests |
| ---------------------
| ww api failures: ok | ok | ok |
| ww contexts mixing: ok | ok |
| finishing ww context: ok | ok | ok | ok |
| locking mismatches: ok | ok | ok |
| EDEADLK handling: ok | ok | ok | ok | ok | ok | ok | ok | ok | ok |
| spinlock nest unlocked: ok |
| -----------------------------------------------------
| |block | try |context|
| -----------------------------------------------------
| context: ok | ok | ok |
| try: ok | ok | ok |
| block: ok | ok | ok |
| spinlock: ok | ok | ok |
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
]
147/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case __TASK_TRACED was moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
148/286 [
Author: Peter Zijlstra
Email: a.p.zijlstra@chello.nl
Subject: rcu: Frob softirq test
Date: Sat, 13 Aug 2011 00:23:17 +0200
With RT_FULL we get the below wreckage:
[ 126.060484] =======================================================
[ 126.060486] [ INFO: possible circular locking dependency detected ]
[ 126.060489] 3.0.1-rt10+ #30
[ 126.060490] -------------------------------------------------------
[ 126.060492] irq/24-eth0/1235 is trying to acquire lock:
[ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060503]
[ 126.060504] but task is already holding lock:
[ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060511]
[ 126.060511] which lock already depends on the new lock.
[ 126.060513]
[ 126.060514]
[ 126.060514] the existing dependency chain (in reverse order) is:
[ 126.060516]
[ 126.060516] -> #1 (&p->pi_lock){-...-.}:
[ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85
[ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f
[ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a
[ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f
[ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde
[ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b
[ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060551]
[ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
[ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060611]
[ 126.060612] other info that might help us debug this:
[ 126.060614]
[ 126.060615] Possible unsafe locking scenario:
[ 126.060616]
[ 126.060617] CPU0 CPU1
[ 126.060619] ---- ----
[ 126.060620] lock(&p->pi_lock);
[ 126.060623] lock(&(lock)->wait_lock);
[ 126.060625] lock(&p->pi_lock);
[ 126.060627] lock(&(lock)->wait_lock);
[ 126.060629]
[ 126.060629] *** DEADLOCK ***
[ 126.060630]
[ 126.060632] 1 lock held by irq/24-eth0/1235:
[ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060638]
[ 126.060638] stack backtrace:
[ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
[ 126.060643] Call Trace:
[ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a
[ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99
[ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c
[ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060680] [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4
[ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5
[ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90
[ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21
[ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d
[ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f
[ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f
[ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59
[ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a
[ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a
[ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe
[ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c
[ 126.060774] [<ffffffff81509b10>] ? gs_change+0xb/0xb
Because irq_exit() does:
void irq_exit(void)
{
account_system_vtime(current);
trace_hardirq_exit();
sub_preempt_count(IRQ_EXIT_OFFSET);
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
...
}
This triggers a wakeup, which uses RCU. Now, if the interrupted task has
t->rcu_read_unlock_special set, the RCU usage from the wakeup will end
up in rcu_read_unlock_special(). rcu_read_unlock_special() will test
for in_irq(), which will fail as we just decremented preempt_count
with IRQ_EXIT_OFFSET, and for in_serving_softirq(), which for
PREEMPT_RT_FULL reads:
int in_serving_softirq(void)
{
int res;
preempt_disable();
res = __get_cpu_var(local_softirq_runner) == current;
preempt_enable();
return res;
}
This test also fails, resulting in the above wreckage.
The 'somewhat' ugly solution is to open-code the preempt_count() test
in rcu_read_unlock_special().
Also, we're not at all sure how ->rcu_read_unlock_special gets set
here... so this is very likely a bandaid and more thought is required.
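A sketch of the open-coded test (illustrative only; the exact mask
arithmetic is an assumption based on the description above):
/* Sketch, not the actual hunk: replace in_irq() || in_serving_softirq()
 * with a raw preempt_count() test, so the IRQ_EXIT_OFFSET decrement in
 * irq_exit() cannot hide that we are still inside an interrupt. */
if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
        /* still in (hard or serving-soft) irq context:
         * defer the special processing instead of doing it here */
}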
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
149/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Merge RCU-bh into RCU-preempt
Date: Wed, 5 Oct 2011 11:59:38 -0700
The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing. This issue will be fixed by a later commit.
The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.
[ paulmck: Added a useful changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
150/286 [
Author: "Paul E. McKenney"
Email: paulmck@linux.vnet.ibm.com
Subject: rcu: Make ksoftirqd do RCU quiescent states
Date: Wed, 5 Oct 2011 11:45:18 -0700
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks. This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context. A wrapper layer is interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying
function __do_softirq_common() does the actual work.
The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section. This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section. Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.
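A minimal sketch of the wrapper layering described above (function names
follow the changelog; the rcu_bh_qs() signature of that era is assumed):
static void __do_softirq_common(int need_rcu_bh_qs)
{
        if (need_rcu_bh_qs)
                rcu_bh_qs(smp_processor_id());  /* only from ksoftirqd */
        /* ... the actual softirq processing ... */
}

void __do_softirq(void)          /* e.g. reached via local_bh_enable() */
{
        __do_softirq_common(0);  /* no quiescent state reported here */
}

static void run_ksoftirqd(void)  /* ksoftirqd context only */
{
        __do_softirq_common(1);  /* safe: not inside a read-side section */
}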
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
151/286 [
Author: Tiejun Chen
Email: tiejun.chen@windriver.com
Subject: rcutree/rcu_bh_qs: Disable irq while calling rcu_preempt_qs()
Date: Wed, 18 Dec 2013 17:51:49 +0800
Any callers to the function rcu_preempt_qs() must disable irqs in
order to protect the assignment to ->rcu_read_unlock_special. In
RT case, rcu_bh_qs() as the wrapper of rcu_preempt_qs() is called
in some scenarios where irq is enabled, like this path,
do_single_softirq()
|
+ local_irq_enable();
+ handle_softirq()
| |
| + rcu_bh_qs()
| |
| + rcu_preempt_qs()
|
+ local_irq_disable()
So we'd better disable irqs directly inside rcu_bh_qs() to fix this,
otherwise the kernel may freeze, as has been observed. This approach is
also safe for any potential rcu_bh_qs() usage elsewhere in the future.
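A sketch of the fix as described (the RT wrapper and its signature are
assumptions):
/* Sketch: make the RT wrapper itself provide the irq-off protection
 * that rcu_preempt_qs() needs for ->rcu_read_unlock_special. */
void rcu_bh_qs(int cpu)
{
        unsigned long flags;

        local_irq_save(flags);
        rcu_preempt_qs(cpu);
        local_irq_restore(flags);
}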
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Bin Jiang <bin.jiang@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
152/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
153/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
154/286 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: rt: Improve the serial console PASS_LIMIT
Date: Wed, 14 Dec 2011 13:05:54 +0100
Beyond the warning:
drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
155/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tty: serial: 8250: don't take the trylock during oops
Date: Mon, 11 Apr 2016 16:55:02 +0200
An oops with irqs off (a panic() from an irqsafe hrtimer such as the
watchdog timer) will lead to a lockdep warning on each invocation, and
as such it never completes.
Therefore we skip the trylock in the oops case.
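A sketch of the resulting console-write path (based on the description;
the surrounding 8250 code is assumed):
/* Sketch: during an oops, don't even trylock -- print unlocked, best
 * effort, so the output can complete with irqs off. */
if (port->sysrq || oops_in_progress)
        locked = 0;
else
        spin_lock_irqsave(&port->lock, flags);
/* ... emit the characters ... */
if (locked)
        spin_unlock_irqrestore(&port->lock, flags);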
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/286 [
Author: Daniel Wagner
Email: daniel.wagner@bmw-carit.de
Subject: work-simple: Simple work queue implementation
Date: Fri, 11 Jul 2014 15:26:11 +0200
Provides a framework for enqueueing callbacks from irq context in a
PREEMPT_RT_FULL-safe way. The callbacks are executed in kthread context.
Based on wait-simple.
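A hypothetical usage sketch of the swork API provided here (names such
as INIT_SWORK() and swork_get() follow this series; exact signatures are
assumed):
static void my_cb(struct swork_event *ev)
{
        /* runs in kthread context, sleeping locks are fine here */
}

static struct swork_event my_event;

static int __init my_setup(void)
{
        WARN_ON(swork_get());            /* bring up the worker kthread */
        INIT_SWORK(&my_event, my_cb);
        return 0;
}

/* later, from irq context: */
swork_queue(&my_event);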
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
]
158/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: completion: Use simple wait queues
Date: Fri, 11 Jan 2013 11:23:51 +0100
Completions have no long lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple waitqueues which reduces the
contention on the waitqueue lock.
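A sketch of a completion built on a simple waitqueue, as described
(swait names as in later kernels; illustrative only):
struct completion {
        unsigned int done;
        struct swait_queue_head wait;   /* raw lock, short hold times */
};

void complete(struct completion *x)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&x->wait.lock, flags);
        x->done++;
        swake_up_locked(&x->wait);      /* wake exactly one waiter */
        raw_spin_unlock_irqrestore(&x->wait.lock, flags);
}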
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
159/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/aio: simple simple work
Date: Mon, 16 Feb 2015 18:49:10 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:768
|in_atomic(): 1, irqs_disabled(): 0, pid: 26, name: rcuos/2
|2 locks held by rcuos/2/26:
| #0: (rcu_callback){.+.+..}, at: [<ffffffff810b1a12>] rcu_nocb_kthread+0x1e2/0x380
| #1: (rcu_read_lock_sched){.+.+..}, at: [<ffffffff812acd26>] percpu_ref_kill_rcu+0xa6/0x1c0
|Preemption disabled at:[<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
|Call Trace:
| [<ffffffff81582e9e>] dump_stack+0x4e/0x9c
| [<ffffffff81077aeb>] __might_sleep+0xfb/0x170
| [<ffffffff81589304>] rt_spin_lock+0x24/0x70
| [<ffffffff811c5790>] free_ioctx_users+0x30/0x130
| [<ffffffff812ace34>] percpu_ref_kill_rcu+0x1b4/0x1c0
| [<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
| [<ffffffff8106e046>] kthread+0xd6/0xf0
| [<ffffffff81591eec>] ret_from_fork+0x7c/0xb0
Replace this with the preempt_disable()-friendly swork.
Reported-By: Mike Galbraith <umgwanakikbuti@gmail.com>
Suggested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: genirq: Do not invoke the affinity callback via a workqueue on RT
Date: Wed, 21 Aug 2013 17:48:46 +0200
Joe Korty reported that __irq_set_affinity_locked() schedules a
workqueue while holding a rawlock which results in a might_sleep()
warning.
This patch uses swork_queue() instead.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/286 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: hrtimer: Move schedule_work call to helper thread
Date: Mon, 16 Sep 2013 14:09:19 -0700
When running the LTP leapsec_timer test, the following call trace is caught:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
Preemption disabled at:[<ffffffff810857f3>] cpu_startup_entry+0x133/0x310
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffffffff81c2f800 ffff880076843e40 ffffffff8169918d ffff880076843e58
ffffffff8106db31 ffff88007684b4a0 ffff880076843e70 ffffffff8169d9c0
ffff88007684b4a0 ffff880076843eb0 ffffffff81059da1 0000001876851200
Call Trace:
<IRQ> [<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff81065aa1>] clock_was_set_delayed+0x21/0x30
[<ffffffff810883be>] do_timer+0x40e/0x660
[<ffffffff8108f487>] tick_do_update_jiffies64+0xf7/0x140
[<ffffffff8108fe42>] tick_check_idle+0x92/0xc0
[<ffffffff81044327>] irq_enter+0x57/0x70
[<ffffffff816a040e>] smp_apic_timer_interrupt+0x3e/0x9b
[<ffffffff8169f80a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff8155ea1c>] ? cpuidle_enter_state+0x4c/0xc0
[<ffffffff8155eb68>] cpuidle_idle_call+0xd8/0x2d0
[<ffffffff8100b59e>] arch_cpu_idle+0xe/0x30
[<ffffffff8108585e>] cpu_startup_entry+0x19e/0x310
[<ffffffff8168efa2>] start_secondary+0x1ad/0x1b0
clock_was_set_delayed() is called in the hard IRQ handler (timer interrupt),
and it in turn calls schedule_work().
Under PREEMPT_RT_FULL, schedule_work() takes spinlocks which can sleep, so it's
not safe to call schedule_work() in interrupt context.
Reference upstream commit b68d61c705ef02384c0538b8d9374545097899ca
(rt,ntp: Move call to schedule_delayed_work() to helper thread)
from git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git, which
makes a similar change.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
[bigeasy: use swork_queue() instead a helper thread]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: locking/percpu-rwsem: Remove preempt_disable variants
Date: Wed, 23 Nov 2016 16:29:32 +0100
Effectively revert commit:
87709e28dc7c ("fs/locks: Use percpu_down_read_preempt_disable()")
This is causing major pain for PREEMPT_RT and is only a very small
performance issue for PREEMPT=y.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
]
163/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace preemption fix
Date: Sun, 19 Jul 2009 08:44:27 -0500
On RT we cannot loop with preemption disabled here as
mnt_make_readonly() might have been preempted. We can safely enable
preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT
as well.
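A sketch of the wait loop described above (the surrounding
mnt_want_write() context and field spelling are assumed):
/* preemption is disabled by the surrounding code; briefly re-enable
 * it so the MNT_WRITE_HOLD holder can run and clear the flag */
while (ACCESS_ONCE(mnt->mnt_flags) & MNT_WRITE_HOLD) {
        preempt_enable();
        cpu_relax();
        preempt_disable();
}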
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
164/286 [
Author: Yong Zhang
Email: yong.zhang0@gmail.com
Subject: mm: Protect activate_mm() by preempt_[disable&enable]_rt()
Date: Tue, 15 May 2012 13:53:56 +0800
Use preempt_*_rt instead of local_irq_*_rt, otherwise there will be a
warning on ARM like the one below:
WARNING: at build/linux/kernel/smp.c:459 smp_call_function_many+0x98/0x264()
Modules linked in:
[<c0013bb4>] (unwind_backtrace+0x0/0xe4) from [<c001be94>] (warn_slowpath_common+0x4c/0x64)
[<c001be94>] (warn_slowpath_common+0x4c/0x64) from [<c001bec4>] (warn_slowpath_null+0x18/0x1c)
[<c001bec4>] (warn_slowpath_null+0x18/0x1c) from [<c0053ff8>](smp_call_function_many+0x98/0x264)
[<c0053ff8>] (smp_call_function_many+0x98/0x264) from [<c0054364>] (smp_call_function+0x44/0x6c)
[<c0054364>] (smp_call_function+0x44/0x6c) from [<c0017d50>] (__new_context+0xbc/0x124)
[<c0017d50>] (__new_context+0xbc/0x124) from [<c009e49c>] (flush_old_exec+0x460/0x5e4)
[<c009e49c>] (flush_old_exec+0x460/0x5e4) from [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac)
[<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) from [<c009d060>] (search_binary_handler+0x94/0x2a4)
[<c009d060>] (search_binary_handler+0x94/0x2a4) from [<c009e8fc>] (do_execve+0x254/0x364)
[<c009e8fc>] (do_execve+0x254/0x364) from [<c0010e84>] (sys_execve+0x34/0x54)
[<c0010e84>] (sys_execve+0x34/0x54) from [<c000da00>] (ret_fast_syscall+0x0/0x30)
---[ end trace 0000000000000002 ]---
The reason is that ARM needs irqs enabled when doing activate_mm().
According to mm-protect-activate-switch-mm.patch,
preempt_[disable|enable]_rt() is actually sufficient.
Inspired-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337061236-1766-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
165/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Turn off warning which is bogus on RT
Date: Tue, 14 Jun 2011 17:05:09 +0200
On -RT the context is always with IRQs enabled. Ignore this warning on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
166/286 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: fs: ntfs: disable interrupt only on !RT
Date: Fri, 3 Jul 2009 08:44:12 -0500
On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > > [10138.175796] [<c0105de3>] show_trace+0x12/0x14
> > > [10138.180291] [<c0105dfb>] dump_stack+0x16/0x18
> > > [10138.184769] [<c011609f>] native_smp_call_function_mask+0x138/0x13d
> > > [10138.191117] [<c0117606>] smp_call_function+0x1e/0x24
> > > [10138.196210] [<c012f85c>] on_each_cpu+0x25/0x50
> > > [10138.200807] [<c0115c74>] flush_tlb_all+0x1e/0x20
> > > [10138.205553] [<c016caaf>] kmap_high+0x1b6/0x417
> > > [10138.210118] [<c011ec88>] kmap+0x4d/0x4f
> > > [10138.214102] [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9
> > > [10138.220163] [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f
> > > [10138.225352] [<c01a2b09>] bio_endio+0x42/0x6d
> > > [10138.229769] [<c02c2a08>] __end_that_request_first+0x115/0x4ac
> > > [10138.235682] [<c02c2da7>] end_that_request_chunk+0x8/0xa
> > > [10138.241052] [<c0365943>] ide_end_request+0x55/0x10a
> > > [10138.246058] [<c036dae3>] ide_dma_intr+0x6f/0xac
> > > [10138.250727] [<c0366d83>] ide_intr+0x93/0x1e0
> > > [10138.255125] [<c015afb4>] handle_IRQ_event+0x5c/0xc9
> >
> > Looks like ntfs is kmap()ing from interrupt context. Should be using
> > kmap_atomic instead, I think.
>
> it's not atomic interrupt context but irq thread context - and -rt
> remaps kmap_atomic() to kmap() internally.
Hm. Looking at the change to mm/bounce.c, perhaps I should do this
instead?
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
167/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs: jbd2: pull your plug when waiting for space
Date: Mon, 17 Feb 2014 17:30:03 +0100
Two cp processes running in parallel managed to stall the ext4 fs. It
seems that the journal code is either waiting for locks or sleeping,
waiting for something to happen. This seems similar to what Mike
observed on ext3; here is his description:
|With an -rt kernel, and a heavy sync IO load, tasks can jam
|up on journal locks without unplugging, which can lead to
|terminal IO starvation. Unplug and schedule when waiting
|for space.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
168/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Convert mce timer to hrtimer
Date: Mon, 13 Dec 2010 16:33:39 +0100
mce_timer is started in atomic contexts of cpu bringup. This results
in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to
avoid this.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fold in:
|From: Mike Galbraith <bitbucket@online.de>
|Date: Wed, 29 May 2013 13:52:13 +0200
|Subject: [PATCH] x86/mce: fix mce timer interval
|
|Seems mce timer fire at the wrong frequency in -rt kernels since roughly
|forever due to 32 bit overflow. 3.8-rt is also missing a multiplier.
|
|Add missing us -> ns conversion and 32 bit overflow prevention.
|
|Signed-off-by: Mike Galbraith <bitbucket@online.de>
|[bigeasy: use ULL instead of u64 cast]
|Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
169/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: x86/mce: use swait queue for mce wakeups
Date: Fri, 27 Feb 2015 15:20:37 +0100
We had a customer report a lockup on a 3.0-rt kernel that had the
following backtrace:
[ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
[ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
[ffff88107fca3f50] __wake_up at ffffffff81043379
[ffff88107fca3f80] mce_notify_irq at ffffffff81017328
[ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
[ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
[ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853
It actually bugged because the lock was taken by the same owner that
already had that lock. What happened was the thread that was setting
itself on a wait queue had the lock when an MCE triggered. The MCE
interrupt does a wake up on its wait list and grabs the same lock.
NOTE: THIS IS NOT A BUG ON MAINLINE
Sorry for yelling, but as I Cc'd mainline maintainers I want them to
know that this is a PREEMPT_RT bug only. I only Cc'd them for advice.
On PREEMPT_RT the wait queue locks are converted from normal
"spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above).
These are not to be taken by hard interrupt context. This usually isn't
a problem as almost all interrupts in PREEMPT_RT are converted into
schedulable threads. Unfortunately that's not the case with the MCE irq.
As wait queue locks are notorious for long hold times, we can not
convert them to raw_spin_locks without causing issues with -rt. But
Thomas has created a "simple-wait" structure that uses raw spin locks
which may have been a good fit.
Unfortunately, wait queues are not the only issue, as the mce_notify_irq
also does a schedule_work(), which grabs the workqueue spin locks that
have the exact same issue.
Thus, this patch I'm proposing is to move the actual work of the MCE
interrupt into a helper thread that gets woken up on the MCE interrupt
and does the work in a schedulable context.
NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET
Oops, sorry for yelling again, but I want to stress that I keep the same
behavior of mainline when PREEMPT_RT is not set. Thus, this only changes
the MCE behavior when PREEMPT_RT is configured.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy@linutronix: make mce_notify_work() a proper prototype, use
kthread_run()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[wagi: use work-simple framework to defer work to a kthread]
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
]
170/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
171/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Use generic rwsem_spinlocks on -rt
Date: Sun, 26 Jul 2009 02:21:32 +0200
Simplifies the separation of anon_rw_semaphores and rw_semaphores for
-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
172/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: x86: UV: raw_spinlock conversion
Date: Sun, 2 Nov 2014 08:31:37 +0100
Shrug. Lots of hobbyists have a beast in their basement, right?
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/286 [
Author: Daniel Wagner
Email: wagi@monom.org
Subject: thermal: Defer thermal wakeups to threads
Date: Tue, 17 Feb 2015 09:37:44 +0100
On RT the spin lock in pkg_temp_thermal_platform_thermal_notify() will
call schedule() while we run in irq context.
[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
Let's defer the work to a kthread.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
[bigeasy: reorder init/deinit position. TODO: flush swork on exit]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/epoll: Do not disable preemption on RT
Date: Fri, 8 Jul 2011 16:35:35 +0200
ep_call_nested() takes a sleeping lock so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind being
invoked twice on the same CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
175/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
176/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: mq: use cpu_light()
Date: Wed, 9 Apr 2014 10:37:23 +0200
There is a might_sleep() splat because get_cpu() disables preemption and
we later grab a sleeping lock. As a workaround for this we use get_cpu_light().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
177/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well with the sleeping
locks acquired later on.
It seems to be enough to replace them with get_cpu_light() and
migrate_disable().
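An illustrative sketch of the pattern (get_cpu_light() comes from this
series; the per-cpu lock name is hypothetical):
int cpu = get_cpu_light();      /* migrate_disable() on -RT: pins the
                                 * CPU but stays preemptible */
spin_lock(&per_cpu(ctx_lock, cpu));     /* sleeping lock is legal now */
/* ... per-cpu work ... */
spin_unlock(&per_cpu(ctx_lock, cpu));
put_cpu_light();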
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
178/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: don't complete requests via IPI
Date: Thu, 29 Jan 2015 15:10:08 +0100
The IPI runs in hardirq context and there are sleeping locks. This patch
moves the completion into a workqueue.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
180/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
|
|I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
|running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
|runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
|ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
|blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
|
|The solution is not to convert the cpu_chill() back to a cpu_relax() as that
|will re-create a possible live lock that the cpu_chill() fixed earlier, and may
|also leave this bug open on other softirqs. The fix is to remove the
|dependency on ksoftirqd from cpu_chill(). That is, instead of calling
|msleep() that requires ksoftirqd to wake it up, use the
|hrtimer_nanosleep() code that does the wakeup from hard irq context.
|
||Looks to be the lock of the block softirq. I don't have the core dump
||anymore, but from what I could tell the ksoftirqd was blocked on the
||block softirq lock, where the block softirq handler did a msleep
||(called by the qla2xxx interrupt handler).
||
||Looking at trigger_softirq() in block/blk-softirq.c, it can do a
||smp_callfunction() to another cpu to run the block softirq. If that
||happens to be the cpu where the qla2xx irq handler is doing the block
||softirq and is in a middle of a msleep(), I believe the ksoftirqd will
||try to run the softirq. If it does that, then BOOM, it's deadlocked
||because the ksoftirqd will never run the timer softirq either.
|
||I should have also stated that it was only one lock that was involved.
||But the lock owner was doing a msleep() that requires a wakeup by
||ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
||held by the msleep() caller, then you have your deadlock.
||
||It's best not to have any softirqs going to sleep requiring another
||softirq to wake it up. Note, if we ever require a timer softirq to do a
||cpu_chill() it will most definitely hit this deadlock.
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
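Putting the pieces together, a sketch of the resulting cpu_chill() (the
hrtimer_nanosleep() signature of that era is assumed; PF_NOFREEZE
handling per the note above):
#ifdef CONFIG_PREEMPT_RT_FULL
void cpu_chill(void)
{
        struct timespec tu = { .tv_nsec = NSEC_PER_SEC / HZ };
        unsigned int freeze_flag = current->flags & PF_NOFREEZE;

        current->flags |= PF_NOFREEZE;  /* avoid the udevd splat above */
        hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
        if (!freeze_flag)
                current->flags &= ~PF_NOFREEZE;
}
#else
#define cpu_chill()     cpu_relax()
#endif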
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: cpu_chill: Add a UNINTERRUPTIBLE hrtimer_nanosleep
Date: Tue, 4 Mar 2014 12:28:32 -0500
We hit another bug that was caused by switching cpu_chill() from
msleep() to hrtimer_nanosleep().
This time it is a livelock. The problem is that hrtimer_nanosleep()
calls schedule() with state == TASK_INTERRUPTIBLE. This means
that if a signal is pending, the scheduler won't schedule, and will
simply change the current task state back to TASK_RUNNING. This
nullifies the whole point of cpu_chill() in the first place: if a task
is spinning on a try_lock() and has preempted the owner of the lock,
and it has a signal pending, it will never give up the CPU to let
the owner of the lock run.
I made a static function __hrtimer_nanosleep() that takes a fifth
parameter "state", which determines the task state that the
nanosleep() will be in. The normal hrtimer_nanosleep() acts the
same, but cpu_chill() calls __hrtimer_nanosleep() directly with
the TASK_UNINTERRUPTIBLE state.
cpu_chill() only cares that the first sleep happens, and does not care
about the state of the restart schedule (in hrtimer_nanosleep_restart).
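A sketch of the split described above (signatures of that era assumed):
static long __hrtimer_nanosleep(struct timespec *rqtp,
                                struct timespec __user *rmtp,
                                const enum hrtimer_mode mode,
                                const clockid_t clockid,
                                unsigned long state)
{
        /* ... set_current_state(state) before each schedule() ... */
}

long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
                       const enum hrtimer_mode mode, const clockid_t clockid)
{
        /* unchanged behaviour for everybody else */
        return __hrtimer_nanosleep(rqtp, rmtp, mode, clockid,
                                   TASK_INTERRUPTIBLE);
}

/* cpu_chill() passes TASK_UNINTERRUPTIBLE instead, so a pending signal
 * cannot turn the one-tick chill into a busy loop. */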
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
182/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: blk-mq: Use swait
Date: Fri, 13 Feb 2015 11:01:26 +0100
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
| 5 locks held by kworker/u257:6/255:
| #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
| #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
| #2: (&shost->scan_mutex){+.+.+.}, at: [<ffffffffa000faa3>] __scsi_add_device+0xa3/0x130 [scsi_mod]
| #3: (&set->tag_list_lock){+.+...}, at: [<ffffffff812f09fa>] blk_mq_init_queue+0x96a/0xa50
| #4: (rcu_read_lock_sched){......}, at: [<ffffffff8132887d>] percpu_ref_kill_and_confirm+0x1d/0x120
| Preemption disabled at:[<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
|
| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
| Workqueue: events_unbound async_run_entry_fn
| 0000000000000003 ffff8800bc29f998 ffffffff815b3a12 0000000000000000
| 0000000000000000 ffff8800bc29f9b8 ffffffff8109aa16 ffff8800bc29fa28
| ffff8800bc5d1bc8 ffff8800bc29f9e8 ffffffff815b8dd4 ffff880000000000
| Call Trace:
| [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
| [<ffffffff8109aa16>] __might_sleep+0x116/0x190
| [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
| [<ffffffff810b6089>] __wake_up+0x29/0x60
| [<ffffffff812ee06e>] blk_mq_usage_counter_release+0x1e/0x20
| [<ffffffff81328966>] percpu_ref_kill_and_confirm+0x106/0x120
| [<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
| [<ffffffff812f0000>] blk_mq_update_tag_set_depth+0x40/0xd0
| [<ffffffff812f0a1c>] blk_mq_init_queue+0x98c/0xa50
| [<ffffffffa000dcf0>] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
| [<ffffffffa000ea35>] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
| [<ffffffffa000f494>] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
| [<ffffffffa000fb26>] __scsi_add_device+0x126/0x130 [scsi_mod]
| [<ffffffffa013033f>] ata_scsi_scan_host+0xaf/0x200 [libata]
| [<ffffffffa012b5b6>] async_port_probe+0x46/0x60 [libata]
| [<ffffffff810978fb>] async_run_entry_fn+0x3b/0xf0
| [<ffffffff8108ee81>] process_one_work+0x201/0x5e0
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Use cpu_chill() for retry loops
Date: Thu, 20 Dec 2012 18:28:26 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
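A sketch of the resulting retry pattern (the flag and queue names are
hypothetical):
/* before: busy spin; on RT this can spin forever against a preempted
 * modifier, or live-lock during priority boosting */
while (test_bit(QUEUE_FLAG_BUSY, &q->queue_flags))
        cpu_relax();

/* after: on RT, sleep a tick so the preempted modifier can run */
while (test_bit(QUEUE_FLAG_BUSY, &q->queue_flags))
        cpu_chill();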
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
184/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: dcache: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
185/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use cpu_chill() instead of cpu_relax()
Date: Wed, 7 Mar 2012 21:10:04 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
186/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
187/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Use normal rcu
Date: Wed, 24 Jul 2013 15:26:54 +0200
There is no need for sched_rcu. The undocumented reason why sched_rcu
is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
abusing the fact that sched_rcu reader side critical sections are also
protected by preempt or irq disabled regions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Use local irq lock instead of irq disable regions
Date: Sun, 17 Jul 2011 21:42:26 +0200
Use a local_irq_lock as a replacement for irq-off regions. We keep the
semantics of irq-off in regard to the pool->lock and remain preemptible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
189/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Prevent workqueue versus ata-piix livelock
Date: Mon, 1 Jul 2013 11:02:42 +0200
An Intel i7 system regularly detected rcu_preempt stalls after the kernel
was upgraded from 3.6-rt to 3.8-rt. When the stall happened, disk I/O was no
longer possible, unless the system was restarted.
The kernel message was:
INFO: rcu_preempt self-detected stall on CPU { 6}
[..]
NMI backtrace for cpu 6
CPU 6
Pid: 119, comm: irq/19-ata_piix Not tainted 3.8.13-rt13 #11 Shuttle Inc. SX58/SX58
RIP: 0010:[<ffffffff8124ca60>] [<ffffffff8124ca60>] ip_compute_csum+0x30/0x30
RSP: 0018:ffff880333303cb0 EFLAGS: 00000002
RAX: 0000000000000006 RBX: 00000000000003e9 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffffffff81aa16d0 RDI: 0000000000000001
RBP: ffff880333303ce8 R08: ffffffff81aa16d0 R09: ffffffff81c1b8cc
R10: 0000000000000000 R11: 0000000000000000 R12: 000000000005161f
R13: 0000000000000006 R14: ffffffff81aa16d0 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff880333300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003c1b2bb420 CR3: 0000000001a0f000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process irq/19-ata_piix (pid: 119, threadinfo ffff88032d88a000, task ffff88032df80000)
Stack:
ffffffff8124cb32 000000000005161e 00000000000003e9 0000000000001000
0000000000009022 ffffffff81aa16d0 0000000000000002 ffff880333303cf8
ffffffff8124caa9 ffff880333303d08 ffffffff8124cad2 ffff880333303d28
Call Trace:
<IRQ>
[<ffffffff8124cb32>] ? delay_tsc+0x33/0xe3
[<ffffffff8124caa9>] __delay+0xf/0x11
[<ffffffff8124cad2>] __const_udelay+0x27/0x29
[<ffffffff8102d1fa>] native_safe_apic_wait_icr_idle+0x39/0x45
[<ffffffff8102dc9b>] __default_send_IPI_dest_field.constprop.0+0x1e/0x58
[<ffffffff8102dd1e>] default_send_IPI_mask_sequence_phys+0x49/0x7d
[<ffffffff81030326>] physflat_send_IPI_all+0x17/0x19
[<ffffffff8102de53>] arch_trigger_all_cpu_backtrace+0x50/0x79
[<ffffffff810b21d0>] rcu_check_callbacks+0x1cb/0x568
[<ffffffff81048c9c>] ? raise_softirq+0x2e/0x35
[<ffffffff81086be0>] ? tick_sched_do_timer+0x38/0x38
[<ffffffff8104f653>] update_process_times+0x44/0x55
[<ffffffff81086866>] tick_sched_handle+0x4a/0x59
[<ffffffff81086c1c>] tick_sched_timer+0x3c/0x5b
[<ffffffff81062845>] __run_hrtimer+0x9b/0x158
[<ffffffff810631d8>] hrtimer_interrupt+0x172/0x2aa
[<ffffffff8102d498>] smp_apic_timer_interrupt+0x76/0x89
[<ffffffff814d881d>] apic_timer_interrupt+0x6d/0x80
<EOI>
[<ffffffff81057cd2>] ? __local_lock_irqsave+0x17/0x4a
[<ffffffff81059336>] try_to_grab_pending+0x42/0x17e
[<ffffffff8105a699>] mod_delayed_work_on+0x32/0x88
[<ffffffff8105a70b>] mod_delayed_work+0x1c/0x1e
[<ffffffff8122ae84>] blk_run_queue_async+0x37/0x39
[<ffffffff81230985>] flush_end_io+0xf1/0x107
[<ffffffff8122e0da>] blk_finish_request+0x21e/0x264
[<ffffffff8122e162>] blk_end_bidi_request+0x42/0x60
[<ffffffff8122e1ba>] blk_end_request+0x10/0x12
[<ffffffff8132de46>] scsi_io_completion+0x1bf/0x492
[<ffffffff81335cec>] ? sd_done+0x298/0x2ef
[<ffffffff81325a02>] scsi_finish_command+0xe9/0xf2
[<ffffffff8132dbcb>] scsi_softirq_done+0x106/0x10f
[<ffffffff812333d3>] blk_done_softirq+0x77/0x87
[<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1
[<ffffffff810aa820>] ? irq_thread_fn+0x3a/0x3a
[<ffffffff81048466>] local_bh_enable+0x43/0x72
[<ffffffff810aa866>] irq_forced_thread_fn+0x46/0x52
[<ffffffff810ab089>] irq_thread+0x8c/0x17c
[<ffffffff810ab179>] ? irq_thread+0x17c/0x17c
[<ffffffff810aaffd>] ? wake_threads_waitq+0x44/0x44
[<ffffffff8105eb18>] kthread+0x8d/0x95
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
[<ffffffff814d7b7c>] ret_from_fork+0x7c/0xb0
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
The state of softirqd of this CPU at the time of the crash was:
ksoftirqd/6 R running task 0 53 2 0x00000000
ffff88032fc39d18 0000000000000046 ffff88033330c4c0 ffff8803303f4710
ffff88032fc39fd8 ffff88032fc39fd8 0000000000000000 0000000000062500
ffff88032df88000 ffff8803303f4710 0000000000000000 ffff88032fc38000
Call Trace:
[<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c
[<ffffffff814d178c>] preempt_schedule+0x61/0x76
[<ffffffff8106cccf>] migrate_enable+0xe5/0x1df
[<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c
[<ffffffff8104ef52>] run_timer_softirq+0x161/0x1d6
[<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1
[<ffffffff8104840b>] run_ksoftirqd+0x2d/0x45
[<ffffffff8106658a>] smpboot_thread_fn+0x2ea/0x308
[<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc
[<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc
[<ffffffff8105eb18>] kthread+0x8d/0x95
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
[<ffffffff814d7afc>] ret_from_fork+0x7c/0xb0
[<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65
Apparently, the softirq daemon and the ata_piix IRQ handler were waiting
for each other to finish, ending up in a livelock. After the below patch
was applied, the system no longer crashes.
Reported-by: Carsten Emde <C.Emde@osadl.org>
Proposed-by: Thomas Gleixner <tglx@linutronix.de>
Tested by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
190/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Distangle worker accounting from rqlock
Date: Wed, 22 Jun 2011 19:47:03 +0200
The worker accounting for cpu bound workers is plugged into the core
scheduler code and the wakeup code. This is not a hard requirement and
can be avoided by keeping track of the state in the workqueue code
itself.
Keep track of the sleeping state in the worker itself and call the
notifier before entering the core scheduler. There might be false
positives when the task is woken between that call and actually
scheduling, but that's not really different from scheduling and being
woken immediately after switching away. There is also no harm from
updating nr_running when the task returns from scheduling instead of
accounting it in the wakeup code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
191/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: idr: Use local lock instead of preempt enable/disable
Date: Tue, 14 Jul 2015 14:26:34 +0200
We need to protect the per cpu variable and prevent migration.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
192/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: percpu_ida: Use local locks
Date: Wed, 9 Apr 2014 11:58:17 +0200
the local_irq_save() + spin_lock() does not work that well on -RT
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
193/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
194/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures use stop_machine() while switching the opcode, which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine
The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
195/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: seqlock: Prevent rt starvation
Date: Wed, 22 Feb 2012 12:03:30 +0100
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.
Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
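A sketch of the RT reader side as described, using the
spin_unlock_wait() variant from the note above (details assumed):
static inline unsigned read_seqbegin(seqlock_t *sl)
{
        unsigned ret;

repeat:
        ret = ACCESS_ONCE(sl->seqcount.sequence);
        if (unlikely(ret & 1)) {
                /* writer active: block on its lock so a preempted
                 * low-prio writer gets boosted instead of starving us */
                spin_unlock_wait(&sl->lock);
                goto repeat;
        }
        return ret;
}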
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
196/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
197/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep the irq-off regions. This keeps
the latency low. However, we can't kfree() from this context, therefore we
defer this to the softirq and use the tofree_queue list for it (similar to
process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
198/286 [
Author: Grygorii Strashko
Email: grygorii.strashko@ti.com
Subject: net/core/cpuhotplug: Drain input_pkt_queue lockless
Date: Fri, 9 Oct 2015 09:25:49 -0500
I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm
if I'm trying to unplug cpu1:
[ 57.737589] CPU1: shutdown
[ 57.767537] BUG: spinlock bad magic on CPU#0, sh/137
[ 57.767546] lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 57.767552] CPU: 0 PID: 137 Comm: sh Not tainted 4.1.10-rt8-01700-g2c38702-dirty #55
[ 57.767555] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 57.767568] [<c001acd0>] (unwind_backtrace) from [<c001534c>] (show_stack+0x20/0x24)
[ 57.767579] [<c001534c>] (show_stack) from [<c075560c>] (dump_stack+0x84/0xa0)
[ 57.767593] [<c075560c>] (dump_stack) from [<c00aca48>] (spin_dump+0x84/0xac)
[ 57.767603] [<c00aca48>] (spin_dump) from [<c00acaa4>] (spin_bug+0x34/0x38)
[ 57.767614] [<c00acaa4>] (spin_bug) from [<c00acc10>] (do_raw_spin_lock+0x168/0x1c0)
[ 57.767624] [<c00acc10>] (do_raw_spin_lock) from [<c075b4cc>] (_raw_spin_lock+0x4c/0x54)
[ 57.767631] [<c075b4cc>] (_raw_spin_lock) from [<c07599fc>] (rt_spin_lock_slowlock+0x5c/0x374)
[ 57.767638] [<c07599fc>] (rt_spin_lock_slowlock) from [<c075bcf4>] (rt_spin_lock+0x38/0x70)
[ 57.767649] [<c075bcf4>] (rt_spin_lock) from [<c06333c0>] (skb_dequeue+0x28/0x7c)
[ 57.767662] [<c06333c0>] (skb_dequeue) from [<c06476ec>] (dev_cpu_callback+0x1b8/0x240)
[ 57.767673] [<c06476ec>] (dev_cpu_callback) from [<c007566c>] (notifier_call_chain+0x3c/0xb4)
The reason is that skb_dequeue is taking skb->lock, but RT changed the
core code to use a raw spinlock. The non-raw lock is not initialized
on purpose to catch exactly this kind of problem.
Fixes: 91df05da13a6 'net: Use skbufhead with raw lock'
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
199/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: move xmit_recursion to per-task variable on -RT
Date: Wed, 13 Jan 2016 15:55:02 +0100
A softirq on -RT can be preempted. That means one task can be in
__dev_queue_xmit(), get preempted, and another task may enter
__dev_queue_xmit() as well. netperf together with a bridge device
will then trigger the `recursion alert` because each task increments
the xmit_recursion variable, which is per-CPU.
A virtual device like br0 is required to trigger this warning.
This patch moves the counter to a per-task variable instead of per-CPU so
it counts the recursion properly on -RT.
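A sketch of the per-task counter (the task_struct field name is an
assumption):
#ifdef CONFIG_PREEMPT_RT_FULL
/* counter lives in task_struct, so preemption cannot mix two tasks */
static inline int xmit_rec_read(void) { return current->xmit_recursion; }
static inline void xmit_rec_inc(void) { current->xmit_recursion++; }
static inline void xmit_rec_dec(void) { current->xmit_recursion--; }
#else
DECLARE_PER_CPU(int, xmit_recursion);
static inline int xmit_rec_read(void)
{
        return __this_cpu_read(xmit_recursion);
}
#endif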
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
200/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: provide a way to delegate processing a softirq to ksoftirqd
Date: Wed, 20 Jan 2016 15:39:05 +0100
If NET_RX uses up all of its budget it moves the following NAPI
invocations into `ksoftirqd`. On -RT it does not do so. Instead it
raises the NET_RX softirq in its current context again.
In order to get closer to mainline's behaviour this patch provides
__raise_softirq_irqoff_ksoft(), which raises the softirq in ksoftirqd.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
201/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, the higher-priority task won't be able
to submit packets to the NIC directly; instead they will be enqueued into
the Qdisc. The NIC will remain idle until the task(s) with higher priority
leave the CPU and the task with lower priority gets back and finishes the
job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
202/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while the writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
203/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: add back the missing serialization in ip_send_unicast_reply()
Date: Wed, 31 Aug 2016 17:21:56 +0200
Some time ago Sami Pietikäinen reported a crash on -RT in
ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire
(v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the
patch. As it turns out, that was a mistake.
I have reports that the same crash is possible with a similar backtrace.
It seems that vanilla protects access to this_cpu_ptr() via
local_bh_disable(). This does not work on -RT since we can have
NET_RX and NET_TX running in parallel on the same CPU.
This brings back the old locks.
|Unable to handle kernel NULL pointer dereference at virtual address 00000010
|PC is at __ip_make_skb+0x198/0x3e8
|[<c04e39d8>] (__ip_make_skb) from [<c04e3ca8>] (ip_push_pending_frames+0x20/0x40)
|[<c04e3ca8>] (ip_push_pending_frames) from [<c04e3ff0>] (ip_send_unicast_reply+0x210/0x22c)
|[<c04e3ff0>] (ip_send_unicast_reply) from [<c04fbb54>] (tcp_v4_send_reset+0x190/0x1c0)
|[<c04fbb54>] (tcp_v4_send_reset) from [<c04fcc1c>] (tcp_v4_do_rcv+0x22c/0x288)
|[<c04fcc1c>] (tcp_v4_do_rcv) from [<c0474364>] (release_sock+0xb4/0x150)
|[<c0474364>] (release_sock) from [<c04ed904>] (tcp_close+0x240/0x454)
|[<c04ed904>] (tcp_close) from [<c0511408>] (inet_release+0x74/0x7c)
|[<c0511408>] (inet_release) from [<c0470728>] (sock_release+0x30/0xb0)
|[<c0470728>] (sock_release) from [<c0470abc>] (sock_close+0x1c/0x24)
|[<c0470abc>] (sock_close) from [<c0115ec4>] (__fput+0xe8/0x20c)
|[<c0115ec4>] (__fput) from [<c0116050>] (____fput+0x18/0x1c)
|[<c0116050>] (____fput) from [<c0058138>] (task_work_run+0xa4/0xb8)
|[<c0058138>] (task_work_run) from [<c0011478>] (do_work_pending+0xd0/0xe4)
|[<c0011478>] (do_work_pending) from [<c000e740>] (work_pending+0xc/0x20)
|Code: e3530001 8a000001 e3a00040 ea000011 (e5973010)
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
204/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: add a lock around icmp_sk()
Date: Wed, 31 Aug 2016 17:54:09 +0200
It looks like the this_cpu_ptr() access in icmp_sk() is protected with
local_bh_disable(). To avoid missing serialization on -RT I am adding
a local lock here. No crash has been observed; this is just a precaution.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
205/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: net: Have __napi_schedule_irqoff() disable interrupts on RT
Date: Tue, 6 Dec 2016 17:50:30 -0500
A customer hit a crash where the napi sd->poll_list became corrupted.
The customer had the bnx2x driver, which does a
__napi_schedule_irqoff() in its interrupt handler. Unfortunately, when
running with CONFIG_PREEMPT_RT_FULL, this interrupt handler is run as a
thread and is preemptable. The call to ____napi_schedule() must be done
with interrupts disabled to protect the per-CPU softnet_data's
"poll_list", which is protected by disabling interrupts (disabling
preemption is enough when all interrupts are threaded and
local_bh_disable() can't preempt).
As bnx2x isn't the only driver that does this, the safest thing to do
is to make __napi_schedule_irqoff() call __napi_schedule() instead when
CONFIG_PREEMPT_RT_FULL is enabled, which will call local_irq_save()
before calling ____napi_schedule().
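The safe variant, sketched from the description above:
void __napi_schedule_irqoff(struct napi_struct *n)
{
#ifdef CONFIG_PREEMPT_RT_FULL
        /* threaded irq handlers are preemptible: take no chances,
         * __napi_schedule() does local_irq_save() itself */
        __napi_schedule(n);
#else
        ____napi_schedule(this_cpu_ptr(&softnet_data), n);
#endif
}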
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
206/286 [
Author: Carsten Emde
Email: C.Emde@osadl.org
Subject: net: sysrq via icmp
Date: Tue, 19 Jul 2011 13:51:17 +0100
There are (probably rare) situations when a system has crashed and the
system console becomes unresponsive but the network icmp layer is still
alive.
Wouldn't it be wonderful if we could then submit a sysrq command via ping?
This patch provides this facility. Please consult the updated documentation
Documentation/sysrq.txt for details.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
]
207/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context, so we had to
bring that behavior back for it. push_irq_work_func is the second one that
requires this.
This patch adds the IRQ_WORK_HARD_IRQ flag which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we keep the original behavior.
This patch incorporates tglx's original work, reworked a little to bring back
arch_irq_work_raise() where possible, plus a few fixes from Steven Rostedt and
Mike Galbraith.
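A sketch of the resulting queueing decision (hedged: the helper name, list
names and the raise path are illustrative; only the IRQ_WORK_HARD_IRQ flag
and the hard/soft split are taken from the description):

  void __irq_work_queue_local(struct irq_work *work)   /* illustrative name */
  {
          if (work->flags & IRQ_WORK_HARD_IRQ) {
                  /* e.g. nohz_full_kick_work: must run in raw-irq context */
                  if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
                          arch_irq_work_raise();
          } else {
                  /* everything else runs from softirq context on -RT */
                  if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
                          raise_softirq(TIMER_SOFTIRQ);
          }
  }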
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
208/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: irqwork: Move irq safe work to irq context
Date: Sun, 15 Nov 2015 18:40:17 +0100
On architectures where arch_irq_work_has_interrupt() returns false, we
end up running the irq safe work from the softirq context. That
results in a potential deadlock in the scheduler irq work which
expects that function to be called with interrupts disabled.
Split the irq_work_tick() function into a hard and soft variant. Call
the hard variant from the tick interrupt and add the soft variant to
the timer softirq.
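A sketch of the split (hedged: irq_work_run_list() and the per-CPU list
names are illustrative of the structure described above):

  /* hard variant: called from the tick interrupt itself */
  void irq_work_tick(void)
  {
          irq_work_run_list(this_cpu_ptr(&raised_list));
  }

  /* soft variant: called from the timer softirq */
  void irq_work_tick_soft(void)
  {
          irq_work_run_list(this_cpu_ptr(&lazy_list));
  }

This way the scheduler's irq-safe work always runs with interrupts
disabled, as it expects, even where arch_irq_work_has_interrupt() is false.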
Reported-and-tested-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
209/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats
Date: Wed, 18 Feb 2015 15:09:23 +0100
Locking functions previously using read_lock_irq()/read_lock_irqsave() were
changed to local_irq_disable/save(), leading to gripes. Use nort variants.
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 0, irqs_disabled(): 1, pid: 5947, name: alsa-sink-ALC88
|CPU: 5 PID: 5947 Comm: alsa-sink-ALC88 Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409316240 ffff88040866fa38 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff88040866fa58 ffffffff81073c86 ffffffffa03b2640
| ffff88040239ec00 ffff88040866fa78 ffffffff815c3d34 ffffffffa03b2640
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d34>] __rt_spin_lock+0x24/0x50
| [<ffffffff815c4044>] rt_read_lock+0x34/0x40
| [<ffffffffa03a2979>] snd_pcm_stream_lock+0x29/0x70 [snd_pcm]
| [<ffffffffa03a355d>] snd_pcm_playback_poll+0x5d/0x120 [snd_pcm]
| [<ffffffff811937a2>] do_sys_poll+0x322/0x5b0
| [<ffffffff81193d48>] SyS_ppoll+0x1a8/0x1c0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
210/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: printk: Make rt aware
Date: Wed, 19 Sep 2012 14:50:37 +0200
Drop the lock before calling the console driver and do not disable
interrupts while printing to a serial console.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
211/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/printk: Don't try to print from IRQ/NMI region
Date: Thu, 19 May 2016 17:45:27 +0200
On -RT we try to acquire sleeping locks which might lead to warnings
from lockdep or a warn_on() from spin_try_lock() (which is an rtmutex on
RT).
We don't generally print from an IRQ-off region, so we should not try
this via console_unblank() / bust_spinlocks() either.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
212/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: Drop the logbuf_lock more often
Date: Thu, 21 Mar 2013 19:01:05 +0100
The lock is held with irqs off. The latency drops by 500us+ on my ARM box
with a "full" buffer after executing "dmesg" on the shell.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
213/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Use generic rwsem on RT
Date: Tue, 14 Jul 2015 14:26:34 +0200
Use the generic code which uses rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
214/286 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
215/286 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: powerpc: ps3/device-init.c - adapt to completions using swait vs wait
Date: Sun, 31 May 2015 14:44:42 -0400
To fix:
cc1: warnings being treated as errors
arch/powerpc/platforms/ps3/device-init.c: In function 'ps3_notification_read_write':
arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'prepare_to_wait_event' from incompatible pointer type
arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'abort_exclusive_wait' from incompatible pointer type
arch/powerpc/platforms/ps3/device-init.c:755:2: error: passing argument 1 of 'finish_wait' from incompatible pointer type
arch/powerpc/platforms/ps3/device-init.o] Error 1
make[3]: *** Waiting for unfinished jobs....
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
216/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: ARM: at91: tclib: Default to tclib timer for RT
Date: Sat, 1 May 2010 18:29:35 +0200
RT is not too happy about the shared timer interrupt in AT91
devices. Default to tclib timer for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
217/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/unwind: use a raw_spin_lock
Date: Fri, 20 Sep 2013 14:31:54 +0200
Mostly unwind is done with irqs enabled; however SLUB may call it with
irqs disabled while creating a new SLUB cache.
I had a system freeze while loading a module which called
kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
interrupts and then
->new_slab_objects()
->new_slab()
->setup_object()
->setup_object_debug()
->init_tracking()
->set_track()
->save_stack_trace()
->save_stack_trace_tsk()
->walk_stackframe()
->unwind_frame()
->unwind_find_idx()
=>spin_lock_irqsave(&unwind_lock);
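The fix is the standard raw-lock conversion, so the lock stays valid to
take from an irq-off region (a sketch; only the lock definition and the
lock/unlock calls change, the lookup logic is untouched):

  /* was: static DEFINE_SPINLOCK(unwind_lock); -- a sleeping lock on -RT */
  static DEFINE_RAW_SPINLOCK(unwind_lock);

          raw_spin_lock_irqsave(&unwind_lock, flags);
          /* ... search the unwind index tables ... */
          raw_spin_unlock_irqrestore(&unwind_lock, flags);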
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
218/286 [
Author: "Yadi.hu"
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT_FULL
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
*((char*)0xc0001000) = 0;
};
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. For the above condition, the correct result is that
the test case receives a segmentation fault and exits instead of producing
the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq-enable block in the data
abort assembly code and moves it into the page/breakpoint/alignment fault
handlers instead, but does not enable irqs in the translation/section
permission fault handlers. ARM disables irqs when it enters exception/
interrupt mode, and if the kernel doesn't re-enable them, they remain
disabled during translation/section permission faults.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.
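The fix mirrors what the page/breakpoint/alignment fault handlers already
do (a sketch; interrupts_enabled() inspects the saved PSR of the faulting
context):

  static int do_sect_fault(unsigned long addr, unsigned int fsr,
                           struct pt_regs *regs)
  {
          /* re-enable irqs if the faulting context had them enabled */
          if (interrupts_enabled(regs))
                  local_irq_enable();

          do_bad_area(addr, fsr, regs);
          return 0;
  }

The same guard goes into the translation fault handler.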
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
219/286 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
220/286 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
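A sketch of the substitution (hedged: the vgic/timer flush calls stand in
for whatever the exact update sequence looks like in this tree):

          /* was: preempt_disable(); */
          migrate_disable();      /* pinned to this CPU, still preemptible */

          kvm_vgic_flush_hwstate(vcpu);
          kvm_timer_flush_hwstate(vcpu);

          /* ... enter the guest ... */

          /* was: preempt_enable(); */
          migrate_enable();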
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
221/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm64/xen: Make XEN depend on !RT
Date: Mon, 12 Oct 2015 11:18:40 +0200
It's not ready and probably never will be, unless xen folks have a
look at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
222/286 [
Author: Jason Wessel
Email: jason.wessel@windriver.com
Subject: kgdb/serial: Short term workaround
Date: Thu, 28 Jul 2011 12:42:23 -0500
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
> to missing hacks in the console locking which I dropped on purpose.
>
To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.
Comments are welcome of course. Ultimately the right solution is to
change separation between the console and the HW to have a polled mode
+ work queue so as not to introduce any kind of latency.
Thanks,
Jason.
]
223/286 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow, or so.
Are there better solutions? Should it exist and return 0 on !-rt?
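A sketch of what such an entry looks like (hedged: the attribute
boilerplate is illustrative; the point is a constant read-only file):

  static ssize_t realtime_show(struct kobject *kobj,
                               struct kobj_attribute *attr, char *buf)
  {
          return sprintf(buf, "%d\n", 1);
  }
  static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

With that in place, "cat /sys/kernel/realtime" prints 1 on an RT kernel,
which udev rules can test cheaply instead of parsing uname output.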
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
224/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
225/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
226/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course); this should be especially easy
now that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
227/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: rt: Fix generic kmap_atomic for RT
Date: Sat, 19 Sep 2015 10:15:00 +0200
The update to 4.1 brought in the mainline variant of the pagefault-
disable disentangling from the preempt count. That introduced a
preempt_disable/enable pair in the generic kmap_atomic/kunmap_atomic
implementations which did not get converted to the _nort() variant.
That results in massive 'scheduling while atomic/sleeping function
called from invalid context' splats.
Fix that up.
Reported-and-tested-by: Juergen Borleis <jbe@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
228/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
229/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The TLB should be flushed on unmap, thus making the mapping entry
invalid. Currently this is only done in the non-debug case, which does
not look right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
230/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
Fixup highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
231/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: ipc/sem: Rework semaphore wakeups
Date: Wed, 14 Sep 2011 11:57:04 +0200
Current sysv sems have a weird ass wakeup scheme that involves keeping
preemption disabled over a potential O(n^2) loop and busy waiting on
that on other CPUs.
Kill this and simply wake the task directly from under the sem_lock.
This was discovered by a migrate_disable() debug feature that
disallows:
spin_lock();
preempt_disable();
spin_unlock()
preempt_enable();
Cc: Manfred Spraul <manfred@colorfullife.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Manfred Spraul <manfred@colorfullife.com>
Link: http://lkml.kernel.org/r/1315994224.5040.1.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
232/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
A non-constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency-wise. This is also a prerequisite for running RT in
a guest on top of an RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
233/286 [
Author: Marcelo Tosatti
Email: mtosatti@redhat.com
Subject: KVM: lapic: mark LAPIC timer handler as irqsafe
Date: Wed, 8 Apr 2015 20:33:25 -0300
Since the lapic timer handler only wakes up a simple waitqueue, it can
safely be executed from hardirq context.
Reduces average cyclictest latency by 3us.
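The change itself is essentially a one-liner against the -RT hrtimer code
(hedged: the irqsafe field is the -RT series' marker for timers whose
handlers may run directly from the hard interrupt; the init call is
paraphrased):

          hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
                       HRTIMER_MODE_ABS);
          apic->lapic_timer.timer.irqsafe = 1;  /* handler only wakes a
                                                   simple waitqueue */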
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
234/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
235/286 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: sas-ata/isci: don't disable interrupts in qc_issue handler
Date: Sat, 14 Feb 2015 11:01:16 -0500
On 3.14-rt we see the following trace on Canoe Pass for
SCSI_ISCI "Intel(R) C600 Series Chipset SAS Controller"
when the sas qc_issue handler is run:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:905
in_atomic(): 0, irqs_disabled(): 1, pid: 432, name: udevd
CPU: 11 PID: 432 Comm: udevd Not tainted 3.14.28-rt22 #2
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
ffff880fab500000 ffff880fa9f239c0 ffffffff81a2d273 0000000000000000
ffff880fa9f239d8 ffffffff8107f023 ffff880faac23dc0 ffff880fa9f239f0
ffffffff81a33cc0 ffff880faaeb1400 ffff880fa9f23a40 ffffffff815de891
Call Trace:
[<ffffffff81a2d273>] dump_stack+0x4e/0x7a
[<ffffffff8107f023>] __might_sleep+0xe3/0x160
[<ffffffff81a33cc0>] rt_spin_lock+0x20/0x50
[<ffffffff815de891>] isci_task_execute_task+0x171/0x2f0 <-----
[<ffffffff815cfecb>] sas_ata_qc_issue+0x25b/0x2a0
[<ffffffff81606363>] ata_qc_issue+0x1f3/0x370
[<ffffffff8160c600>] ? ata_scsi_invalid_field+0x40/0x40
[<ffffffff8160c8f5>] ata_scsi_translate+0xa5/0x1b0
[<ffffffff8160efc6>] ata_sas_queuecmd+0x86/0x280
[<ffffffff815ce446>] sas_queuecommand+0x196/0x230
[<ffffffff81081fad>] ? get_parent_ip+0xd/0x50
[<ffffffff815b05a4>] scsi_dispatch_cmd+0xb4/0x210
[<ffffffff815b7744>] scsi_request_fn+0x314/0x530
and gdb shows:
(gdb) list * isci_task_execute_task+0x171
0xffffffff815ddfb1 is in isci_task_execute_task (drivers/scsi/isci/task.c:138).
133 dev_dbg(&ihost->pdev->dev, "%s: num=%d\n", __func__, num);
134
135 for_each_sas_task(num, task) {
136 enum sci_status status = SCI_FAILURE;
137
138 spin_lock_irqsave(&ihost->scic_lock, flags); <-----
139 idev = isci_lookup_device(task->dev);
140 io_ready = isci_device_io_ready(idev, task);
141 tag = isci_alloc_tag(ihost);
142 spin_unlock_irqrestore(&ihost->scic_lock, flags);
(gdb)
In addition to the scic_lock, the function also contains locking of
the task_state_lock -- which is clearly not a candidate for raw lock
conversion. As can be seen by the comment nearby, we really should
be running the qc_issue code with interrupts enabled anyway.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
236/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid kfree and other operations being
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
237/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counters get out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shortened the FPU-off region and ensured blkcipher_walk_done() is called
with preemption enabled. This might hurt performance because we now
enable/disable the FPU state more often, but we gain lower latency and
the bug is gone.
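Sketched against the sequence above, the fixed ordering looks like this
(same pseudocode style, still eliding arguments):

  glue_xts_crypt_128bit()
  {
          blkcipher_walk_virt();
          while (nbytes) {
                  glue_fpu_begin();       /* atomic only around the FP work */
                  __glue_xts_crypt_128bit();
                  glue_fpu_end();         /* drop atomicity before ...      */
                  blkcipher_walk_done();  /* ... this, which may schedule   */
          };
  }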
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
238/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: dm: Make rt aware
Date: Mon, 14 Nov 2011 23:06:09 +0100
Use the BUG_ON_NORT variant for the irqs_disabled() checks. RT has
interrupts legitimately enabled here as we can't deadlock against the
irq thread due to the "sleeping spinlocks" conversion.
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
239/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_t
Date: Wed, 13 Feb 2013 09:26:05 -0500
We hit the following bug with 3.6-rt:
[ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002
[ 5.898991] no locks held by swapper/3/0.
[ 5.898993] Modules linked in:
[ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.898997] Call Trace:
[ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002
[ 5.899037] no locks held by swapper/7/0.
[ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899040] Modules linked in:
[ 5.899041]
[ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.899047] Call Trace:
[ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0
[ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51
[ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e
[ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30
[ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20
[ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50
[ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ?
As the acpi code disables interrupts in acpi_idle_enter_bm, and calls
code that grabs the acpi lock, it causes issues because on RT that lock
is currently a sleeping lock.
The lock was converted from a raw to a sleeping lock due to some
previous issues, and tests that showed it didn't seem to matter.
Unfortunately, it did matter for one of our boxes.
This patch converts the lock back to a raw lock. I've run this code on a
few of my own machines, one being my laptop that uses the acpi quite
extensively. I've been able to suspend and resume without issues.
[ tglx: Made the change exclusive for acpi_gbl_hardware_lock ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: John Kacur <jkacur@gmail.com>
Cc: Clark Williams <clark@redhat.com>
Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
240/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT
Date: Wed, 14 Dec 2011 01:03:49 +0100
There are "valid" GFP_ATOMIC allocations such as
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 2130, name: tar
|1 lock held by tar/2130:
| #0: (&mm->mmap_sem){++++++}, at: [<ffffffff811d4e89>] SyS_brk+0x39/0x190
|Preemption disabled at:[<ffffffff81063048>] flush_tlb_mm_range+0x28/0x350
|
|CPU: 1 PID: 2130 Comm: tar Tainted: G W 4.8.2-rt2+ #747
|Call Trace:
| [<ffffffff814d52dc>] dump_stack+0x86/0xca
| [<ffffffff810a26fb>] ___might_sleep+0x14b/0x240
| [<ffffffff819bc1d4>] rt_spin_lock+0x24/0x60
| [<ffffffff81194fba>] get_page_from_freelist+0x83a/0x11b0
| [<ffffffff81195e8b>] __alloc_pages_nodemask+0x15b/0x1190
| [<ffffffff811f0b81>] alloc_pages_current+0xa1/0x1f0
| [<ffffffff811f7df5>] new_slab+0x3e5/0x690
| [<ffffffff811fb0d5>] ___slab_alloc+0x495/0x660
| [<ffffffff811fb311>] __slab_alloc.isra.79+0x71/0xc0
| [<ffffffff811fb447>] __kmalloc_node+0xe7/0x240
| [<ffffffff814d4ee0>] alloc_cpumask_var_node+0x20/0x50
| [<ffffffff814d4f3e>] alloc_cpumask_var+0xe/0x10
| [<ffffffff810430c1>] native_send_call_func_ipi+0x21/0x130
| [<ffffffff8111c13f>] smp_call_function_many+0x22f/0x370
| [<ffffffff81062b64>] native_flush_tlb_others+0x1a4/0x3a0
| [<ffffffff8106309b>] flush_tlb_mm_range+0x7b/0x350
| [<ffffffff811c88e2>] tlb_flush_mmu_tlbonly+0x62/0xd0
| [<ffffffff811c9af4>] tlb_finish_mmu+0x14/0x50
| [<ffffffff811d1c84>] unmap_region+0xe4/0x110
| [<ffffffff811d3db3>] do_munmap+0x293/0x470
| [<ffffffff811d4f8c>] SyS_brk+0x13c/0x190
| [<ffffffff810032e2>] do_fast_syscall_32+0xb2/0x2f0
| [<ffffffff819be181>] entry_SYSENTER_compat+0x51/0x60
coming from contexts which forbid allocations at run-time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
241/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
242/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: cpu: Make hotplug.lock a "sleeping" spinlock on RT
Date: Fri, 2 Mar 2012 10:36:57 -0500
Tasks can block on hotplug.lock in pin_current_cpu(), but their state
might be != RUNNING. So the mutex wakeup will set the state
unconditionally to RUNNING. That might cause spurious unexpected
wakeups. We could provide a state preserving mutex_lock() function,
but this is semantically backwards. So instead we convert the
hotplug.lock() to a spinlock for RT, which has the state preserving
semantics already.
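A sketch of the shape of that conversion (hedged: the struct layout and
wrapper names are illustrative):

  struct hotplug_lock {
  #ifdef CONFIG_PREEMPT_RT_FULL
          spinlock_t      lock;   /* "sleeping" spinlock on RT; its slowpath
                                     preserves the caller's task state */
  #else
          struct mutex    lock;   /* mutex wakeup forces TASK_RUNNING */
  #endif
  };

  #ifdef CONFIG_PREEMPT_RT_FULL
  # define hotplug_lock(hp)       spin_lock(&(hp)->lock)
  # define hotplug_unlock(hp)     spin_unlock(&(hp)->lock)
  #else
  # define hotplug_lock(hp)       mutex_lock(&(hp)->lock)
  # define hotplug_unlock(hp)     mutex_unlock(&(hp)->lock)
  #endif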
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
243/286 [
Author: Steven Rostedt
Email: srostedt@redhat.com
Subject: cpu/rt: Rework cpu down for PREEMPT_RT
Date: Mon, 16 Jul 2012 08:07:43 +0000
Bringing a CPU down is a pain with the PREEMPT_RT kernel because
tasks can be preempted in many more places than in non-RT. In
order to handle per_cpu variables, tasks may be pinned to a CPU
for a while, and even sleep. But these tasks need to be off the CPU
if that CPU is going down.
Several synchronization methods have been tried, but when stressed
they failed. This is a new approach.
A sync_tsk thread is still created and tasks may still block on a
lock when the CPU is going down, but how that works is a bit different.
When cpu_down() starts, it will create the sync_tsk and wait on it
to report that the tasks currently pinned on the CPU are no longer
pinned. But new tasks that are about to be pinned will still be allowed
to do so at this time.
Then the notifiers are called. Several notifiers will bring down tasks
that will enter these locations. Some of these tasks will take locks
of other tasks that are on the CPU. If we don't let those other tasks
continue, but make them block until CPU down is done, the tasks that
the notifiers are waiting on will never complete as they are waiting
for the locks held by the tasks that are blocked.
Thus we still let the task pin the CPU until the notifiers are done.
After the notifiers run, we then make new tasks entering the pinned
CPU sections grab a mutex and wait. This mutex is now a per CPU mutex
in the hotplug_pcp descriptor.
To help things along, a new function in the scheduler code is created
called migrate_me(). This function will try to migrate the current task
off the CPU this is going down if possible. When the sync_tsk is created,
all tasks will then try to migrate off the CPU going down. There are
several cases where this won't work, but it helps in most cases.
After the notifiers are called and if a task can't migrate off but enters
the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex
until the CPU down is complete. Then the scheduler will force the migration
anyway.
Also, I found that THREAD_BOUND needs to also be accounted for in the
pinned CPU sections, and migrate_disable no longer treats those tasks
specially.
This helps fix issues with ksoftirqd and workqueue that unbind on CPU down.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
244/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: cpu hotplug: Document why PREEMPT_RT uses a spinlock
Date: Thu, 5 Dec 2013 09:16:52 -0500
The patch:
cpu: Make hotplug.lock a "sleeping" spinlock on RT
Tasks can block on hotplug.lock in pin_current_cpu(), but their
state might be != RUNNING. So the mutex wakeup will set the state
unconditionally to RUNNING. That might cause spurious unexpected
wakeups. We could provide a state preserving mutex_lock() function,
but this is semantically backwards. So instead we convert the
hotplug.lock() to a spinlock for RT, which has the state preserving
semantics already.
This fixed a bug where the hotplug lock on PREEMPT_RT could be taken
after a task set its state to TASK_UNINTERRUPTIBLE and before it called
schedule. If the hotplug lock used a mutex, and there was contention,
the current task's state would be set back to TASK_RUNNING and the
schedule call would not sleep. This caused unexpected results.
Although the patch had a description of the change, the code had no
comments about it. This causes confusion to those that review the code,
and as PREEMPT_RT is held in a quilt queue and not git, it's not as easy
to see why a change was made. Even if it was in git, the code should
still have a comment for something as subtle as this.
Document the rationale for using a spinlock on PREEMPT_RT in the hotplug
lock code.
Reported-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
245/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/cpu: fix cpu down problem if kthread's cpu is going down
Date: Fri, 7 Jun 2013 22:37:06 +0200
If a kthread is pinned to CPUx and CPUx is going down then we get into
trouble:
- first the unplug thread is created
- it will set itself to hp->unplug. As a result, every task that is
going to take a lock has to leave the CPU.
- the CPU_DOWN_PREPARE notifiers are started. The worker thread will
start a new process for the "high priority worker".
Now the kthread would like to take a lock but since it can't leave the CPU
it will never complete its task.
We could fire the unplug thread after the notifiers but then the CPU is
no longer marked "online" and the unplug thread will run on CPU0 which
was fixed before :)
So instead the unplug thread is started and kept waiting until the
notifiers complete their work.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
246/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/hotplug: restore original cpu mask on cpu down
Date: Fri, 14 Jun 2013 17:16:35 +0200
If a task which is allowed to run only on CPU X puts CPU Y down then it
will be allowed to run on all CPUs but CPU Y after it comes back from the
kernel. This patch ensures that we don't lose the initial setting unless
the CPU the task is running on is going down.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
247/286 [
Author: Tiejun Chen
Email: tiejun.chen@windriver.com
Subject: cpu_down: move migrate_enable() back
Date: Thu, 7 Nov 2013 10:06:07 +0800
Commit 08c1ab68, "hotplug-use-migrate-disable.patch", intends to
use migrate_enable()/migrate_disable() to replace that combination
of preempt_enable() and preempt_disable(), but actually in the
!CONFIG_PREEMPT_RT_FULL case, migrate_enable()/migrate_disable()
are still equal to preempt_enable()/preempt_disable(). So the
following cpu_hotplug_begin()/cpu_unplug_begin(cpu) would go through
schedule() and trigger schedule_debug() like this:
_cpu_down()
|
+ migrate_disable() = preempt_disable()
|
+ cpu_hotplug_begin() or cpu_unplug_begin()
|
+ schedule()
|
+ __schedule()
|
+ preempt_disable();
|
+ __schedule_bug() is true!
So we should move migrate_enable() back to its original place.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
]
248/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: hotplug: Use set_cpus_allowed_ptr() in sync_unplug_thread()
Date: Tue, 24 Mar 2015 08:14:49 +0100
do_set_cpus_allowed() is not safe vs ->sched_class change.
crash> bt
PID: 11676 TASK: ffff88026f979da0 CPU: 22 COMMAND: "sync_unplug/22"
#0 [ffff880274d25bc8] machine_kexec at ffffffff8103b41c
#1 [ffff880274d25c18] crash_kexec at ffffffff810d881a
#2 [ffff880274d25cd8] oops_end at ffffffff81525818
#3 [ffff880274d25cf8] do_invalid_op at ffffffff81003096
#4 [ffff880274d25d90] invalid_op at ffffffff8152d3de
[exception RIP: set_cpus_allowed_rt+18]
RIP: ffffffff8109e012 RSP: ffff880274d25e48 RFLAGS: 00010202
RAX: ffffffff8109e000 RBX: ffff88026f979da0 RCX: ffff8802770cb6e8
RDX: 0000000000000000 RSI: ffffffff81add700 RDI: ffff88026f979da0
RBP: ffff880274d25e78 R8: ffffffff816112e0 R9: 0000000000000001
R10: 0000000000000001 R11: 0000000000011940 R12: ffff88026f979da0
R13: ffff8802770cb6d0 R14: ffff880274d25fd8 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffff880274d25e60] do_set_cpus_allowed at ffffffff8108e65f
#6 [ffff880274d25e80] sync_unplug_thread at ffffffff81058c08
#7 [ffff880274d25ed8] kthread at ffffffff8107cad6
#8 [ffff880274d25f50] ret_from_fork at ffffffff8152bbbc
crash> task_struct ffff88026f979da0 | grep class
sched_class = 0xffffffff816111e0 <fair_sched_class+64>,
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
249/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt/locking: Reenable migration across schedule
Date: Mon, 8 Feb 2016 16:15:28 +0100
We currently disable migration across lock acquisition. That includes the part
where we block on the lock and schedule out. We cannot disable migration after
taking the lock as that would cause a possible lock inversion.
But we can be smart and enable migration when we block and schedule out. That
allows the scheduler to place the task freely at least if this is the first
migrate disable level. For nested locking this does not help at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
250/286 [
Author: John Kacur
Email: jkacur@redhat.com
Subject: scsi: qla2xxx: Use local_irq_save_nort() in qla2x00_poll
Date: Fri, 27 Apr 2012 12:48:46 +0200
RT triggers the following:
[ 11.307652] [<ffffffff81077b27>] __might_sleep+0xe7/0x110
[ 11.307663] [<ffffffff8150e524>] rt_spin_lock+0x24/0x60
[ 11.307670] [<ffffffff8150da78>] ? rt_spin_lock_slowunlock+0x78/0x90
[ 11.307703] [<ffffffffa0272d83>] qla24xx_intr_handler+0x63/0x2d0 [qla2xxx]
[ 11.307736] [<ffffffffa0262307>] qla2x00_poll+0x67/0x90 [qla2xxx]
Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler
which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed
to call them with interrupts disabled. Therefore we use local_irq_save_nort()
instead which saves flags without disabling interrupts.
This fix needs to be applied to v3.0-rt, v3.2-rt and v3.4-rt
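A sketch of the fixed polling path (hedged: the function body is
reconstructed from the description; the _nort variants are the -RT helpers
that only save flags without disabling interrupts):

  static void qla2x00_poll(struct rsp_que *rsp)
  {
          unsigned long flags;

          local_irq_save_nort(flags);     /* no irq-off section on -RT */
          rsp->hw->isp_ops->intr_handler(0, rsp);
          local_irq_restore_nort(flags);
  }

On !RT kernels the _nort helpers behave exactly like
local_irq_save()/local_irq_restore(), so the non-RT semantics are kept.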
Suggested-by: Thomas Gleixner
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Sommerseth <davids@redhat.com>
Link: http://lkml.kernel.org/r/1335523726-10024-1-git-send-email-jkacur@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
251/286 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be
bound to a particular CPU. This can be achieved by
disabling migration. No need to disable preemption.
2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backlog() is called
in atomic context. And if the backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
get scheduled out, so it expects a non-atomic context.
3) When CONFIG_PREEMPT_RT_FULL is not defined,
migrate_enable() and migrate_disable() map to
preempt_enable() and preempt_disable(), so there is no
change in functionality in the non-RT case.
- Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively (see the sketch below)
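A sketch of the substitution in the netif_rx path (hedged: surrounding
code is abbreviated; get_cpu_light()/put_cpu_light() are the -RT
primitives that pin via migrate_disable() instead of preempt_disable()):

          /* was: cpu = get_cpu(); */
          cpu = get_cpu_light();  /* migrate_disable() + smp_processor_id() */
          ret = enqueue_to_backlog(skb, cpu, &qtail);
          /* was: put_cpu(); */
          put_cpu_light();

Staying on one CPU preserves the per-CPU backlog invariant, while staying
preemptible lets kfree_skb() sleep on -RT without triggering the
"scheduling while atomic" BUG.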
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
252/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Another local_irq_disable/kmalloc headache
Date: Wed, 26 Sep 2012 16:21:08 +0200
Replace it by a local lock. Though that's pretty inefficient :(
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
253/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: protect users of napi_alloc_cache against reentrance
Date: Fri, 15 Jan 2016 16:33:34 +0100
On -RT the code running in BH can not be moved to another CPU, so CPU-
local variables remain local. However the code can be preempted,
and another task may enter BH on the same CPU and access the same
napi_alloc_cache variable.
This patch ensures that each user of napi_alloc_cache takes a local lock.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
254/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: netfilter: Serialize xt_write_recseq sections on RT
Date: Sun, 28 Oct 2012 11:18:08 +0100
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing xt_write_recseq sections. RT breaks
that and needs explicit serialization here.
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
255/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Add a mutex around devnet_rename_seq
Date: Wed, 20 Mar 2013 18:06:20 +0100
On RT write_seqcount_begin() disables preemption and device_rename()
allocates memory with GFP_KERNEL and later grabs the sysfs_mutex.
Serialize with a mutex and use the non-preemption-disabling
__write_seqcount_begin().
To avoid writer starvation, let the reader grab the mutex and release
it when it detects a writer in progress. This keeps the normal case
(no reader on the fly) fast.
[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
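A sketch of both sides (hedged: reconstructed from the description;
variable names are illustrative, and the end-side helper name is assumed
to be symmetric with __write_seqcount_begin()):

  /* writer (device_rename()): the mutex serializes writers, the
   * seqcount still lets readers detect a writer in progress */
  mutex_lock(&devnet_rename_mutex);
  __write_seqcount_begin(&devnet_rename_seq);
  /* ... rename, GFP_KERNEL allocation, sysfs_mutex ... */
  __write_seqcount_end(&devnet_rename_seq);
  mutex_unlock(&devnet_rename_mutex);

  /* reader: lockless fast path, mutex only taken on retry */
  retry:
  seq = read_seqcount_begin(&devnet_rename_seq);
  /* ... copy the device name ... */
  if (read_seqcount_retry(&devnet_rename_seq, seq)) {
          mutex_lock(&devnet_rename_mutex);    /* wait out the writer */
          mutex_unlock(&devnet_rename_mutex);
          goto retry;
  }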
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
256/286 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: crypto: Convert crypto notifier chain to SRCU
Date: Fri, 5 Oct 2012 09:03:24 +0100
The crypto notifier deadlocks on RT. Though this can be a real deadlock
on mainline as well due to fifo fair rwsems.
The involved parties here are:
[ 82.172678] swapper/0 S 0000000000000001 0 1 0 0x00000000
[ 82.172682] ffff88042f18fcf0 0000000000000046 ffff88042f18fc80 ffffffff81491238
[ 82.172685] 0000000000011cc0 0000000000011cc0 ffff88042f18c040 ffff88042f18ffd8
[ 82.172688] 0000000000011cc0 0000000000011cc0 ffff88042f18ffd8 0000000000011cc0
[ 82.172689] Call Trace:
[ 82.172697] [<ffffffff81491238>] ? _raw_spin_unlock_irqrestore+0x6c/0x7a
[ 82.172701] [<ffffffff8148fd3f>] schedule+0x64/0x66
[ 82.172704] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
[ 82.172708] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
[ 82.172713] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
[ 82.172716] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
[ 82.172719] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
[ 82.172722] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
[ 82.172726] [<ffffffff811debfd>] crypto_wait_for_test+0x49/0x6b
[ 82.172728] [<ffffffff811ded32>] crypto_register_alg+0x53/0x5a
[ 82.172730] [<ffffffff811ded6c>] crypto_register_algs+0x33/0x72
[ 82.172734] [<ffffffff81ad7686>] ? aes_init+0x12/0x12
[ 82.172737] [<ffffffff81ad76ea>] aesni_init+0x64/0x66
[ 82.172741] [<ffffffff81000318>] do_one_initcall+0x7f/0x13b
[ 82.172744] [<ffffffff81ac4d34>] kernel_init+0x199/0x22c
[ 82.172747] [<ffffffff81ac44ef>] ? loglevel+0x31/0x31
[ 82.172752] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
[ 82.172755] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
[ 82.172759] [<ffffffff81ac4b9b>] ? start_kernel+0x3ca/0x3ca
[ 82.172761] [<ffffffff814987c0>] ? gs_change+0x13/0x13
[ 82.174186] cryptomgr_test S 0000000000000001 0 41 2 0x00000000
[ 82.174189] ffff88042c971980 0000000000000046 ffffffff81d74830 0000000000000292
[ 82.174192] 0000000000011cc0 0000000000011cc0 ffff88042c96eb80 ffff88042c971fd8
[ 82.174195] 0000000000011cc0 0000000000011cc0 ffff88042c971fd8 0000000000011cc0
[ 82.174195] Call Trace:
[ 82.174198] [<ffffffff8148fd3f>] schedule+0x64/0x66
[ 82.174201] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
[ 82.174204] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
[ 82.174206] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
[ 82.174209] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
[ 82.174212] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
[ 82.174215] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
[ 82.174218] [<ffffffff811e4883>] cryptomgr_notify+0x280/0x385
[ 82.174221] [<ffffffff814943de>] notifier_call_chain+0x6b/0x98
[ 82.174224] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
[ 82.174227] [<ffffffff810677cd>] __blocking_notifier_call_chain+0x70/0x8d
[ 82.174230] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
[ 82.174234] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
[ 82.174236] [<ffffffff811dd7a1>] crypto_alg_mod_lookup+0x3e/0x74
[ 82.174238] [<ffffffff811dd949>] crypto_alloc_base+0x36/0x8f
[ 82.174241] [<ffffffff811e9408>] cryptd_alloc_ablkcipher+0x6e/0xb5
[ 82.174243] [<ffffffff811dd591>] ? kzalloc.clone.5+0xe/0x10
[ 82.174246] [<ffffffff8103085d>] ablk_init_common+0x1d/0x38
[ 82.174249] [<ffffffff8103852a>] ablk_ecb_init+0x15/0x17
[ 82.174251] [<ffffffff811dd8c6>] __crypto_alloc_tfm+0xc7/0x114
[ 82.174254] [<ffffffff811e0caa>] ? crypto_lookup_skcipher+0x1f/0xe4
[ 82.174256] [<ffffffff811e0dcf>] crypto_alloc_ablkcipher+0x60/0xa5
[ 82.174258] [<ffffffff811e5bde>] alg_test_skcipher+0x24/0x9b
[ 82.174261] [<ffffffff8106d96d>] ? finish_task_switch+0x3f/0xfa
[ 82.174263] [<ffffffff811e6b8e>] alg_test+0x16f/0x1d7
[ 82.174267] [<ffffffff811e45ac>] ? cryptomgr_probe+0xac/0xac
[ 82.174269] [<ffffffff811e45d8>] cryptomgr_test+0x2c/0x47
[ 82.174272] [<ffffffff81061161>] kthread+0x7e/0x86
[ 82.174275] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
[ 82.174278] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
[ 82.174281] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
[ 82.174284] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
[ 82.174287] [<ffffffff814987c0>] ? gs_change+0x13/0x13
[ 82.174329] cryptomgr_probe D 0000000000000002 0 47 2 0x00000000
[ 82.174332] ffff88042c991b70 0000000000000046 ffff88042c991bb0 0000000000000006
[ 82.174335] 0000000000011cc0 0000000000011cc0 ffff88042c98ed00 ffff88042c991fd8
[ 82.174338] 0000000000011cc0 0000000000011cc0 ffff88042c991fd8 0000000000011cc0
[ 82.174338] Call Trace:
[ 82.174342] [<ffffffff8148fd3f>] schedule+0x64/0x66
[ 82.174344] [<ffffffff814901ad>] __rt_mutex_slowlock+0x85/0xbe
[ 82.174347] [<ffffffff814902d2>] rt_mutex_slowlock+0xec/0x159
[ 82.174351] [<ffffffff81089c4d>] rt_mutex_fastlock.clone.8+0x29/0x2f
[ 82.174353] [<ffffffff81490372>] rt_mutex_lock+0x33/0x37
[ 82.174356] [<ffffffff8108a0f2>] __rt_down_read+0x50/0x5a
[ 82.174358] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
[ 82.174360] [<ffffffff8108a11c>] rt_down_read+0x10/0x12
[ 82.174363] [<ffffffff810677b5>] __blocking_notifier_call_chain+0x58/0x8d
[ 82.174366] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
[ 82.174369] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
[ 82.174372] [<ffffffff811debd6>] crypto_wait_for_test+0x22/0x6b
[ 82.174374] [<ffffffff811decd3>] crypto_register_instance+0xb4/0xc0
[ 82.174377] [<ffffffff811e9b76>] cryptd_create+0x378/0x3b6
[ 82.174379] [<ffffffff811de512>] ? __crypto_lookup_template+0x5b/0x63
[ 82.174382] [<ffffffff811e4545>] cryptomgr_probe+0x45/0xac
[ 82.174385] [<ffffffff811e4500>] ? crypto_alloc_pcomp+0x1b/0x1b
[ 82.174388] [<ffffffff81061161>] kthread+0x7e/0x86
[ 82.174391] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
[ 82.174394] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
[ 82.174398] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
[ 82.174401] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
[ 82.174403] [<ffffffff814987c0>] ? gs_change+0x13/0x13
cryptomgr_test spawns the cryptomgr_probe thread from the notifier
call. The probe thread fires the same notifier as the test thread and
deadlocks on the rwsem on RT.
Now this is a potential deadlock in mainline as well, because we have
fifo fair rwsems. If another thread blocks with a down_write() on the
notifier chain before the probe thread issues the down_read() it will
block the probe thread and the whole party is deadlocked.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
257/286 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlocks are sleepable;
disable the softirq context test and the rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
258/286 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT_FULL, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
259/286 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: perf: Make swevent hrtimer run in irq instead of softirq
Date: Wed, 11 Jul 2012 22:05:21 +0000
Otherwise we get a deadlock like below:
[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
]
260/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/perf: mark perf_cpu_context's timer as irqsafe
Date: Thu, 4 Feb 2016 16:38:10 +0100
Otherwise we get a WARN_ON() backtrace and some events are reported as
"not counted".
Cc: stable-rt@vger.kernel.org
Reported-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
261/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Disable RCU_FAST_NO_HZ on RT
Date: Sun, 28 Oct 2012 13:26:09 +0000
This uses a timer_list timer from the irq disabled guts of the idle
code. Disable it for now to prevent wreckage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
262/286 [
Author: "Paul E. McKenney"
Email: paulmck@linux.vnet.ibm.com
Subject: rcu: Eliminate softirq processing from rcutree
Date: Mon, 4 Nov 2013 13:21:10 -0800
Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq work,
for example, setting kthread priority. This commit therefore moves the
RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
named rcuc. The SCHED_OTHER approach avoids the scalability problems
that appeared with the earlier attempt to move RCU core processing
from softirq to kthreads. That said, kernels built with RCU_BOOST=y
will run the rcuc kthreads at the RCU-boosting priority.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
263/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting the
default on RT should help in those cases, and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
264/286 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT_FULL.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
265/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR task preemption and not about the purely fairness-driven
SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
imprecise, as for laziness reasons I coupled this to
migrate_disable/enable, so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups,
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
Lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine- and workload-dependent, but
there is a clear trend that it enhances non-RT workload
performance.
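Roughly, the wakeup-time decision looks like this (a sketch, not the
literal patch; helper names follow the -rt tree):
  /* sketch: wakeup preemption decision; curr = running, p = woken task */
  if (test_tsk_need_resched(curr))
          return;                      /* full resched already pending */
  if (rt_task(p))
          resched_curr(curr);          /* RT class: NEED_RESCHED as before */
  else
          resched_curr_lazy(curr);     /* SCHED_OTHER: NEED_RESCHED_LAZY only */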
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
266/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: ftrace: Fix trace header alignment
Date: Sun, 16 Oct 2016 05:08:30 +0200
Line up helper arrows to the right column.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: fixup function tracer header]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
267/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
268/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
269/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
270/286 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate that the support is available, and that full
RT preemption is now available as well.
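The shape of the thread_info extension, assuming it mirrors the other
architectures in this series (a sketch; the bit number is illustrative):
  /* arch/arm64/include/asm/thread_info.h (sketch) */
  struct thread_info {
          ...
          int preempt_lazy_count;      /* 0 => lazily preemptable */
          ...
  };

  #define TIF_NEED_RESCHED_LAZY   3    /* lazy rescheduling necessary */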
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
271/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
272/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mmci: Remove bogus local_irq_save()
Date: Wed, 9 Jan 2013 12:11:12 +0100
On !RT the interrupt handler runs with interrupts disabled. On RT it
runs in a thread, so there is no need to disable interrupts at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
273/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cpufreq: drop K8's driver from being selected
Date: Thu, 9 Apr 2015 15:23:01 +0200
Ralf posted a picture of a backtrace from
| powernowk8_target_fn() -> transition_frequency_fidvid() and then at the
| end:
| 932 policy = cpufreq_cpu_get(smp_processor_id());
| 933 cpufreq_cpu_put(policy);
crashing the system on -RT. I assumed that policy was a NULL pointer, but
that was ruled out. Since Ralf can't investigate this any further and
I have no machine with this driver, I simply switch it off.
Reported-by: Ralf Mardorf <ralf.mardorf@alice-dsl.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
274/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: connector/cn_proc: Protect send_msg() with a local lock on RT
Date: Sun, 16 Oct 2016 05:11:54 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Present since ab8ed951080e ("connector: fix out-of-order cn_proc netlink
message delivery"), which went into v4.7-rc6.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
275/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
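A sketch of the replacement (struct and helper names assumed, following
the usual -rt pattern):
  struct zram_table_entry {
          unsigned long value;
  #ifdef CONFIG_PREEMPT_RT_FULL
          spinlock_t lock;             /* rtmutex-backed on RT */
  #endif
  };

  static void zram_lock_table(struct zram_table_entry *entry)
  {
  #ifdef CONFIG_PREEMPT_RT_FULL
          spin_lock(&entry->lock);
  #else
          bit_spin_lock(ZRAM_ACCESS, &entry->value);
  #endif
  }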
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
276/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/zram: Don't disable preemption in zcomp_stream_get/put()
Date: Thu, 20 Oct 2016 11:15:22 +0200
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock-order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
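A sketch of the per-stream lock (field names assumed):
  struct zcomp_strm {
          void *buffer;                /* compression/decompression buffer */
          spinlock_t zcomp_lock;       /* serializes stream use; sleepable on RT */
  };

  struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
  {
          struct zcomp_strm *zstrm = *this_cpu_ptr(comp->stream);

          spin_lock(&zstrm->zcomp_lock);   /* instead of get_cpu_ptr() */
          return zstrm;
  }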
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
277/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: drop trace_i915_gem_ring_dispatch on rt
Date: Thu, 25 Apr 2013 18:12:52 +0200
This tracepoint is responsible for:
|[<814cc358>] __schedule_bug+0x4d/0x59
|[<814d24cc>] __schedule+0x88c/0x930
|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
|[<814d27d9>] schedule+0x29/0x70
|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
|[<814d3786>] rt_spin_lock+0x26/0x30
|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
|[<8101d063>] ? native_sched_clock+0x13/0x80
|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
|[<8101d0d9>] ? sched_clock+0x9/0x10
|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
|[<810f1c18>] ? rb_commit+0x68/0xa0
|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
|[<81197467>] do_vfs_ioctl+0x97/0x540
|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
|[<811979a1>] sys_ioctl+0x91/0xb0
|[<814db931>] tracesys+0xe1/0xe6
Chris Wilson does not like moving i915_trace_irq_get() out of the macro:
|No. This enables the IRQ, as well as making a number of
|very expensively serialised read, unconditionally.
so it is gone now on RT.
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
278/286 [
Author: Clark Williams
Email: williams@redhat.com
Subject: i915: bogus warning from i915 when running on PREEMPT_RT
Date: Tue, 26 May 2015 10:43:43 -0500
The i915 driver has a 'WARN_ON(!in_interrupt())' in the display
handler, which whines constantly on the RT kernel (since the interrupt
is actually handled in a threaded handler and not actual interrupt
context).
Change the WARN_ON to WARN_ON_NORT.
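WARN_ON_NORT compiles away only when RT is enabled; roughly (a sketch
of the -rt helper):
  #ifdef CONFIG_PREEMPT_RT_FULL
  # define WARN_ON_NORT(condition)   do { } while (0)
  #else
  # define WARN_ON_NORT(condition)   WARN_ON(condition)
  #endif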
Tested-by: Joakim Hernberg <jhernberg@alchemy.lu>
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
279/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
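These helpers are no-ops on !RT, so mainline behaviour is unchanged
(a sketch of the usual -rt definitions):
  #ifdef CONFIG_PREEMPT_RT_FULL
  # define preempt_disable_rt()  preempt_disable()
  # define preempt_enable_rt()   preempt_enable()
  #else
  # define preempt_disable_rt()  do { } while (0)
  # define preempt_enable_rt()   do { } while (0)
  #endif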
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
280/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
Date: Sat, 27 Feb 2016 09:01:42 +0100
[ 8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[ 8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[ 8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[ 8.014055] Workqueue: events_unbound async_run_entry_fn
[ 8.014059] 0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[ 8.014063] ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[ 8.014066] ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[ 8.014067] Call Trace:
[ 8.014074] [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[ 8.014078] [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[ 8.014082] [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[ 8.014086] [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[ 8.014090] [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[ 8.014152] [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[ 8.014156] [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[ 8.014201] [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[ 8.014215] [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[ 8.014260] [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[ 8.014288] [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[ 8.014298] [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[ 8.014301] [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[ 8.014319] [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[ 8.014328] [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[ 8.014337] [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[ 8.014346] [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 8.014390] [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[ 8.014394] [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[ 8.014398] [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[ 8.014401] [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[ 8.014405] [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[ 8.014408] [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[ 8.014410] [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[ 8.014413] [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[ 8.014416] [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[ 8.014419] [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[ 8.014422] [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[ 8.014425] [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[ 8.014428] [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[ 8.014439] [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[ 8.014483] [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 8.014486] [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[ 8.014490] [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[ 8.014493] [<ffffffff81071489>] worker_thread+0x169/0x4c0
[ 8.014496] [<ffffffff81071320>] ? process_one_work+0x470/0x470
[ 8.014499] [<ffffffff81076606>] kthread+0xc6/0xe0
[ 8.014502] [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[ 8.014506] [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0
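The conversion itself is mechanical; a sketch using the -rt local-lock
API (the lock name is assumed):
  static DEFINE_LOCAL_IRQ_LOCK(pipe_update_lock);

  void intel_pipe_update_start(struct intel_crtc *crtc)
  {
          ...
          local_lock_irq(pipe_update_lock);    /* was local_irq_disable() */
          ...
  }

  void intel_pipe_update_end(struct intel_crtc *crtc)
  {
          ...
          local_unlock_irq(pipe_update_lock);  /* was local_irq_enable() */
  }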
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
281/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroups: use simple wait in css_release()
Date: Fri, 13 Feb 2015 15:52:24 +0100
To avoid:
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
|in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11
|2 locks held by rcuc/11/92:
| #0: (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940
| #1: (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0
|Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
|CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1
| ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000
| 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000
| ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80
|Call Trace:
| [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
| [<ffffffff8109aa16>] __might_sleep+0x116/0x190
| [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
| [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0
| [<ffffffff8110c881>] css_release+0x81/0x90
| [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0
| [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0
| [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940
| [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0
| [<ffffffff810948d8>] kthread+0xe8/0x100
| [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0
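The swork conversion amounts to (a sketch; member and callback names
assumed, swork being the -rt "simple work" API):
  static void css_release_work_fn(struct swork_event *sev)
  {
          /* former queue_work() callback body runs here, in thread context */
  }

  static void css_release(struct percpu_ref *ref)
  {
          struct cgroup_subsys_state *css =
                  container_of(ref, struct cgroup_subsys_state, refcnt);

          INIT_SWORK(&css->destroy_swork, css_release_work_fn);
          swork_queue(&css->destroy_swork);    /* instead of queue_work() */
  }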
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
282/286 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: memcontrol: Prevent scheduling while atomic in cgroup code
Date: Sat, 21 Jun 2014 10:09:48 +0200
mm, memcg: make refill_stock() use get_cpu_light()
Nikita reported the following memcg scheduling while atomic bug:
Call Trace:
[e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable)
[e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0
[e22d5ae0] [c060b9ec] __schedule+0x530/0x550
[e22d5bf0] [c060bacc] schedule+0x30/0xbc
[e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c
[e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4
[e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98
[e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac
[e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70
[e22d5d90] [c0117284] __do_fault+0x38c/0x510
[e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858
[e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc
[e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80
What happens:
refill_stock()
get_cpu_var()
drain_stock()
res_counter_uncharge()
res_counter_uncharge_until()
spin_lock() <== boom
Fix it by replacing get/put_cpu_var() with get/put_cpu_light().
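For reference, get_cpu_light() only disables migration instead of
preemption (a sketch of the -rt definition):
  #ifdef CONFIG_PREEMPT_RT_FULL
  # define get_cpu_light()  ({ migrate_disable(); smp_processor_id(); })
  # define put_cpu_light()  migrate_enable()
  #else
  # define get_cpu_light()  get_cpu()
  # define put_cpu_light()  put_cpu()
  #endif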
Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: use memcg_stock_ll as a locallock since it is now IRQ-off region]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
283/286 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: rt,ntp: Move call to schedule_delayed_work() to helper thread
Date: Wed, 26 Jun 2013 15:28:11 -0400
The ntp code for notify_cmos_timer() is called from hard interrupt
context. Under PREEMPT_RT_FULL, schedule_delayed_work() takes spinlocks
that have been converted to mutexes, so calling schedule_delayed_work()
from interrupt context is not safe.
Add a helper thread that does the call to schedule_delayed_work(), and
wake that thread up instead of calling schedule_delayed_work() directly.
This is only for CONFIG_PREEMPT_RT_FULL; otherwise the code still calls
schedule_delayed_work() directly in irq context.
Note: There are a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does the checks. Just register
a notifier on boot-up for your check and wake up the thread when
needed. This will be a todo.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: use swork_queue() instead a helper thread]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
284/286 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: md: disable bcache
Date: Thu, 29 Aug 2013 11:48:57 +0200
It uses anon semaphores:
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
285/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: workqueue: Prevent deadlock/stall on RT
Date: Fri, 27 Jun 2014 16:24:52 +0200
Austin reported an XFS deadlock/stall on RT where scheduled work never
gets executed and tasks wait for each other forever.
The underlying problem is the RT modification to the handling of
workers which are about to go to sleep. In mainline, a worker thread
which goes to sleep wakes an idle worker if there is more work to do.
This happens from the guts of the schedule() function. On RT this must
happen outside of schedule(), and the accessed data structures are not
protected against scheduling due to the spinlock-to-rtmutex conversion.
So the naive solution was to move the code outside of the scheduler and
protect the data structures with the pool lock. That approach turned
out to be a little too naive, as we cannot call into that code when the
thread blocks on a lock: it is not allowed to block on two locks in
parallel. So we don't call into the worker wakeup magic when the worker
is blocked on a lock, which causes the deadlock/stall observed by
Austin and Mike.
Looking deeper into that worker code it turns out that the only
relevant data structure which needs to be protected is the list of
idle workers which can be woken up.
So the solution is to protect the list manipulation operations with
preempt_enable/disable pairs on RT and call unconditionally into the
worker code even when the worker is blocked on a lock. The preemption
protection is safe as there is nothing which can fiddle with the list
outside of thread context.
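So the wakeup path ends up looking roughly like (a sketch, not the
literal patch):
  void wq_worker_sleeping(struct task_struct *task)
  {
          struct worker *worker = kthread_data(task);
          ...
          preempt_disable();           /* fences the idle-worker list on RT */
          if (need_more_worker(worker->pool))
                  wake_up_worker(worker->pool);
          preempt_enable();
  }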
Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com>
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
]
286/286 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
|