1/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: highmem: Don't disable preemption on RT in kmap_atomic()
Date: Fri, 30 Oct 2020 13:59:06 +0100
Disabling preemption makes it impossible to acquire sleeping locks within a
kmap_atomic() section.
For PREEMPT_RT it is sufficient to disable migration.
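As a minimal sketch of the idea (not the patch itself; my_kmap_atomic() is a
hypothetical name and !CONFIG_HIGHMEM is assumed for brevity):
  static inline void *my_kmap_atomic(struct page *page)
  {
          if (IS_ENABLED(CONFIG_PREEMPT_RT))
                  migrate_disable();      /* pin to this CPU, stay preemptible */
          else
                  preempt_disable();      /* classic atomic section on !RT */
          pagefault_disable();
          return page_address(page);      /* assumes !CONFIG_HIGHMEM */
  }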
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timers: Move clearing of base::timer_running under base::lock
Date: Sun, 6 Dec 2020 22:40:07 +0100
syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().
This looks innocent and most reads are clearly not problematic but for a
non-RT kernel it's completely irrelevant whether the store happens before
or after taking the lock. For an RT kernel moving the store under the lock
requires an extra unlock/lock pair in the case that there is a waiter for
the timer. But that's not the end of the world and definitely not worth the
trouble of adding boatloads of comments and annotations to the code. Famous
last words...
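Sketched (simplified, field name as quoted above; not the literal diff), the
store moves inside the locked region:
  raw_spin_lock_irq(&base->lock);
  base->timer_running = NULL;     /* now published under base::lock */
  raw_spin_unlock_irq(&base->lock);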
Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
3/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kthread: Move prio/affinity change into the newly created thread
Date: Mon, 9 Nov 2020 21:30:41 +0100
With enabled threaded interrupts the nouveau driver reported the
following:
| Chain exists of:
| &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem
|
| Possible unsafe locking scenario:
|
| CPU0                     CPU1
| ----                     ----
| lock(&cpuset_rwsem);
|                          lock(&device->mutex);
|                          lock(&cpuset_rwsem);
| lock(&mm->mmap_lock#2);
The device->mutex is nvkm_device::mutex.
Breaking the lock chain at `cpuset_rwsem' is probably the easiest thing
to do.
Move the priority reset to the start of the newly created thread.
Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de
]
4/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: genirq: Move prio assignment into the newly created thread
Date: Mon, 9 Nov 2020 23:32:39 +0100
With enabled threaded interrupts the nouveau driver reported the
following:
| Chain exists of:
| &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem
|
| Possible unsafe locking scenario:
|
| CPU0                     CPU1
| ----                     ----
| lock(&cpuset_rwsem);
|                          lock(&device->mutex);
|                          lock(&cpuset_rwsem);
| lock(&mm->mmap_lock#2);
The device->mutex is nvkm_device::mutex.
Breaking the lock chain at `cpuset_rwsem' is probably the easiest thing
to do.
Move the priority assignment to the start of the newly created thread.
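A hedged sketch of the approach (my_irq_thread_fn() and the priority value
are illustrative): the scheduling parameters are applied by the new thread
itself, so the creating context never pulls cpuset_rwsem into the chain:
  static int my_irq_thread_fn(void *data)
  {
          struct sched_param param = { .sched_priority = 50 };

          /* runs in the created thread's own context */
          sched_setscheduler(current, SCHED_FIFO, &param);
          /* ... interrupt handling loop ... */
          return 0;
  }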
Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: Patch description]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de
]
5/191 [
Author: Valentin Schneider
Email: valentin.schneider@arm.com
Subject: notifier: Make atomic_notifiers use raw_spinlock
Date: Sun, 22 Nov 2020 20:19:04 +0000
Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno
leads to the idle task blocking on an RT sleeping spinlock down some
notifier path:
[ 1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002
[ 1.809116] Modules linked in:
[ 1.809123] Preemption disabled at:
[ 1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)
[ 1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.10.0-rc3-rt7 #168
[ 1.809153] Hardware name: ARM Juno development board (r0) (DT)
[ 1.809158] Call trace:
[ 1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))
[ 1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)
[ 1.809178] dump_stack (lib/dump_stack.c:122)
[ 1.809188] __schedule_bug (kernel/sched/core.c:4886)
[ 1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)
[ 1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))
[ 1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)
[ 1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)
[ 1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)
[ 1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)
[ 1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)
[ 1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)
[ 1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)
[ 1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
[ 1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)
[ 1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
[ 1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)
Two points worth noting:
1) That this is conceptually the same issue as pointed out in:
313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")
2) Only the _robust() variant of atomic_notifier callchains suffers from
this.
AFAICT only the cpu_pm_notifier_chain really needs to be changed, but
singling it out would mean introducing a new (truly) non-blocking API. At
the same time, callers that are fine with any blocking within the call
chain should use blocking notifiers, so patching up all atomic_notifier's
doesn't seem *too* crazy to me.
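Sketched as types only (struct name hypothetical), the change is the head's
lock type, since a raw_spinlock_t keeps spinning on RT:
  struct my_atomic_notifier_head {
          raw_spinlock_t lock;    /* was spinlock_t, which sleeps on RT */
          struct notifier_block __rcu *head;
  };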
Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201122201904.30940-1-valentin.schneider@arm.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used
Date: Fri, 19 Feb 2021 17:51:07 +0100
The mutex linear_mapping_mutex is defined at the top of the file while its
only two users are within the CONFIG_MEMORY_HOTPLUG block.
A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because
its mutex implementation is smart enough to realize that it is unused.
Move the definition of linear_mapping_mutex to the ifdef block where it is
used.
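Sketched:
  #ifdef CONFIG_MEMORY_HOTPLUG
  static DEFINE_MUTEX(linear_mapping_mutex);  /* both users live here */
  /* ... the two CONFIG_MEMORY_HOTPLUG-only users ... */
  #endif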
Fixes: 1f73ad3e8d755 ("powerpc/mm: print warning in arch_remove_linear_mapping()")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: limit second loop of syslog_print_all
Date: Wed, 17 Feb 2021 16:15:31 +0100
The second loop of syslog_print_all() subtracts lengths that were
added in the first loop. With commit b031a684bfd0 ("printk: remove
logbuf_lock writer-protection of ringbuffer") it is possible that
records are (over)written during syslog_print_all(). This allows the
possibility of the second loop subtracting lengths that were never
added in the first loop.
This situation can result in syslog_print_all() filling the buffer
starting from a later record, even though there may have been room
to fit the earlier record(s) as well.
Fixes: b031a684bfd0 ("printk: remove logbuf_lock writer-protection of ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
]
8/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove unused fields
Date: Mon, 21 Dec 2020 11:19:39 +0106
struct kmsg_dumper still contains some fields that were used to
iterate the old ringbuffer. They are no longer used. Remove them
and update the struct documentation.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: refactor kmsg_dump_get_buffer()
Date: Mon, 30 Nov 2020 01:41:56 +0106
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.
A follow-up commit will move this common logic into a
separate helper function.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
Date: Wed, 13 Jan 2021 11:29:53 +0106
The logic for finding records to fit into a buffer is the same for
kmsg_dump_get_buffer() and syslog_print_all(). Introduce a helper
function find_first_fitting_seq() to handle this logic.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
11/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce CONSOLE_LOG_MAX for improved multi-line support
Date: Thu, 10 Dec 2020 12:48:01 +0106
Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
that is allowed to be printed to the console for a single record.
Rather than setting CONSOLE_LOG_MAX to "LOG_LINE_MAX + PREFIX_MAX"
(1024), increase it to 4096. With a larger buffer size, multi-line
records that are nearly LOG_LINE_MAX in length will have a better
chance of being fully printed. (When formatting a record for the
console, each line of a multi-line record is prepended with a copy
of the prefix.)
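In sketch form:
  #define CONSOLE_LOG_MAX 4096    /* was LOG_LINE_MAX + PREFIX_MAX (1024) */
  static char text[CONSOLE_LOG_MAX];  /* formatting buffer for one record */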
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
12/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: use seqcount_latch for clear_seq
Date: Mon, 30 Nov 2020 01:41:58 +0106
kmsg_dump_rewind_nolock() locklessly reads @clear_seq. However,
this is not done atomically. Since @clear_seq is 64-bit, this
cannot be an atomic operation for all platforms. Therefore, use
a seqcount_latch to allow readers to always read a consistent
value.
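A minimal sketch of such a latched read (variable names hypothetical; the
write side updates both copies around raw_write_seqcount_latch()):
  static seqcount_latch_t clear_latch;
  static u64 clear_seq_copy[2];

  static u64 my_read_clear_seq(void)
  {
          u64 seq;
          unsigned int start;

          do {
                  start = raw_read_seqcount_latch(&clear_latch);
                  seq = clear_seq_copy[start & 1];    /* stable copy */
          } while (read_seqcount_latch_retry(&clear_latch, start));
          return seq;
  }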
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: use atomic64_t for devkmsg_user.seq
Date: Thu, 10 Dec 2020 15:33:40 +0106
@user->seq is indirectly protected by @logbuf_lock. Once @logbuf_lock
is removed, @user->seq will be no longer safe from an atomicity point
of view.
In preparation for the removal of @logbuf_lock, change it to
atomic64_t to provide this safety.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
14/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add syslog_lock
Date: Thu, 10 Dec 2020 16:58:02 +0106
The global variables @syslog_seq, @syslog_partial, @syslog_time
and write access to @clear_seq are protected by @logbuf_lock.
Once @logbuf_lock is removed, these variables will need their
own synchronization method. Introduce @syslog_lock for this
purpose.
@syslog_lock is a raw_spin_lock for now. This simplifies the
transition to removing @logbuf_lock. Once @logbuf_lock and the
safe buffers are removed, @syslog_lock can change to spin_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce a kmsg_dump iterator
Date: Fri, 18 Dec 2020 11:40:08 +0000
Rather than store the iterator information into the registered
kmsg_dump structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller,
thus allowing lockless use of the kmsg_dump functions.
This is in preparation for removal of @logbuf_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: um: synchronize kmsg_dumper
Date: Mon, 21 Dec 2020 11:10:03 +0106
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since a static buffer is used
to retrieve the kernel logs, this buffer must be protected against
simultaneous dumping.
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove logbuf_lock
Date: Tue, 26 Jan 2021 17:43:19 +0106
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.
This means that printk_nmi_direct and printk_safe_flush_on_panic()
no longer need to acquire any lock to run.
@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove _nolock() variants
Date: Mon, 21 Dec 2020 10:27:58 +0106
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.
The functions without _nolock() were chosen because they are already
exported to kernel modules.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
19/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: use kmsg_dump_rewind
Date: Wed, 17 Feb 2021 18:23:16 +0100
kmsg_dump() is open coding the kmsg_dump_rewind(). Call
kmsg_dump_rewind() instead.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: console: remove unnecessary safe buffer usage
Date: Wed, 17 Feb 2021 18:28:05 +0100
Upon registering a console, safe buffers are activated when setting
up the sequence number to replay the log. However, these are already
protected by @console_sem and @syslog_lock. Remove the unnecessary
safe buffer usage.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
]
21/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: track/limit recursion
Date: Fri, 11 Dec 2020 00:55:25 +0106
Limit printk() recursion to 1 level. This is enough to print a
stacktrace for the printk call, should a WARN or BUG occur.
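A hedged sketch of per-CPU recursion limiting (names hypothetical; the real
patch also deals with NMI context):
  static DEFINE_PER_CPU(char, my_printk_count);

  static bool my_printk_enter(unsigned long *flags)
  {
          char *count;

          local_irq_save(*flags);         /* stay on this CPU */
          count = this_cpu_ptr(&my_printk_count);
          if (*count > 1) {               /* original call + 1 recursion level */
                  local_irq_restore(*flags);
                  return false;           /* drop the message */
          }
          (*count)++;
          return true;
  }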
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove safe buffers
Date: Mon, 30 Nov 2020 01:42:00 +0106
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers. In NMI or safe contexts, store
the message immediately but still use irq_work to defer the console
printing.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: convert @syslog_lock to spin_lock
Date: Thu, 18 Feb 2021 17:37:41 +0100
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Mon, 30 Nov 2020 01:42:01 +0106
Add a write_atomic() callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write() callback. The
write_atomic() callback will only be used in special situations,
such as when the kernel panics.
Creating an NMI safe write_atomic() that must synchronize with
write() requires a careful implementation of the console driver. To
aid with the implementation, a set of console_atomic_*() functions
are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using a processor-reentrant spinlock
(called a cpulock).
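Sketched as types (illustrative; not the full struct console):
  struct my_console {
          void (*write)(struct my_console *con, const char *s,
                        unsigned int n);
          /* optional; must be NMI safe and must not sleep */
          void (*write_atomic)(struct my_console *con, const char *s,
                               unsigned int n);
  };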
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Mon, 30 Nov 2020 01:42:02 +0106
Implement a non-sleeping NMI-safe write_atomic() console function in
order to support emergency console printing.
Since interrupts need to be disabled during transmit, all usage of
the IER register is wrapped with access functions that use the
console_atomic_lock() function to synchronize register access while
tracking the state of the interrupts. This is necessary because
write_atomic() can be called from an NMI context that has preempted
write_atomic().
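A hedged sketch of such an IER accessor, built on the console_atomic_lock()
API quoted in the previous entry (function name hypothetical):
  static void my_set_ier(struct uart_8250_port *up, unsigned char ier)
  {
          unsigned int flags;

          console_atomic_lock(&flags);    /* processor-reentrant cpulock */
          serial_out(up, UART_IER, ier);
          console_atomic_unlock(flags);
  }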
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: relocate printk_delay() and vprintk_default()
Date: Mon, 30 Nov 2020 01:42:03 +0106
Move printk_delay() and vprintk_default() "as is" further up so that
they can be used by new functions in an upcoming commit.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: combine boot_delay_msec() into printk_delay()
Date: Mon, 30 Nov 2020 01:42:04 +0106
boot_delay_msec() is always called immediately before printk_delay()
so just combine the two.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: change @console_seq to atomic64_t
Date: Mon, 30 Nov 2020 01:42:05 +0106
In preparation for atomic printing, change @console_seq to atomic
so that it can be accessed without requiring @console_sem.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce kernel sync mode
Date: Mon, 30 Nov 2020 01:42:06 +0106
When the kernel performs an OOPS, enter into "sync mode":
- only atomic consoles (write_atomic() callback) will print
- printing occurs within vprintk_store() instead of console_unlock()
CONSOLE_LOG_MAX is moved to printk.h to support the per-console
buffer used in sync mode.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: move console printing to kthreads
Date: Mon, 30 Nov 2020 01:42:07 +0106
Create a kthread for each console to perform console printing. Now
all console printing is fully asynchronous except for the boot
console and when the kernel enters sync mode (and there are atomic
consoles available).
The console_lock() and console_unlock() functions now only do what
their name says... locking and unlocking of the console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove deferred printing
Date: Mon, 30 Nov 2020 01:42:08 +0106
Since printing occurs either atomically or from the printing
kthread, there is no need for any deferring or tracking possible
recursion paths. Remove all printk context tracking.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add console handover
Date: Mon, 30 Nov 2020 01:42:09 +0106
If earlyprintk is used, a boot console will print directly to the
console immediately. The boot console will unregister itself as soon
as a non-boot console registers. However, the non-boot console does
not begin printing until its kthread has started. Since this happens
much later, there is a long pause in the console output. If the
ringbuffer is small, messages could even be dropped during the
pause.
Add a new CON_HANDOVER console flag to be used internally by printk
in order to track which non-boot console took over from a boot
console. If handover consoles have implemented write_atomic(), they
are allowed to print directly to the console until their kthread can
take over.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/191 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add pr_flush()
Date: Mon, 30 Nov 2020 01:42:10 +0106
Provide a function to allow waiting for console printers to catch
up to the latest logged message.
Use pr_flush() to give console printers a chance to finish in
critical situations if no atomic console is available. For now
pr_flush() is only used in the most common error paths:
panic(), print_oops_end_marker(), report_bug(), kmsg_dump().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kcov: Remove kcov include from sched.h and move it to its users.
Date: Thu, 18 Feb 2021 18:31:24 +0100
The recent addition of in_serving_softirq() to kcov.h results in a
compile failure on PREEMPT_RT because it requires
task_struct::softirq_disable_cnt. This is not available if kcov.h is
included from sched.h.
There is no need to include kcov.h from sched.h. All users except the one
under net/ already include the kcov header file directly.
Move the include of the kcov.h header from sched.h to its users.
Additionally include sched.h from kcov.h to ensure that everything
task_struct related is available.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lkml.kernel.org/r/20210218173124.iy5iyqv3a4oia4vv@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: use irqsave in cgroup_rstat_flush_locked()
Date: Tue, 3 Jul 2018 18:19:48 +0200
All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock
either with spin_lock_irq() or spin_lock_irqsave().
cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which
is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated()
in IRQ context and therefore requires _irqsave() locking suffix in
cgroup_rstat_flush_locked().
Since there is no difference between spin_lock_t and raw_spin_lock_t
on !RT lockdep does not complain here. On RT lockdep complains because
the interrupts were not disabled here and a deadlock is possible.
Acquire the raw_spin_lock_t with disabled interrupts.
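Sketched:
  raw_spin_lock_irqsave(cpu_lock, flags);         /* was raw_spin_lock() */
  /* ... flush this CPU's stats ... */
  raw_spin_unlock_irqrestore(cpu_lock, flags);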
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: shmem: Use raw_spinlock_t for ->stat_lock
Date: Fri, 14 Aug 2020 18:53:34 +0200
Each CPU has SHMEM_INO_BATCH inodes available in `->ino_batch' which is
per-CPU. Access here is serialized by disabling preemption. If the pool is
empty, it gets reloaded from `->next_ino'. Access here is serialized by
->stat_lock which is a spinlock_t and can not be acquired with disabled
preemption.
One way around it would be to make the per-CPU ino_batch a struct containing
the inode number and a local_lock_t.
Another solution is to promote ->stat_lock to a raw_spinlock_t. The critical
sections are short. The mpol_put() should be moved outside of the critical
section to avoid invoking the destructor with disabled preemption.
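Sketched (simplified), the reordering keeps the raw critical section short
and runs the destructor preemptibly:
  raw_spin_lock(&sbinfo->stat_lock);
  mpol = sbinfo->mpol;
  sbinfo->mpol = newpol;                  /* short critical section */
  raw_spin_unlock(&sbinfo->stat_lock);
  mpol_put(mpol);                         /* destructor outside the lock */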
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Move lockdep where it belongs
Date: Tue, 8 Sep 2020 07:32:20 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
39/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tcp: Remove superfluous BH-disable around listening_hash
Date: Mon, 12 Oct 2020 17:33:54 +0200
Commit
9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH.
inet_unhash() conditionally acquires listening_hash->lock.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/linux-rt-users/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de/
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
]
40/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: smp: Wake ksoftirqd on PREEMPT_RT instead do_softirq().
Date: Mon, 15 Feb 2021 18:44:12 +0100
The softirq implementation on PREEMPT_RT does not provide do_softirq().
The other user of do_softirq() is replaced with a local_bh_disable()
+ enable() around the possible raise-softirq invocation. This can not be
done here because migration_cpu_stop() is invoked with disabled
preemption.
Wake the softirq thread on PREEMPT_RT if there are any pending softirqs.
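Sketched (wakeup_softirqd() is internal to the softirq code, shown here
illustratively):
  if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
          if (local_softirq_pending())
                  wakeup_softirqd();      /* RT provides no do_softirq() */
  } else {
          do_softirq();
  }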
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Replace barrier() with cpu_relax() in tasklet_unlock_wait()
Date: Tue, 9 Mar 2021 09:42:04 +0100
A barrier() in a tight loop which waits for something to happen on a remote
CPU is a pointless exercise. Replace it with cpu_relax() which allows HT
siblings to make progress.
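Sketched:
  while (test_bit(TASKLET_STATE_RUN, &t->state))
          cpu_relax();    /* was barrier(); lets an HT sibling progress */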
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Use static inlines for stub implementations
Date: Tue, 9 Mar 2021 09:42:05 +0100
Inlines exist for a reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Provide tasklet_disable_in_atomic()
Date: Tue, 9 Mar 2021 09:42:06 +0100
Replacing the spin wait loops in tasklet_unlock_wait() with
wait_var_event() is not possible as a handful of tasklet_disable()
invocations are happening in atomic context. All other invocations are in
teardown paths which can sleep.
Provide tasklet_disable_in_atomic() and tasklet_unlock_spin_wait() to
convert the few atomic use cases over, which allows to change
tasklet_disable() and tasklet_unlock_wait() in a later step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Use spin wait in tasklet_disable() temporarily
Date: Tue, 9 Mar 2021 09:42:07 +0100
To ease the transition use spin waiting in tasklet_disable() until all
usage sites from atomic context have been cleaned up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/191 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: tasklets: Replace spin wait in tasklet_unlock_wait()
Date: Tue, 9 Mar 2021 09:42:08 +0100
tasklet_unlock_wait() spin waits for TASKLET_STATE_RUN to be cleared. This
is wasting CPU cycles in a tight loop which is especially painful in a
guest when the CPU running the tasklet is scheduled out.
tasklet_unlock_wait() is invoked from tasklet_kill() which is used in
teardown paths and not performance critical at all. Replace the spin wait
with wait_var_event().
There are no users of tasklet_unlock_wait() which are invoked from atomic
contexts. The usage in tasklet_disable() has been replaced temporarily with
the spin waiting variant until the atomic users are fixed up and will be
converted to the sleep wait variant later.
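Sketched (simplified; the side clearing TASKLET_STATE_RUN pairs this with
wake_up_var(&t->state)):
  wait_var_event(&t->state,
                 !test_bit(TASKLET_STATE_RUN, &t->state));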
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/191 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: tasklets: Replace spin wait in tasklet_kill()
Date: Tue, 9 Mar 2021 09:42:09 +0100
tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared invoking
yield() from inside the loop. yield() is an ill defined mechanism and the
result might still be wasting CPU cycles in a tight loop which is
especially painful in a guest when the CPU running the tasklet is scheduled
out.
tasklet_kill() is used in teardown paths and not performance critical at
all. Replace the spin wait with wait_var_event().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT
Date: Tue, 9 Mar 2021 09:42:10 +0100
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in
the tasklet state to be cleared. This works on !RT nicely because the
corresponding execution can only happen on a different CPU.
On RT softirq processing is preemptible, therefore a task preempting the
softirq processing thread can spin forever.
Prevent this by invoking local_bh_disable()/enable() inside the loop. In
case that the softirq processing thread was preempted by the current task,
current will block on the local lock which yields the CPU to the preempted
softirq processing thread. If the tasklet is processed on a different CPU
then the local_bh_disable()/enable() pair is just a waste of processor
cycles.
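A hedged sketch of the loop described above:
  while (test_bit(TASKLET_STATE_SCHED, &t->state)) {
          if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
                  local_bh_disable();     /* may block on the local lock,
                                             yielding to the softirq thread */
                  local_bh_enable();
          } else {
                  cpu_relax();
          }
  }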
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: jme: Replace link-change tasklet with work
Date: Tue, 9 Mar 2021 09:42:11 +0100
The link change tasklet disables the tasklets for tx/rx processing while
updating hw parameters and then enables the tasklets again.
This update can also be pushed into a workqueue where it can be performed
in preemptible context. This allows tasklet_disable() to become sleeping.
Replace the linkch_task tasklet with a work.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: sundance: Use tasklet_disable_in_atomic().
Date: Tue, 9 Mar 2021 09:42:12 +0100
tasklet_disable() is used in the timer callback. This might be disentangled,
but without access to the hardware that's a bit risky.
Replace it with tasklet_disable_in_atomic() so tasklet_disable() can be
changed to a sleep wait once all remaining atomic users are converted.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Denis Kirjanov <kda@linux-powerpc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ath9k: Use tasklet_disable_in_atomic()
Date: Tue, 9 Mar 2021 09:42:13 +0100
All callers of ath9k_beacon_ensure_primary_slot() are preemptible /
acquire a mutex except for this callchain:
  spin_lock_bh(&sc->sc_pcu_lock);
  ath_complete_reset()
    -> ath9k_calculate_summary_state()
       -> ath9k_beacon_ensure_primary_slot()
It's unclear how that can be disentangled, so use tasklet_disable_in_atomic()
for now. This allows tasklet_disable() to become sleepable once the
remaining atomic users are cleaned up.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ath9k-devel@qca.qualcomm.com
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: atm: eni: Use tasklet_disable_in_atomic() in the send() callback
Date: Tue, 9 Mar 2021 09:42:14 +0100
The atmdev_ops::send callback which calls tasklet_disable() is invoked with
bottom halfs disabled from net_device_ops::ndo_start_xmit(). All other
invocations of tasklet_disable() in this driver happen in preemptible
context.
Change the send() call to use tasklet_disable_in_atomic() which allows
tasklet_disable() to be made sleepable once the remaining atomic context
usage sites are cleaned up.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chas Williams <3chas3@gmail.com>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: PCI: hv: Use tasklet_disable_in_atomic()
Date: Tue, 9 Mar 2021 09:42:15 +0100
The hv_compose_msi_msg() callback in irq_chip::irq_compose_msi_msg is
invoked via irq_chip_compose_msi_msg(), which itself is always invoked from
atomic contexts from the guts of the interrupt core code.
There is no way to change this w/o rewriting the whole driver, so use
tasklet_disable_in_atomic() which allows to make tasklet_disable()
sleepable once the remaining atomic users are addressed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-hyperv@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: firewire: ohci: Use tasklet_disable_in_atomic() where required
Date: Tue, 9 Mar 2021 09:42:16 +0100
tasklet_disable() is invoked in several places. Some of them are in atomic
context which prevents a conversion of tasklet_disable() to a sleepable
function.
The atomic callchains are:
  ar_context_tasklet()
    ohci_cancel_packet()
      tasklet_disable()
  ...
  ohci_flush_iso_completions()
    tasklet_disable()
The invocation of tasklet_disable() from at_context_flush() is always in
preemptible context.
Use tasklet_disable_in_atomic() for the two invocations in
ohci_cancel_packet() and ohci_flush_iso_completions().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: linux1394-devel@lists.sourceforge.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Switch tasklet_disable() to the sleep wait variant
Date: Tue, 9 Mar 2021 09:42:17 +0100
-- NOT FOR IMMEDIATE MERGING --
Now that all users of tasklet_disable() are invoked from sleepable context,
convert it to use tasklet_unlock_wait() which might sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Add RT specific softirq accounting
Date: Tue, 9 Mar 2021 09:55:53 +0100
RT requires the softirq processing and local bottom half disabled regions to
be preemptible. Using the normal preempt count based serialization is
therefore not possible because this implicitly disables preemption.
RT kernels use a per-CPU local lock to serialize bottom halves. As
local_bh_disable() can nest, the lock can only be acquired on the outermost
invocation of local_bh_disable() and released when the nest count becomes
zero. Tasks which hold the local lock can be preempted, so it's required to
keep track of the nest count per task.
Add a RT only counter to task struct and adjust the relevant macros in
preempt.h.
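Sketched (placement illustrative; the field name matches
task_struct::softirq_disable_cnt referenced in patch 34):
  struct task_struct {
          /* ... */
  #ifdef CONFIG_PREEMPT_RT
          int softirq_disable_cnt;        /* per-task BH-disable nest count */
  #endif
          /* ... */
  };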
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: irqtime: Make accounting correct on RT
Date: Tue, 9 Mar 2021 09:55:54 +0100
vtime_account_irq() and irqtime_account_irq() base their checks on
preempt_count(), which fails on RT because preempt_count() does not contain
the softirq accounting, which is separate on RT.
These checks do not need the full preempt count as they only operate on the
hard and softirq sections.
Use irq_count() instead which provides the correct value on both RT and non
RT kernels. The compiler is clever enough to fold the masking for !RT:
99b: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax
- 9a2: 25 ff ff ff 7f and $0x7fffffff,%eax
+ 9a2: 25 00 ff ff 00 and $0xffff00,%eax
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Move various protections into inline helpers
Date: Tue, 9 Mar 2021 09:55:55 +0100
To allow reuse of the bulk of softirq processing code for RT and to avoid
#ifdeffery all over the place, split protections for various code sections
out into inline helpers so the RT variant can just replace them in one go.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Make softirq control and processing RT aware
Date: Tue, 9 Mar 2021 09:55:56 +0100
Provide a local lock based serialization for soft interrupts on RT which
allows the local_bh_disabled() sections and servicing soft interrupts to be
preemptible.
Provide the necessary inline helpers which allow to reuse the bulk of the
softirq processing code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tick/sched: Prevent false positive softirq pending warnings on RT
Date: Tue, 9 Mar 2021 09:55:57 +0100
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked soft interrupt processing is
temporarily blocked as well which means that such a warning is a false
positive.
To prevent that check the per CPU state which indicates that a scheduled
out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Prevent false positive softirq warning on RT
Date: Tue, 9 Mar 2021 09:55:58 +0100
Soft interrupt disabled sections can legitimately be preempted or schedule
out when blocking on a lock on RT enabled kernels so the RCU preempt check
warning has to be disabled for RT kernels.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove cruft
Date: Tue, 29 Sep 2020 15:21:17 +0200
Most of this is around since the very beginning. I'm not sure if this
was used while the rtmutex-deadlock-tester was around but today it seems
to only waste memory:
- save_state: No users
- name: Assigned and printed if a dead lock was detected. I'm keeping it
but want to point out that lockdep has the same information.
- file + line: Printed if ::name was NULL. This is only used for
in-kernel locks, so ::name shouldn't be NULL, in which case ::file and
::line aren't used.
- magic: Assigned to NULL by rt_mutex_destroy().
Remove members of rt_mutex which are not used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
62/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove output from deadlock detector.
Date: Tue, 29 Sep 2020 16:05:11 +0200
In commit
f5694788ad8da ("rt_mutex: Add lockdep annotations")
rtmutex gained lockdep annotation for rt_mutex_lock() and related
functions.
lockdep will see the locking order and may complain about a deadlock
before rtmutex' own mechanism gets a chance to detect it.
The rtmutex deadlock detector will only complain about locks with the
RT_MUTEX_MIN_CHAINWALK and a waiter must be pending. That means it
works only for in-kernel locks because the futex interface always uses
RT_MUTEX_FULL_CHAINWALK.
The requirement for an active waiter limits the detector to actual
deadlocks and makes it impossible to report potential deadlocks like
lockdep does.
It looks like lockdep is better suited for reporting deadlocks.
Remove rtmutex' debug print on deadlock detection.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES
Date: Tue, 29 Sep 2020 16:32:49 +0200
rt_mutex_init() only initializes lockdep if CONFIG_DEBUG_RT_MUTEXES is
enabled. The static initializer (DEFINE_RT_MUTEX) does not have such a
restriction.
Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES.
Move the remaining functions in this CONFIG_DEBUG_RT_MUTEXES block to
the upper block.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove rt_mutex_timed_lock()
Date: Wed, 7 Oct 2020 12:11:33 +0200
rt_mutex_timed_lock() has no callers since commit
c051b21f71d1f ("rtmutex: Confine deadlock logic to futex")
Remove rt_mutex_timed_lock().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to the futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
66/191 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1                                  TASK 2
------                                  ------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>
                                        double_lock_hb();
    raw_spin_lock(pi_lock);
    if (current->pi_blocked_on) {
    } else {
        current->pi_blocked_on = PI_WAKE_INPROGRESS;
        raw_spin_unlock(pi_lock);
        spin_lock(hb->lock); <-- blocked!
                                        plist_for_each_entry_safe(this) {
                                            rt_mutex_start_proxy_lock();
                                            task_blocks_on_rt_mutex();
                                            BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
67/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionally.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
69/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
70/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: Reduce header files in debug_locks.h
Date: Fri, 14 Aug 2020 16:55:25 +0200
The inclusion of printk.h leads to circular dependency if spinlock_t is
based on rt_mutex.
Include only atomic.h (xchg()) and cache.h (__read_mostly).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: split out the rbtree definition
Date: Fri, 14 Aug 2020 17:08:41 +0200
rtmutex.h needs the definition for rb_root_cached. By including kernel.h
we will get to spinlock.h which requires rtmutex.h again.
Split out the required struct definition and move it into its own header
file which can be included by rtmutex.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation ontop of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not a RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
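A hedged sketch of the mechanism (simplified; the real code lives in the
rtmutex slow path):
  raw_spin_lock(&current->pi_lock);
  current->saved_state = current->state;  /* e.g. TASK_INTERRUPTIBLE */
  current->state = TASK_UNINTERRUPTIBLE;  /* block on the lock */
  raw_spin_unlock(&current->pi_lock);
  /* ... lock acquired; a regular wakeup would have set
     saved_state to TASK_RUNNING in the meantime ... */
  raw_spin_lock(&current->pi_lock);
  current->state = current->saved_state;  /* restore pre-contention state */
  current->saved_state = TASK_RUNNING;
  raw_spin_unlock(&current->pi_lock);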
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
75/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
76/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Allow rt_mutex_trylock() on PREEMPT_RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-PREEMPT_RT kernel can deadlock on rt_mutex_trylock() in softirq
context.
On PREEMPT_RT the softirq context is handled in thread context. This
avoids the deadlock in the slow path and PI-boosting will be done on the
correct thread.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
77/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
78/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single reader restriction is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code paths shows that properly written RT tasks
should not take them. Syscalls like mmap(), file access which take mmap sem
write locked have unbound latencies which are completely unrelated to mmap
sem. Other R/W sem users like graphics drivers are not suitable for RT tasks
either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer unfair
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on a RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
81/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Use custom scheduling function for spin-schedule()
Date: Tue, 6 Oct 2020 13:07:17 +0200
PREEMPT_RT builds the rwsem, mutex, spinlock and rwlock typed locks on
top of a rtmutex lock. While blocked, task->pi_blocked_on is set
(tsk_is_pi_blocked()) and the task needs to schedule away while waiting.
The schedule process must distinguish between blocking on a regular
sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
and rwlock):
- rwsem and mutex must flush block requests (blk_schedule_flush_plug())
even if blocked on a lock. This can not deadlock because this also
happens for non-RT.
There should be a warning if the scheduling point is within a RCU read
section.
- spinlock and rwlock must not flush block requests. This will deadlock
if the callback attempts to acquire a lock which is already acquired.
Similarly to being preempted, there should be no warning if the
scheduling point is within a RCU read section.
Add preempt_schedule_lock() which is invoked if scheduling is required
while blocking on a PREEMPT_RT-only sleeping lock.
Remove tsk_is_pi_blocked() from the scheduler path which is no longer
needed with the additional scheduler entry point.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
84/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
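A hedged sketch of the RT-only pair (cf. preempt_disable_rt() as used in
patch 86 below; exact definitions may differ):
  #ifdef CONFIG_PREEMPT_RT
  # define preempt_disable_rt()           preempt_disable()
  # define preempt_enable_rt()            preempt_enable()
  #else
  # define preempt_disable_rt()           do { } while (0)
  # define preempt_enable_rt()            do { } while (0)
  #endif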
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
85/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. "preempt_disable" ensures that the
same resource is not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
86/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()
Date: Wed, 28 Oct 2020 18:15:32 +0100
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.
Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/191 [
Author: Ahmed S. Darwish
Email: a.darwish@linutronix.de
Subject: xfrm: Use sequence counter with associated spinlock
Date: Wed, 10 Jun 2020 12:53:22 +0200
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Upstream-status: The xfrm locking used for seqcount writer
serialization appears to be broken. If that's the case, a proper fix
will need to be submitted upstream. (e.g. make the seqcount per network
namespace?)
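For illustration, a hedged sketch of the pattern (the xfrm_state_*
names are placeholders, not the fields actually touched by the patch;
init once with seqcount_spinlock_init(&xfrm_state_seq, &xfrm_state_lock)):

    static spinlock_t xfrm_state_lock;          /* hypothetical */
    static seqcount_spinlock_t xfrm_state_seq;  /* hypothetical */

    static void xfrm_state_update(void)
    {
            spin_lock(&xfrm_state_lock);
            /* lockdep verifies the associated lock is held */
            write_seqcount_begin(&xfrm_state_seq);
            /* ... update the protected data ... */
            write_seqcount_end(&xfrm_state_seq);
            spin_unlock(&xfrm_state_lock);
    }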
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: u64_stats: Disable preemption on 32bit-UP/SMP with RT during updates
Date: Mon, 17 Aug 2020 12:28:10 +0200
On RT the seqcount_t is required even on UP because the softirq can be
preempted. The IRQ handler is threaded so it is also preemptible.
Disable preemption on 32bit-RT during value updates. There is no need
to disable interrupts on RT because the handler is run threaded.
Therefore disabling preemption is enough to guarantee that the update is
not interrupted.
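The update side then looks roughly like this on 32-bit (a sketch; on
64-bit the counter access is atomic and the seqcount is compiled out):

    static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
    {
    #if BITS_PER_LONG == 32
            if (IS_ENABLED(CONFIG_PREEMPT_RT))
                    preempt_disable();      /* IRQ handlers are threaded */
            write_seqcount_begin(&syncp->seq);
    #endif
    }

    static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
    {
    #if BITS_PER_LONG == 32
            write_seqcount_end(&syncp->seq);
            if (IS_ENABLED(CONFIG_PREEMPT_RT))
                    preempt_enable();
    #endif
    }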
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
89/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like
we could have two writers in parallel despite the fact that the d_lock
is held. The problem is that during the write process on RT preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lockup, disable preemption during the update. The rename
of i_dir_seq ensures that new write sides are caught in the future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we
can't remove. Also we don't want the reader to spin for ages if the
writer is scheduled out. The seqlock on the other hand will serialize /
sleep on the lock while the writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Properly annotate the try-lock for the seqlock
Date: Tue, 8 Sep 2020 16:57:11 +0200
In patch
("net/Qdisc: use a seqlock instead seqcount")
the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
The try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".
Opencode write_seqcount_t_begin() and use the try-lock annotation for
lockdep.
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
93/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
94/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remains short and doesn't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adapted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator, we adapted this one as well and the changes were smaller. More
importantly, due to the design of the SLUB allocator it performs better and its
worst case latency is smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works fine from an ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
96/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packet arrives then the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the
NAPI state would be scheduled in again.
Should a task with RT priority busy poll then it would consume the CPU
instead of allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware erases flash blocks on NOR.
The time functions are used by efi-rtc and can be triggered during
runtime (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
98/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is the default at build
time, the user can override it with `efi=runtime' to allow the runtime
services again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
99/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked
"resource" gets the lockdep annotation it wouldn't have otherwise. The
locks are recursive for owner == current. Also, all lock users use
migrate_disable() which ensures that the task is not migrated to another
CPU while the lock is held and the owner is preempted.
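Usage then looks roughly as follows (a sketch written against the
local_lock_t API this concept evolved into upstream; stats_lock and
stats_count are made-up names):

    static DEFINE_PER_CPU(unsigned long, stats_count);
    static DEFINE_PER_CPU(local_lock_t, stats_lock) =
            INIT_LOCAL_LOCK(stats_lock);

    static void stats_inc(void)
    {
            /* !RT: preempt_disable(); RT: per-CPU spinlock which also
             * disables migration. */
            local_lock(&stats_lock);
            __this_cpu_inc(stats_count);
            local_unlock(&stats_lock);
    }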
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
100/191 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spinlock that has been converted to a mutex
and thus has the possibility to sleep. If this happens, the above issue
with the corrupted stack becomes possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
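The trap-side change amounts to something like this sketch (field and
flag names as described above, details illustrative; t and info are the
target task and siginfo of the force_sig_info() path):

    #ifdef ARCH_RT_DELAYS_SIGNAL_SEND
            if (in_atomic()) {
                    /* Defer: stash the siginfo in the task_struct and
                     * let the TIF_NOTIFY_RESUME path deliver it once
                     * preemption is enabled again. */
                    t->forced_info = *info;
                    set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
                    return 0;
            }
    #endif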
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: add {put|get}_cpu_light()
Date: Sat, 27 May 2017 19:02:06 +0200
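The commit carries no body; in the -rt series these helpers are
essentially a migration-disabling variant of get_cpu()/put_cpu():

    #ifdef CONFIG_PREEMPT_RT
    # define get_cpu_light()    ({ migrate_disable(); smp_processor_id(); })
    # define put_cpu_light()    migrate_enable()
    #else
    # define get_cpu_light()    get_cpu()
    # define put_cpu_light()    put_cpu()
    #endif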
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
103/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that the
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock, which needs the
architecture's spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t, so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: sl[au]b: Change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with IRQs off on PREEMPT_RT. Make it a
raw_spinlock_t, otherwise the interrupts won't be disabled on
PREEMPT_RT. The locking rules remain unchanged.
The lock is updated for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
105/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Make object_map_lock a raw_spinlock_t
Date: Thu, 16 Jul 2020 18:47:50 +0200
The variable object_map is protected by object_map_lock. The lock is
always acquired in debug code and within an already-atomic context.
Make object_map_lock a raw_spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
[bigeasy: Add warning on RT for allocations in atomic context.
Don't enable interrupts on allocations during SYSTEM_SUSPEND. This is done
during suspend by ACPI, noticed by Liwei Song <liwei.song@windriver.com>
]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Move discard_slab() invocations out of IRQ-off sections
Date: Fri, 26 Feb 2021 15:14:15 +0100
discard_slab() gives the memory back to the page-allocator. Some of its
invocations occur from sections in which SLUB has disabled IRQs.
An example is the deactivate_slab() invocation from within
___slab_alloc() or put_cpu_partial().
Instead of giving the memory back directly, put the pages on a list and
process it once the caller is out of the known IRQ-off region.
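The pattern is roughly the following sketch (not the literal patch;
s stands for the struct kmem_cache being worked on):

    struct page *page, *tmp;
    unsigned long flags;
    LIST_HEAD(delayed_free);

    local_irq_save(flags);
    /* ... deactivate_slab() etc.; instead of discard_slab(), queue: */
    list_add(&page->lru, &delayed_free);
    local_irq_restore(flags);

    /* Outside the known IRQ-off region: give the memory back. */
    list_for_each_entry_safe(page, tmp, &delayed_free, lru) {
            list_del(&page->lru);
            __free_slab(s, page);
    }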
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Move flush_cpu_slab() invocations and __free_slab() invocations out of IRQ context
Date: Fri, 26 Feb 2021 17:11:55 +0100
flush_all() flushes a specific SLAB cache on each CPU (where the cache
is present). The discard_delayed()/__free_slab() invocation happens
within an IPI handler and is problematic for PREEMPT_RT.
The flush operation is not a frequent operation or a hot path. The
per-CPU flush operation can be moved into a workqueue.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Don't resize the location tracking cache on PREEMPT_RT
Date: Fri, 26 Feb 2021 17:26:04 +0100
The location tracking cache has a size of a page and is resized if its
current size is too small.
This allocation happens with disabled interrupts, which can't be done
on PREEMPT_RT.
Should one page be too small, then we have to allocate more at the
beginning. The only downside is that fewer callers will be visible.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
110/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: page_alloc: Use migrate_disable() in drain_local_pages_wq()
Date: Thu, 2 Jul 2020 14:27:23 +0200
drain_local_pages_wq() disables preemption to avoid CPU migration during
CPU hotplug and can't use cpus_read_lock().
Using migrate_disable() works here, too. The scheduler won't take the
CPU offline until the task has left the migrate-disable section.
Use migrate_disable() in drain_local_pages_wq().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
111/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: Use a local_lock instead of explicit local_irq_save().
Date: Fri, 3 Jul 2009 08:29:37 -0500
The page-allocator disables interrupts for a few reasons:
- Decouple the irqsave operation from spin_lock() so it can be extended
over the actual lock region and cover other areas, like counter
increments, where the preemptible version can be avoided.
- Access to the per-CPU pcp from struct zone.
Replace the irqsave with a local-lock. The counters are expected to be
always modified with disabled preemption and no access from interrupt
context.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
112/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: slub: Don't enable partial CPU caches on PREEMPT_RT by default
Date: Tue, 2 Mar 2021 18:58:04 +0100
SLUB's partial CPU caches lead to higher latencies in a hackbench
benchmark.
Don't enable partial CPU caches by default on PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
113/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: memcontrol: Provide a local_lock for per-CPU memcg_stock
Date: Tue, 18 Aug 2020 10:30:00 +0200
The interrupts are disabled to ensure CPU-local access to the per-CPU
variable `memcg_stock'.
As the code inside the interrupt disabled section acquires regular
spinlocks, which are converted to 'sleeping' spinlocks on a PREEMPT_RT
kernel, this conflicts with the RT semantics.
Convert it to a local_lock which allows RT kernels to substitute them with
a real per CPU lock. On non RT kernels this maps to local_irq_save() as
before, but provides also lockdep coverage of the critical region.
No functional change.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
114/191 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt-disabled
context, replace the pair of get/put_cpu() with get/put_cpu_light().
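The fix is mechanical; a sketch of the drain_all_stock() hunk with the
surrounding logic elided:

    int cpu, curcpu;

    curcpu = get_cpu_light();       /* was: get_cpu() */
    for_each_online_cpu(cpu) {
            struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

            /* ... flag checks elided ... */
            schedule_work_on(cpu, &stock->work); /* sleeping lock on RT */
    }
    put_cpu_light();                /* was: put_cpu() */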
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
115/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() which then take sleeping locks. This
patch converts them local locks.
[bigeasy: Move unlock after memcg_check_events() in mem_cgroup_swapout(),
pointed out by Matt Fleming <matt@codeblueprint.co.uk>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
116/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
This bitspinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
117/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
118/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
119/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
120/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so it is
nothing we want to do in task switch and other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
121/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes in handy on -RT because we
can't free memory in a preempt-disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
122/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
123/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
124/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
125/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
126/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which
added the skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
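The result is roughly the following (trace points and statistics are
omitted from this sketch):

    int netif_rx_ni(struct sk_buff *skb)
    {
            int err;

            local_bh_disable();     /* pins the CPU, defers the softirq */
            err = netif_rx(skb);
            local_bh_enable();      /* runs do_softirq() if one was raised */

            return err;
    }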
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
127/191 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case __TASK_TRACED moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state
and ->saved_state.
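A hedged sketch of the resulting check (close to the -rt helper, exact
details may differ):

    static inline bool task_is_traced_rt(struct task_struct *task)
    {
            bool traced = false;

            if (task->state & __TASK_TRACED)
                    return true;
    #ifdef CONFIG_PREEMPT_RT
            /* The flag may sit in ->saved_state while the task sleeps
             * on the converted tasklist_lock. */
            raw_spin_lock_irq(&task->pi_lock);
            if ((task->state | task->saved_state) & __TASK_TRACED)
                    traced = true;
            raw_spin_unlock_irq(&task->pi_lock);
    #endif
            return traced;
    }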
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/191 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: ptrace: fix ptrace_unfreeze_traced() race with rt-lock
Date: Tue, 3 Nov 2020 12:39:01 +0100
The patch "ptrace: fix ptrace vs tasklist_lock race" changed
ptrace_freeze_traced() to take task->saved_state into account, but
ptrace_unfreeze_traced() has the same problem and needs a similar fix:
it should check/update both ->state and ->saved_state.
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Fixes: "ptrace: fix ptrace vs tasklist_lock race"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
130/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: Delay RCU-selftests
Date: Wed, 10 Mar 2021 15:09:02 +0100
Delay RCU-selftests until ksoftirqd is up and running.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption which
implies an RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disable preemption on !RT) is acquired.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
132/191 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
133/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
134/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks it tries to allocate later.
It seems to be enough to replace it with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
136/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look
safe for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
138/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
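Putting the pieces above together, the RT implementation amounts to
roughly this sketch:

    void cpu_chill(void)
    {
            unsigned int freeze_flag = current->flags & PF_NOFREEZE;
            ktime_t chill_time = ktime_set(0, NSEC_PER_MSEC);

            /* PF_NOFREEZE avoids the "still has locks held" splat */
            current->flags |= PF_NOFREEZE;
            schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
            if (!freeze_flag)
                    current->flags &= ~PF_NOFREEZE;
    }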
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
140/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
141/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep the irq-off regions. This
keeps the latency low. However we can't kfree() from this context,
therefore we defer this to the softirq and use the tofree_queue list for
it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
142/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Dequeue in dev_cpu_dead() without the lock
Date: Wed, 16 Sep 2020 16:15:39 +0200
Upstream uses skb_dequeue() to acquire the lock of `input_pkt_queue'.
The reason is to synchronize against a remote CPU which still thinks
that the CPU is online and enqueues packets to this CPU.
There is no guarantee that the packet is enqueued before the callback is
run; it is just hoped.
RT however complains about an uninitialized lock because it uses another
lock for `input_pkt_queue' due to the IRQ-off nature of the context.
Use the unlocked dequeue version for `input_pkt_queue'.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority then the task with the higher priority
won't be able to submit packets to the NIC directly; instead they will
be enqueued into the Qdisc. The NIC will remain idle until the task(s)
with higher priority leave the CPU and the task with lower priority gets
back and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
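The change in __dev_xmit_skb() is small; sketched:

    bool contended;

    if (IS_ENABLED(CONFIG_PREEMPT_RT))
            contended = true;       /* always allow boosting the holder */
    else
            contended = qdisc_is_running(q);
    if (unlikely(contended))
            spin_lock(&q->busylock);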
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want
the latency spikes if perf or another user was busy and delayed the RT
task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not
work as expected if it did not run in the original irqwork context, so
we had to bring it back somehow for it. push_irq_work_func is the second
one that requires this.
This patch adds IRQ_WORK_HARD_IRQ which makes sure the callback runs in
raw-irq context. Everything else is deferred into softirq context.
Without -RT we have the original behavior.
This patch incorporates tglx's original work, reworked a little to bring
back arch_irq_work_raise() if possible, and a few fixes from Steven
Rostedt and Mike Galbraith,
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
145/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if a RT task is pending.
Probably we should measure the performance of those ciphers in pure SW
mode and with this optimisation to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might
hurt performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
146/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU queue which is protected with local_bh_disable()
and preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race-window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
147/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have
problems acquiring the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
148/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might-sleep
checks are disabled. During CPU hotplug the might-sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding
the call on RT is the only sensible solution. This is basically the
same randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
149/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
150/191 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be
bound to a particular CPU. This can be achieved by
disabling migration. No need to disable preemption.
2) Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backlog() is called
in atomic context. And if the backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
get scheduled out, so it expects non-atomic context.
- Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively (see the sketch below)
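Sketch of the non-RPS path in netif_rx() after the substitution
(illustrative):

    unsigned int qtail;
    int ret;

    ret = enqueue_to_backlog(skb, get_cpu_light(), &qtail); /* was get_cpu() */
    put_cpu_light();                                        /* was put_cpu() */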
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: Remove assumption about migrate_disable() from the description.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
152/191 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
153/191 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
156/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Date: Sat, 27 Feb 2016 09:01:42 +0100
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on
PREEMPT_RT because within this section the code attempts to acquire
spinlock_t locks which are sleeping locks on PREEMPT_RT.
According to the comment the interrupts are disabled to avoid random
delays and are not required for protection or synchronisation.
Don't disable interrupts on PREEMPT_RT during atomic updates.
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire a sleeping lock.
Based on this I don't see any other way than to disable trace points on
RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded
Date: Tue, 7 Jul 2020 12:25:11 +0200
According to commit
d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
the interrupts are disabled because the code may be called from an
interrupt handler and from preemptible context.
With `force_irqthreads' set the timeline mutex is never observed in IRQ
context, so it is not needed to disable interrupts.
Disable interrupts only if not in `force_irqthreads' mode.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/191 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
163/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away preempt
the waking task while the waking task holds a lock on which the woken
task will block right after having preempted the waker. In mainline
this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine- and workload-dependent, but
there is a clear trend that it enhances the non-RT workload
performance.
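Condensed, the wakeup-side decision becomes roughly the following
sketch (helper names follow the description above); RT class wakeups
keep calling resched_curr() directly:

    void resched_curr_lazy(struct rq *rq)
    {
            if (!sched_feat(PREEMPT_LAZY)) {
                    resched_curr(rq);       /* plain NEED_RESCHED */
                    return;
            }
            /* SCHED_OTHER preempting SCHED_OTHER: defer until the
             * waking task has left its lazy-count (lock-held) region. */
            set_tsk_thread_flag(rq->curr, TIF_NEED_RESCHED_LAZY);
    }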
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
164/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/entry: Use should_resched() in idtentry_exit_cond_resched()
Date: Tue, 30 Jun 2020 11:45:14 +0200
The TIF_NEED_RESCHED bit is folded on x86 into the preemption counter.
By using should_resched(0) instead of need_resched(), the same check can
be performed based on the same variable (the preemption counter) that
was already read before.
Use should_resched(0) instead of need_resched().
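The hunk is a one-liner; sketched with its context:

    /* idtentry_exit_cond_resched(), sketch: TIF_NEED_RESCHED is folded
     * into the preemption counter on x86, so one read covers both the
     * count and the resched bit. */
    if (should_resched(0))          /* was: if (need_resched()) */
            preempt_schedule_irq();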
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
166/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
167/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
168/191 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
169/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode which
leads to latency spikes.
The architectures which use stop_machine() at the moment:
- ARM stop machine
- s390 stop machine
The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
170/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
171/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
172/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
173/191 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program:
int main(void)
{
        *((char *)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. Under the above conditions, the correct result is
that the test case receives a "segmentation fault" and exits, instead of
producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable irqs in the translation/section
permission fault handlers. ARM disables irqs when it enters exception/
interrupt mode, so if the kernel doesn't re-enable them, they remain
disabled during translation/section permission fault handling.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/191 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/191 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt-disabled
section, which does not work on -RT.
Delay freeing of memory until preemption is enabled again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
177/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
178/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc: traps: Use PREEMPT_RT
Date: Fri, 26 Jul 2019 11:30:49 +0200
Add PREEMPT_RT to the backtrace if enabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The local lock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use a local lock instead of local_irq_save() so that the protected
section stays preemptible on RT.
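A hedged sketch of the local-lock pattern described above (struct and field
names are assumptions, not taken from the patch):
  struct tce_cache {
          local_lock_t    lock;
          __be64          *page;
  };
  static DEFINE_PER_CPU(struct tce_cache, tce_cache) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  local_lock_irqsave(&tce_cache.lock, flags);
  /* ... use this_cpu_read(tce_cache.page); stays preemptible on RT ... */
  local_unlock_irqrestore(&tce_cache.lock, flags);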
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
182/191 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which loops
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host. A malicious user app can maximize both of these numbers
and cause a DoS.
This temporary fix is sent for two reasons. The first is so that users who want
to use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario do not involve
interrupts sent in directed delivery mode and that the number of possible
pending interrupts is kept small. The second is that it should incentivize the
development of a proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use the
TSC instead. On Power we XOR it against mftb(), so let's use the stack address
as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc: Avoid recursive header includes
Date: Fri, 8 Jan 2021 19:48:21 +0100
- The include of bug.h leads to an include of printk.h which gets back
to spinlock.h and then complains about a missing xchg().
Remove bug.h and add bits.h which is needed for BITS_PER_BYTE.
- Avoid the "please don't include this file directly" error from
rwlock-rt. Allow an include from/with rtmutex.h.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
185/191 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
186/191 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
187/191 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
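A sketch of the resulting pattern (register offset and field names follow
tpm_tis conventions and are assumptions here):
  iowrite8(*value++, phy->iobase + addr);
  /* flush the posted write by reading back a harmless register */
  ioread8(phy->iobase + TPM_ACCESS(0));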
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
188/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow RT tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
Allow realtime tasks to cache one sigqueue in task struct. This avoids an
allocation which can cause latencies or fail.
Ideally the sigqueue is cached after the first successful delivery and will be
available for the next signal delivery. This works under the assumption that an
RT task never has an unprocessed signal while a new one is about to be queued.
The caching is not used for SIGQUEUE_PREALLOC because this kind of sigqueue is
handled differently (and not used for regular signal delivery).
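A hedged sketch of the caching idea (the task_struct field name is an
assumption):
  /* allocation side: prefer the cached entry */
  struct sigqueue *q = xchg(&current->sigqueue_cache, NULL);
  if (!q)
          q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);

  /* free side: try to park the entry instead of freeing it */
  if (cmpxchg(&current->sigqueue_cache, NULL, q) != NULL)
          kmem_cache_free(sigqueue_cachep, q);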
[bigeasy: With a fix from Matt Fleming <matt@codeblueprint.co.uk>]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
189/191 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
190/191 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it is a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow.
Are there better solutions? Should it exist and return 0 on !-rt?
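The expected usage is a plain read, e.g. (assuming the entry simply reports
1 on an RT kernel):
  # cat /sys/kernel/realtime
  1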
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
191/191 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs-base
Date: Fri, 19 Mar 2021 12:32:07 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
2/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs-mmap
Date: Fri, 19 Mar 2021 12:32:26 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
3/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs-standalone
Date: Fri, 19 Mar 2021 12:32:45 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
4/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs-kbuild
Date: Fri, 19 Mar 2021 12:33:42 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
5/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: core
Date: Fri, 19 Mar 2021 12:34:34 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Remove the obsolete NODES_SPAN_OTHER_NODES kernel option, which was
removed from the kernel starting with v5.7.0, commit acd3f5c441e9
("mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES").
Signed-off-by: Daniel Dragomir <Daniel.Dragomir@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This is used to enable the options below to support the Bosch M_CAN
controller on Elkhart Lake:
CONFIG_CAN=m
CONFIG_CAN_M_CAN=m
CONFIG_CAN_M_CAN_PCI=m
CONFIG_CAN_M_CAN_PLATFORM=m
CONFIG_CAN_M_CAN_TCAN4X5X=m
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Pinctrl/GPIO support for Intel Tiger Lake SoC was added in kernel commit
c9ccf71fc8 ("pinctrl: intel: Add Intel Tiger Lake pin controller support"),
since v5.5-rc1.
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/1 [
Author: Meng Li
Email: Meng.Li@windriver.com
Subject: aufs: remove the redundant fput function invoking
Date: Wed, 3 Feb 2021 14:58:18 +0800
This issue was introduced by commit bad361d963fb ("aufs5:
aufs5-mmap"). The fput should have been replaced by vma_fput,
but that was missed. So remove the redundant fput with this patch.
Signed-off-by: Meng Li <Meng.Li@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
oprofile is being removed from the kernel, since the oprofile userspace
tools use the perf syscalls (and have for quite a while).
So we drop our remaining configs to avoid obsolete config warnings
reference:
https://lore.kernel.org/lkml/CAHk-=whw9t3ZtV8iA2SJWYQS1VOJuS14P_qhj3v5-9PCBmGQww@mail.gmail.com/
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This has been removed in v5.10:
https://github.com/torvalds/linux/commit/6cbfa11d2694b8a1e46d6834fb9705d5589e3ef1
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This is no longer available since v5.5:
https://github.com/torvalds/linux/commit/fb041bb7c0a918b95c6889fc965cdc4a75b4c0ca
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The driver has been removed starting with v5.10:
https://github.com/torvalds/linux/commit/50044aa715177b336acca01d711bf8edc19162c7
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable CONFIG_DPTF_PCH_FIVR for intel-x86 bsp to support
Dynamic Platform and Thermal Framework PCH FIVR Participant device.
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable INTEL_UNCORE_FREQ_CONTROL for intel-x86 bsp
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable CONFIG_INTEL_IDXD for intel-x86 bsp.
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/3 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: clear_warn_once: expand debugfs to include read support
Date: Wed, 16 Dec 2020 12:22:56 -0500
The existing clear_warn_once variable is write-only; used as per the
documentation to reset the warn_once to as-booted state with:
echo 1 > /sys/kernel/debug/clear_warn_once
The objective is to expand on that functionality, which requires the
debugfs variable to be read/write and not just write-only.
Here, we deal with the debugfs boilerplate associated with converting
it from write-only to read-write, in order to factor that out for easier
review and as a potentially useful future bisect point.
Existing functionality is unchanged - the only difference is that we
have tied in a variable that lets you now read the variable and see
the last value written.
Link: https://lore.kernel.org/r/20201126063029.2030-1-paul.gortmaker@windriver.com
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
2/3 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: clear_warn_once: bind a timer to written reset value
Date: Wed, 16 Dec 2020 12:22:57 -0500
Existing documentation has a write of "1" to clear/reset all the
WARN_ONCE and similar to the as-booted state, so they can possibly
be re-triggered again during debugging/testing.
But having them auto-reset once a day, or once a week, might give a
sysadmin valuable information on what the system is doing while
trying to debug an issue.
Here we extend the existing debugfs variable to bind a timer to the
written value N, so that it will reset every N minutes, for N>1.
Writing a zero will clear any previously set timer value.
The pre-existing behaviour of writing N=1 will do a one-shot clear.
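For example (illustrative only):
  # echo 60 > /sys/kernel/debug/clear_warn_once   (re-arm every 60 minutes)
  # echo 0 > /sys/kernel/debug/clear_warn_once    (cancel the timer)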
Link: https://lore.kernel.org/r/20201126063029.2030-1-paul.gortmaker@windriver.com
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
3/3 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: clear_warn_once: add a clear_warn_once= boot parameter
Date: Wed, 16 Dec 2020 12:22:58 -0500
In the event debugfs is not mounted, or if built with the .config
setting DEBUG_FS_ALLOW_NONE chosen, this gives the sysadmin access
to reset the WARN_ONCE() state on a periodic basis.
Link: https://lore.kernel.org/r/20201126063029.2030-1-paul.gortmaker@windriver.com
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This option is 'm' selected by our dependencies, so let's align with
them and avoid a warning.
We keep the option, so that if it changes in the future, we'll get
the warning again.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Fix the following warning; we can't set it to 'y' when it is otherwise
'm' selected:
[NOTE]: 'CONFIG_DRM_KMS_HELPER' last val (y) and .config val (m) do not match
[INFO]: CONFIG_DRM_KMS_HELPER : m ## .config: 2331 :configs/v5.10/standard/tiny/features/i915/i915.cfg (y)
[INFO]: raw config text:
config DRM_KMS_HELPER
tristate
depends on DRM && HAS_IOMEM
help
CRTC helpers for KMS drivers.
Config 'DRM_KMS_HELPER' has the following Direct dependencies (DRM_KMS_HELPER=y):
DRM(=y) && HAS_IOMEM(=y)
Parent dependencies are:
DRM [y] HAS_IOMEM [y]
[INFO]: config 'CONFIG_DRM_KMS_HELPER' was set, but it wasn't assignable, check (parent) dependencies
[INFO]: selection details for 'CONFIG_DRM_KMS_HELPER':
Symbols currently m-selecting this symbol:
- DRM_I915
- DRM_CIRRUS_QEMU
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
commit e56dc9e2949edff7932474f2552dd134734cc857
Author: J. Bruce Fields <bfields@redhat.com>
Date: Thu Jul 30 20:33:57 2020 -0400
nfsd: remove fault injection code
It was an interesting idea but nobody seems to be using it, it's buggy
at this point, and nfs4state.c is already complicated enough without it.
The new nfsd/clients/ code provides some of the same functionality, and
could probably do more if desired.
This feature has been deprecated since 9d60d93198c6 ("Deprecate nfsd
fault injection").
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: stop_machine: Add function and caller debug info
Date: Fri, 23 Oct 2020 12:11:59 +0200
Crashes in stop-machine are hard to connect to the calling code; add a
little something to help with that.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix balance_callback()
Date: Fri, 23 Oct 2020 12:12:00 +0200
The intent of balance_callback() has always been to delay executing
balancing operations until the end of the current rq->lock section.
This is because balance operations must often drop rq->lock, and that
isn't safe in general.
However, as noted by Scott, there were a few holes in that scheme;
balance_callback() was called after rq->lock was dropped, which means
another CPU can interleave and touch the callback list.
Rework code to call the balance callbacks before dropping rq->lock
where possible, and otherwise splice the balance list onto a local
stack.
This guarantees that the balance list must be empty when we take
rq->lock. IOW, we'll only ever run our own balance callbacks.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched/hotplug: Ensure only per-cpu kthreads run during hotplug
Date: Fri, 23 Oct 2020 12:12:01 +0200
In preparation for migrate_disable(), make sure only per-cpu kthreads
are allowed to run on !active CPUs.
This is run (as one of the very first steps) from the cpu-hotplug
task which is a per-cpu kthread and completion of the hotplug
operation only requires such tasks.
This constraint enables the migrate_disable() implementation to wait
for completion of all migrate_disable regions on this CPU at hotplug
time without fear of any new ones starting.
This replaces the unlikely(rq->balance_callbacks) test at the tail of
context_switch with an unlikely(rq->balance_work), the fast path is
not affected.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched/core: Wait for tasks being pushed away on hotplug
Date: Fri, 23 Oct 2020 12:12:02 +0200
RT kernels need to ensure that all tasks which are not per CPU kthreads
have left the outgoing CPU to guarantee that no tasks are force migrated
within a migrate disabled section.
There is also some desire to (ab)use fine grained CPU hotplug control to
clear a CPU from active state to force migrate tasks which are not per CPU
kthreads away for power control purposes.
Add a mechanism which waits until all tasks which should leave the CPU
after the CPU active flag is cleared have moved to a different online CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
5/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: workqueue: Manually break affinity on hotplug
Date: Fri, 23 Oct 2020 12:12:03 +0200
Don't rely on the scheduler to force break affinity for us -- it will
stop doing that for per-cpu-kthreads.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched/hotplug: Consolidate task migration on CPU unplug
Date: Fri, 23 Oct 2020 12:12:04 +0200
With the new mechanism which kicks tasks off the outgoing CPU at the end of
schedule() the situation on an outgoing CPU right before the stopper thread
brings it down completely is:
- All user tasks and all unbound kernel threads have either been migrated
away or are not running and the next wakeup will move them to an online CPU.
- All per CPU kernel threads, except cpu hotplug thread and the stopper
thread have either been unbound or parked by the responsible CPU hotplug
callback.
That means that at the last step before the stopper thread is invoked the
cpu hotplug thread is the last legitimate running task on the outgoing
CPU.
Add a final wait step right before the stopper thread is kicked which
ensures that any still-running tasks on the way to park or on the way to
kick themselves off the CPU are either sleeping or gone.
This allows us to remove the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task aside of
the stopper thread then it will explode with the appropriate fireworks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix hotplug vs CPU bandwidth control
Date: Fri, 23 Oct 2020 12:12:05 +0200
Since we now migrate tasks away before DYING, we should also move
bandwidth unthrottle, otherwise we can gain tasks from unthrottle
after we expect all tasks to be gone already.
Also, it looks like the RT balancers don't respect cpu_active() and
instead rely in part on rq->online; complete this. This too requires
we do set_rq_offline() earlier to match the cpu_active() semantics.
(The bigger patch is to convert RT to cpu_active() entirely)
Since set_rq_online() is called from sched_cpu_activate(), place
set_rq_offline() in sched_cpu_deactivate().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Massage set_cpus_allowed()
Date: Fri, 23 Oct 2020 12:12:06 +0200
Thread a u32 flags word through the *set_cpus_allowed*() callchain.
This will allow adding behavioural tweaks for future users.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Add migrate_disable()
Date: Fri, 23 Oct 2020 12:12:07 +0200
Add the base migrate_disable() support (under protest).
While migrate_disable() is (currently) required for PREEMPT_RT, it is
also one of the biggest flaws in the system.
Notably this is just the base implementation, it is broken vs
sched_setaffinity() and hotplug, both solved in additional patches for
ease of review.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix migrate_disable() vs set_cpus_allowed_ptr()
Date: Fri, 23 Oct 2020 12:12:08 +0200
Concurrent migrate_disable() and set_cpus_allowed_ptr() have
interesting features. We rely on set_cpus_allowed_ptr() to not return
until the task runs inside the provided mask. This expectation is
exported to userspace.
This means that any set_cpus_allowed_ptr() caller must wait until
migrate_enable() allows migrations.
At the same time, we don't want migrate_enable() to schedule, due to
patterns like:
preempt_disable();
migrate_disable();
...
migrate_enable();
preempt_enable();
And:
raw_spin_lock(&B);
spin_unlock(&A);
This means that when migrate_enable() must restore the affinity
mask, it cannot wait for completion thereof. As luck will have it,
that is exactly the case where there is a pending
set_cpus_allowed_ptr(), so let that provide storage for the async stop
machine.
Much thanks to Valentin, who used TLA+ most effectively and found lots of
'interesting' cases.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched/core: Make migrate disable and CPU hotplug cooperative
Date: Fri, 23 Oct 2020 12:12:09 +0200
On CPU unplug tasks which are in a migrate disabled region cannot be pushed
to a different CPU until they returned to migrateable state.
Account the number of tasks on a runqueue which are in a migrate disabled
section and make the hotplug wait mechanism respect that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched,rt: Use cpumask_any*_distribute()
Date: Fri, 23 Oct 2020 12:12:10 +0200
Replace a bunch of cpumask_any*() instances with
cpumask_any*_distribute(). By injecting this little bit of randomness
into cpu selection, we reduce the chance that two competing balance
operations working off the same lowest_mask pick the same CPU.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched,rt: Use the full cpumask for balancing
Date: Fri, 23 Oct 2020 12:12:11 +0200
We want migrate_disable() tasks to get PULLs in order for them to PUSH
away the higher priority task.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched, lockdep: Annotate ->pi_lock recursion
Date: Fri, 23 Oct 2020 12:12:12 +0200
There's a valid ->pi_lock recursion issue where the actual PI code
tries to wake up the stop task. Make lockdep aware so it doesn't
complain about this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Fix migrate_disable() vs rt/dl balancing
Date: Fri, 23 Oct 2020 12:12:13 +0200
In order to minimize the interference of migrate_disable() on lower
priority tasks, which can be deprived of runtime due to being stuck
below a higher priority task, teach the RT/DL balancers to push away
these higher priority tasks when a lower priority task gets selected
to run on a freshly demoted CPU (pull).
This adds migration interference to the higher priority task, but
restores bandwidth to the system that would otherwise be irrevocably
lost. Without this it would be possible to have all tasks on the system
stuck on a single CPU, each task preempted in a migrate_disable()
section with a single high priority task running.
This way we can still approximate running the M highest priority tasks
on the system.
Migrating the top task away is (of course) still subject to
migrate_disable() too, which means the lower task is subject to an
interference equivalent to the worst case migrate_disable() section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched/proc: Print accurate cpumask vs migrate_disable()
Date: Fri, 23 Oct 2020 12:12:14 +0200
Ensure /proc/*/status doesn't print 'random' cpumasks due to
migrate_disable().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: sched: Add migrate_disable() tracepoints
Date: Fri, 23 Oct 2020 12:12:15 +0200
XXX write a tracer:
- 'migrate_disable() -> migrate_enable()' time in task_sched_runtime()
- 'migrate_pull -> sched-in' time in task_sched_runtime()
The first will give the worst case for the second, which is the actual
interference experienced by the task due to the migration constraints of
migrate_disable().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/194 [
Author: Valentin Schneider
Email: valentin.schneider@arm.com
Subject: sched: Deny self-issued __set_cpus_allowed_ptr() when migrate_disable()
Date: Fri, 23 Oct 2020 12:12:16 +0200
migrate_disable();
set_cpus_allowed_ptr(current, {something excluding task_cpu(current)});
affine_move_task(); <-- never returns
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201013140116.26651-1-valentin.schneider@arm.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/194 [
Author: Valentin Schneider
Email: valentin.schneider@arm.com
Subject: sched: Comment affine_move_task()
Date: Fri, 23 Oct 2020 12:12:17 +0200
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201013140116.26651-2-valentin.schneider@arm.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Use CONFIG_PREEMPTION
Date: Fri, 26 Jul 2019 11:30:49 +0200
This is an all-in-one patch of the current `PREEMPTION' branch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: blk-mq: Don't complete on a remote CPU in force threaded mode
Date: Wed, 28 Oct 2020 11:07:44 +0100
With force threaded interrupts enabled, raising softirq from an SMP
function call will always result in waking the ksoftirqd thread. This is
not optimal given that the thread runs at SCHED_OTHER priority.
Completing the request in hard IRQ-context on PREEMPT_RT (which enforces
the force threaded mode) is bad because the completion handler may
acquire sleeping locks which violate the locking context.
Disable request completing on a remote CPU in force threaded mode.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: blk-mq: Always complete remote completions requests in softirq
Date: Wed, 28 Oct 2020 11:07:09 +0100
Controllers with multiple queues have their IRQ handlers pinned to a
CPU. The core shouldn't need to complete the request on a remote CPU.
Remove this case and always raise the softirq to complete the request.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: blk-mq: Use llist_head for blk_cpu_done
Date: Wed, 28 Oct 2020 11:08:21 +0100
With llist_head it is possible to avoid the locking (the irq-off region)
when items are added. This makes it possible to add items on a remote
CPU.
llist_add() returns true if the list was previously empty. This can be
used to invoke the SMP function call / raise the softirq only if the
first item was added (otherwise it is already pending).
This simplifies the code a little and reduces the IRQ-off regions. With
this change it is possible to reduce the SMP function call to a simple
__raise_softirq_irqoff().
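The pattern boils down to something like this sketch (list and field names
as used in the text; details assumed):
  if (llist_add(&rq->ipi_list, this_cpu_ptr(&blk_cpu_done)))
          /* first entry: completion handling is not pending yet */
          __raise_softirq_irqoff(BLOCK_SOFTIRQ);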
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lib/test_lockup: Minimum fix to get it compiled on PREEMPT_RT
Date: Wed, 28 Oct 2020 18:55:27 +0100
On PREEMPT_RT the locks are quite different so they can't be tested as
it is done below. The alternative is to test for the waitlock within
rtmutex.
This is the bare minimum to get it compiled. Problems which exist on
PREEMPT_RT:
- none of the locks (spinlock_t, rwlock_t, mutex_t, rw_semaphore) may be
acquired with disabled preemption or interrupts.
If I read the code correctly, it is possible to acquire a mutex with
disabled interrupts.
I don't know how to obtain a lock pointer. Technically they are not
exported to userland.
- memory can not be allocated with disabled preemption or interrupts even
with GFP_ATOMIC.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: refactor kmsg_dump_get_buffer()
Date: Wed, 14 Oct 2020 19:09:15 +0200
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.
At some point it would be nice to have this code factored into a
helper function. But until then, the code should at least look
similar enough that the logic duplication is obvious.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: use buffer pools for sprint buffers
Date: Tue, 13 Oct 2020 22:57:55 +0200
vprintk_store() is using a single static buffer as a temporary
sprint buffer for the message text. This will not work once
@logbuf_lock is removed. Replace the single static buffer with
per-cpu and global pools.
Each per-cpu pool is large enough to support a worst case of 2
contexts (non-NMI and NMI).
To support printk() recursion and printk() calls before per-cpu
variables are ready, an extra/fallback global pool of 2 contexts is
available.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: change @clear_seq to atomic64_t
Date: Tue, 13 Oct 2020 23:19:35 +0200
Currently @clear_seq access is protected by @logbuf_lock. Once
@logbuf_lock is removed some other form of synchronization will be
required. Change the type of @clear_seq to atomic64_t to provide the
synchronization.
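In effect (sketch):
  static atomic64_t clear_seq = ATOMIC64_INIT(0);

  u64 seq = atomic64_read(&clear_seq);   /* readers */
  atomic64_set(&clear_seq, new_seq);     /* writers */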
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove logbuf_lock, add syslog_lock
Date: Wed, 14 Oct 2020 19:06:12 +0200
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.
This means that printk_nmi_direct and printk_safe_flush_on_panic()
no longer need to acquire any lock to run.
The global variables @syslog_seq, @syslog_partial, @syslog_time
were also protected by @logbuf_lock. Introduce @syslog_lock to
protect these.
@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove safe buffers
Date: Wed, 14 Oct 2020 20:00:11 +0200
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers. In NMI or safe contexts, store
the message immediately but still use irq_work to defer the console
printing.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Wed, 14 Oct 2020 20:26:35 +0200
Add a write_atomic() callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write() callback. The
write_atomic() callback will only be used in special situations,
such as when the kernel panics.
Creating an NMI safe write_atomic() that must synchronize with
write() requires a careful implementation of the console driver. To
aid with the implementation, a set of console_atomic_*() functions
are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using a processor-reentrant spinlock
(called a cpulock).
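A console driver would wire up the optional callback roughly like this
(driver and function names are placeholders):
  static struct console mycon = {
          .name         = "mycon",
          .write        = mycon_write,
          .write_atomic = mycon_write_atomic,  /* optional, NMI safe */
          .flags        = CON_PRINTBUFFER,
  };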
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Wed, 14 Oct 2020 20:31:46 +0200
Implement a non-sleeping NMI-safe write_atomic() console function in
order to support emergency console printing.
Since interrupts need to be disabled during transmit, all usage of
the IER register is wrapped with access functions that use the
console_atomic_lock() function to synchronize register access while
tracking the state of the interrupts. This is necessary because
write_atomic() can be called from an NMI context that has preempted
write_atomic().
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: inline log_output(),log_store() in vprintk_store()
Date: Mon, 19 Oct 2020 16:40:26 +0206
In preparation for supporting atomic printing, inline log_output()
and log_store() into vprintk_store(). This allows these
sub-functions to more easily communicate if they have performed
a finalized commit as well as the sequence number of that commit.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: relocate printk_delay() and vprintk_default()
Date: Mon, 19 Oct 2020 21:02:40 +0206
Move printk_delay() and vprintk_default() "as is" further up so that
they can be used by new functions in an upcoming commit.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: combine boot_delay_msec() into printk_delay()
Date: Mon, 19 Oct 2020 22:11:31 +0206
boot_delay_msec() is always called immediately before printk_delay()
so just combine the two.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce kernel sync mode
Date: Wed, 14 Oct 2020 20:40:05 +0200
When the kernel performs an OOPS, enter into "sync mode":
- only atomic consoles (write_atomic() callback) will print
- printing occurs within vprintk_store() instead of console_unlock()
Change @console_seq to atomic64_t for atomic access.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: move console printing to kthreads
Date: Mon, 19 Oct 2020 22:30:38 +0206
Create a kthread for each console to perform console printing. Now
all console printing is fully asynchronous except for the boot
console and when the kernel enters sync mode (and there are atomic
consoles available).
The console_lock() and console_unlock() functions now only do what
their name says... locking and unlocking of the console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove deferred printing
Date: Mon, 19 Oct 2020 22:53:30 +0206
Since printing occurs either atomically or from the printing
kthread, there is no need for any deferring or tracking possible
recursion paths. Remove all printk context tracking.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/194 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add console handover
Date: Mon, 19 Oct 2020 23:03:44 +0206
If earlyprintk is used, a boot console will print directly to the
console immediately. The boot console will unregister itself as soon
as a non-boot console registers. However, the non-boot console does
not begin printing until its kthread has started. Since this happens
much later, there is a long pause in the console output. If the
ringbuffer is small, messages could even be dropped during the
pause.
Add a new CON_HANDOVER console flag to be used internally by printk
in order to track which non-boot console took over from a boot
console. If handover consoles have implemented write_atomic(), they
are allowed to print directly to the console until their kthread can
take over.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: Tiny cleanup
Date: Tue, 20 Oct 2020 18:48:16 +0200
- mark functions and variables static which are used only in this file.
- add printf annotation where appropriate
- remove static functions without caller
- add kdb header file for kgdb builds.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
40/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: use irqsave in cgroup_rstat_flush_locked()
Date: Tue, 3 Jul 2018 18:19:48 +0200
All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock
either with spin_lock_irq() or spin_lock_irqsave().
cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which
is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated()
in IRQ context and therefore requires _irqsave() locking suffix in
cgroup_rstat_flush_locked().
Since there is no difference between spinlock_t and raw_spin_lock_t
on !RT, lockdep does not complain here. On RT lockdep complains because
the interrupts were not disabled here and a deadlock is possible.
Acquire the raw_spin_lock_t with disabled interrupts.
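I.e. the flush path now does (sketch):
  raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
  unsigned long flags;

  raw_spin_lock_irqsave(cpu_lock, flags);
  /* ... flush this CPU's stats ... */
  raw_spin_unlock_irqrestore(cpu_lock, flags);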
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
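Roughly (the exact lock expression is an assumption):
  /* was: WARN_ON_ONCE(!irqs_disabled()); */
  lockdep_assert_held(&mapping->i_pages.xa_lock);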
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tpm: remove tpm_dev_wq_lock
Date: Mon, 11 Feb 2019 11:33:11 +0100
Added in commit
9e1b74a63f776 ("tpm: add support for nonblocking operation")
but never actually used.
Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: shmem: Use raw_spinlock_t for ->stat_lock
Date: Fri, 14 Aug 2020 18:53:34 +0200
Each CPU has SHMEM_INO_BATCH inodes available in `->ino_batch', which is
per-CPU. Access here is serialized by disabling preemption. If the pool is
empty, it gets reloaded from `->next_ino'. Access here is serialized by
->stat_lock, which is a spinlock_t and can not be acquired with disabled
preemption.
One way around it would be to make the per-CPU ino_batch struct containing
the inode number a local_lock_t.
Another solution is to promote ->stat_lock to a raw_spinlock_t. The critical
sections are short. The mpol_put() should be moved outside of the critical
section to avoid invoking the destructor with disabled preemption.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Move lockdep where it belongs
Date: Tue, 8 Sep 2020 07:32:20 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
45/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tcp: Remove superfluous BH-disable around listening_hash
Date: Mon, 12 Oct 2020 17:33:54 +0200
Commit
9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH.
inet_unhash() conditionally acquires listening_hash->lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/fpu: Do not disable BH on RT
Date: Mon, 21 Sep 2020 20:15:50 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
47/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Add RT variant
Date: Mon, 21 Sep 2020 17:26:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
48/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tick/sched: Prevent false positive softirq pending warnings on RT
Date: Mon, 31 Aug 2020 17:02:36 +0200
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked soft interrupt processing is
temporarily blocked as well which means that such a warning is a false
positive.
To prevent that check the per CPU state which indicates that a scheduled
out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
49/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rcu: Prevent false positive softirq warning on RT
Date: Mon, 31 Aug 2020 17:26:08 +0200
Soft interrupt disabled sections can legitimately be preempted or schedule
out when blocking on a lock on RT enabled kernels so the RCU preempt check
warning has to be disabled for RT kernels.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
50/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Replace barrier() with cpu_relax() in tasklet_unlock_wait()
Date: Mon, 31 Aug 2020 15:12:38 +0200
A barrier() in a tight loop which waits for something to happen on a remote
CPU is a pointless exercise. Replace it with cpu_relax() which allows HT
siblings to make progress.
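The change amounts to (sketch of tasklet_unlock_wait()):
  while (test_bit(TASKLET_STATE_RUN, &t->state))
          cpu_relax();    /* was: barrier(); */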
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
51/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Avoid cancel/kill deadlock on RT
Date: Mon, 21 Sep 2020 17:47:34 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
52/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tasklets: Use static inline for functions
Date: Mon, 7 Sep 2020 22:57:32 +0200
Inlines exist for a reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
53/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove cruft
Date: Tue, 29 Sep 2020 15:21:17 +0200
Most of this has been around since the very beginning. I'm not sure if it
was used while the rtmutex deadlock tester was around, but today it seems
to only waste memory:
- save_state: No users.
- name: Assigned and printed if a deadlock was detected. I'm keeping it
but want to point out that lockdep has the same information.
- file + line: Printed if ::name was NULL. This is only used for
in-kernel locks, so ::name shouldn't be NULL, and then ::file and
::line aren't used.
- magic: Assigned to NULL by rt_mutex_destroy().
Remove members of rt_mutex which are not used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove output from deadlock detector.
Date: Tue, 29 Sep 2020 16:05:11 +0200
In commit
f5694788ad8da ("rt_mutex: Add lockdep annotations")
rtmutex gained lockdep annotations for rt_mutex_lock() and related
functions.
lockdep will see the locking order and may complain about a deadlock
before rtmutex' own mechanism gets a chance to detect it.
The rtmutex deadlock detector will only complain about locks with
RT_MUTEX_MIN_CHAINWALK and a pending waiter. That means it
works only for in-kernel locks because the futex interface always uses
RT_MUTEX_FULL_CHAINWALK.
The requirement for an active waiter limits the detector to actual
deadlocks and makes it impossible to report potential deadlocks the way
lockdep does.
It looks like lockdep is better suited for reporting deadlocks.
Remove rtmutex' debug print on deadlock detection.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES
Date: Tue, 29 Sep 2020 16:32:49 +0200
rt_mutex_init() only initializes lockdep if CONFIG_DEBUG_RT_MUTEXES is
enabled. The static initializer (DEFINE_RT_MUTEX) does not have such a
restriction.
Move rt_mutex_init() outside of CONFIG_DEBUG_RT_MUTEXES.
Move the remaining functions in this CONFIG_DEBUG_RT_MUTEXES block to
the upper block.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Remove rt_mutex_timed_lock()
Date: Wed, 7 Oct 2020 12:11:33 +0200
rt_mutex_timed_lock() has no callers since commit
c051b21f71d1f ("rtmutex: Confine deadlock logic to futex")
Remove rt_mutex_timed_lock().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to the futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
58/194 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1                          TASK 2
------                          ------
futex_wait_requeue_pi()
  futex_wait_queue_me()
  <timed out>
                                double_lock_hb();
  raw_spin_lock(pi_lock);
  if (current->pi_blocked_on) {
  } else {
    current->pi_blocked_on = PI_WAKE_INPROGRESS;
    raw_spin_unlock(pi_lock);
    spin_lock(hb->lock); <-- blocked!
                                plist_for_each_entry_safe(this) {
                                  rt_mutex_start_proxy_lock();
                                    task_blocks_on_rt_mutex();
                                    BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead,
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and it will handle things
appropriately.
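A rough sketch of the described check (illustrative, not the literal
patch hunk):
    /* in rt_mutex_start_proxy_lock(), before blocking the proxy task */
    raw_spin_lock(&task->pi_lock);
    if (task->pi_blocked_on) {
        /* already blocked, e.g. on hb->lock: exit out early */
        raw_spin_unlock(&task->pi_lock);
        return -EAGAIN;
    }
    /* notify the proxy task that it is being requeued */
    task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
    raw_spin_unlock(&task->pi_lock);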
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
59/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionally.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
61/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
62/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: Reduce header files in debug_locks.h
Date: Fri, 14 Aug 2020 16:55:25 +0200
The inclusion of printk.h leads to a circular dependency if spinlock_t is
based on rt_mutex.
Include only atomic.h (xchg()) and cache.h (__read_mostly).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: split out the rbtree definition
Date: Fri, 14 Aug 2020 17:08:41 +0200
rtmutex.h needs the definition of rb_root_cached. By including kernel.h
we get to spinlock.h, which requires rtmutex.h again.
Split out the required struct definition and move it into its own header
file, which can be included by rtmutex.h.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation on top of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not an rtmutex-related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
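As a condensed sketch of the mechanism (not the literal kernel code):
    /* blocking on an RT 'sleeping' spinlock: preserve the task state */
    raw_spin_lock(&current->pi_lock);
    current->saved_state = current->state;
    current->state = TASK_UNINTERRUPTIBLE;
    raw_spin_unlock(&current->pi_lock);

    /* ... block until the lock is acquired ... */

    /* lock acquired: restore whatever the task was doing before */
    raw_spin_lock(&current->pi_lock);
    current->state = current->saved_state;
    current->saved_state = TASK_RUNNING;
    raw_spin_unlock(&current->pi_lock);
A regular (non-rtmutex) wakeup does not touch ->state of a lock-blocked
task; it sets ->saved_state to TASK_RUNNING instead, so the wakeup is not
lost when the lock sleep ends.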
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
67/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Allow rt_mutex_trylock() on PREEMPT_RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
A non-PREEMPT_RT kernel can deadlock on rt_mutex_trylock() in softirq
context.
On PREEMPT_RT the softirq context is handled in thread context. This
avoids the deadlock in the slow path and PI-boosting will be done on the
correct thread.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single reader restriction is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code paths shows that properly written RT tasks
should not take them. Syscalls like mmap() and file accesses which take the
mmap sem write-locked have unbound latencies which are completely unrelated
to the mmap sem. Other R/W sem users like graphics drivers are not suitable
for RT tasks either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer-unfair,
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on an RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: locking/rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: Use custom scheduling function for spin-schedule()
Date: Tue, 6 Oct 2020 13:07:17 +0200
PREEMPT_RT builds the rwsem, mutex, spinlock and rwlock typed locks on
top of a rtmutex lock. While blocked, task->pi_blocked_on is set
(tsk_is_pi_blocked()) and the task needs to schedule away while waiting.
The schedule process must distinguish between blocking on a regular
sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
and rwlock):
- rwsem and mutex must flush block requests (blk_schedule_flush_plug())
  even if blocked on a lock. This cannot deadlock because this also
  happens for non-RT.
  There should be a warning if the scheduling point is within an RCU read
  section.
- spinlock and rwlock must not flush block requests. This will deadlock
  if the callback attempts to acquire a lock which is already acquired.
  Similarly to being preempted, there should be no warning if the
  scheduling point is within an RCU read section.
Add preempt_schedule_lock() which is invoked if scheduling is required
while blocking on a PREEMPT_RT-only sleeping lock.
Remove tsk_is_pi_blocked() from the scheduler path which is no longer
needed with the additional scheduler entry point.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
75/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merely a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
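A sketch of such variants (the exact set in the patch may differ):
    #ifdef CONFIG_PREEMPT_RT
    # define preempt_disable_rt()    preempt_disable()
    # define preempt_enable_rt()     preempt_enable()
    # define preempt_disable_nort()  barrier()
    # define preempt_enable_nort()   barrier()
    #else
    # define preempt_disable_rt()    barrier()
    # define preempt_enable_rt()     barrier()
    # define preempt_disable_nort()  preempt_disable()
    # define preempt_enable_nort()   preempt_enable()
    #endif
The _rt variants are active only on RT kernels and the _nort variants
only on !RT, so call sites need no #ifdefs.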
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
77/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanilla the code runs in
IRQ-off regions while on -RT it does not. preempt_disable() ensures that the
same resources are not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
78/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()
Date: Wed, 28 Oct 2020 18:15:32 +0100
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.
Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/194 [
Author: Ahmed S. Darwish
Email: a.darwish@linutronix.de
Subject: xfrm: Use sequence counter with associated spinlock
Date: Wed, 10 Jun 2020 12:53:22 +0200
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.
Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.
If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
Upstream-status: The xfrm locking used for seqcount writer serialization
appears to be broken. If that's the case, a proper fix will need to be
submitted upstream. (e.g. make the seqcount per network namespace?)
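The general usage pattern of the data type (generic seqlock.h API; the
names below are illustrative, not the actual xfrm hunks):
    static spinlock_t xfrm_lock;             /* illustrative */
    static seqcount_spinlock_t xfrm_seq;     /* illustrative */

    seqcount_spinlock_init(&xfrm_seq, &xfrm_lock);

    spin_lock_bh(&xfrm_lock);
    write_seqcount_begin(&xfrm_seq); /* lockdep verifies xfrm_lock is held */
    /* ... update the protected data ... */
    write_seqcount_end(&xfrm_seq);
    spin_unlock_bh(&xfrm_lock);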
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: u64_stats: Disable preemption on 32bit-UP/SMP with RT during updates
Date: Mon, 17 Aug 2020 12:28:10 +0200
On RT the seqcount_t is required even on UP because the softirq can be
preempted. The IRQ handler is threaded so it is also preemptible.
Disable preemption on 32bit-RT during value updates. There is no need to
disable interrupts on RT because the handler is run threaded. Therefore
disabling preemption is enough to guarantee that the update is not
interrupted.
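For reference, the generic update/read pattern affected here (standard
u64_stats_sync API; `stats' is a hypothetical per-CPU counters struct):
    /* writer, per CPU: */
    u64_stats_update_begin(&stats->syncp);
    stats->bytes += len;
    stats->packets++;
    u64_stats_update_end(&stats->syncp);

    /* reader: */
    do {
        start = u64_stats_fetch_begin_irq(&stats->syncp);
        bytes = stats->bytes;
        packets = stats->packets;
    } while (u64_stats_fetch_retry_irq(&stats->syncp, start));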
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
81/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lock up I am disabling the preemption during the update.
The rename of i_dir_seq ensures that new write sides are caught in the
future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held, which we can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while the writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
84/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Properly annotate the try-lock for the seqlock
Date: Tue, 8 Sep 2020 16:57:11 +0200
In patch
("net/Qdisc: use a seqlock instead seqcount")
the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
The try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".
Opencode write_seqcount_t_begin() and use the try-lock annotation for
lockdep.
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
86/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remains short and doesn't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adapted to RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator, we adapted this one as well and the changes were smaller. More
importantly, due to the design of the SLUB allocator it performs better and its
worst case latency is smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT's needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting the
default on RT should help in those cases and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on an F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works fine from an ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
89/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packet arrives, the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
would be scheduled in again.
Should a task with RT priority busy poll, it would consume the CPU instead
of allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on measurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firmware will erase flash blocks on NOR.
The time functions are used by efi-rtc and can be triggered during
runtime (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is the default at build time,
the user can override it with `efi=runtime' and allow the runtime services again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and the locked "resource"
gets the lockdep annotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks use migrate_disable(),
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
93/194 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spinlock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack are possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored in the task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
94/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #1
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introduce isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then frees the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
96/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with IRQs off on RT. Make it a raw_spinlock_t,
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for the struct kmem_cache_node definition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
98/194 [
Author: Kevin Hao
Email: haokexin@gmail.com
Subject: mm: slub: Always flush the delayed empty slubs in flush_all()
Date: Mon, 4 May 2020 11:34:07 +0800
After commit f0b231101c94 ("mm/SLUB: delay giving back empty slubs to
IRQ enabled regions"), when free_slab() is invoked with IRQs
disabled, the empty slubs are moved to a per-CPU list and will be
freed after IRQs are enabled later. But in the current code, there is
a check to see whether there really is a CPU slab on a specific CPU
before flushing the delayed empty slubs; this may cause a reference
to an already released kmem_cache in a scenario like below:
cpu 0                           cpu 1
kmem_cache_destroy()
  flush_all()
    --->IPI                     flush_cpu_slab()
                                  flush_slab()
                                    deactivate_slab()
                                      discard_slab()
                                        free_slab()
                                  c->page = NULL;
  for_each_online_cpu(cpu)
    if (!has_cpu_slab(1, s))
      continue
    this skips flushing the delayed
    empty slub released by cpu 1
kmem_cache_free(kmem_cache, s)
                                kmalloc()
                                  __slab_alloc()
                                    free_delayed()
                                      __free_slab()
                                        reference to released kmem_cache
Fixes: f0b231101c94 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
]
99/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/page_alloc: Use migrate_disable() in drain_local_pages_wq()
Date: Thu, 2 Jul 2020 14:27:23 +0200
drain_local_pages_wq() disables preemption to avoid CPU migration during
CPU hotplug.
Using migrate_disable() makes the function preemptible on PREEMPT_RT but
still avoids CPU migrations during CPU-hotplug. On !PREEMPT_RT it
behaves like preempt_disable().
Use migrate_disable() in drain_local_pages_wq().
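The change boils down to (sketch):
    /* was: preempt_disable(); drain_local_pages(zone); preempt_enable(); */
    migrate_disable();
    drain_local_pages(zone);  /* stays on this CPU, but preemptible on RT */
    migrate_enable();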
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
100/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
101/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/slub: Make object_map_lock a raw_spinlock_t
Date: Thu, 16 Jul 2020 18:47:50 +0200
The variable object_map is protected by object_map_lock. The lock is always
acquired in debug code and within an already atomic context.
Make object_map_lock a raw_spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
[bigeasy: Add warning on RT for allocations in atomic context.
Don't enable interrupts on allocations during SYSTEM_SUSPEND. This is done
during suspend by ACPI, noticed by Liwei Song <liwei.song@windriver.com>
]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
103/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: memcontrol: Provide a local_lock for per-CPU memcg_stock
Date: Tue, 18 Aug 2020 10:30:00 +0200
The interrupts are disabled to ensure CPU-local access to the per-CPU
variable `memcg_stock'.
As the code inside the interrupt disabled section acquires regular
spinlocks, which are converted to 'sleeping' spinlocks on a PREEMPT_RT
kernel, this conflicts with the RT semantics.
Convert it to a local_lock which allows RT kernels to substitute it with
a real per-CPU lock. On non-RT kernels this maps to local_irq_save() as
before, but provides also lockdep coverage of the critical region.
No functional change.
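A sketch of the conversion using the local_lock API (field and helper
names are illustrative, not the actual memcontrol.c layout):
    struct memcg_stock_pcp {
        local_lock_t stock_lock;    /* illustrative member name */
        struct mem_cgroup *cached;
        unsigned int nr_pages;
    };
    static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
        .stock_lock = INIT_LOCAL_LOCK(stock_lock),
    };

    /* was: local_irq_save(flags); ... local_irq_restore(flags); */
    local_lock_irqsave(&memcg_stock.stock_lock, flags);
    /* ... operate on this_cpu_ptr(&memcg_stock) ... */
    local_unlock_irqrestore(&memcg_stock.stock_lock, flags);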
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
105/194 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() with get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() sections which then take sleeping locks.
This patch converts them to local locks.
[bigeasy: Move unlock after memcg_check_events() in mem_cgroup_swapout(),
pointed out by Matt Fleming <matt@codeblueprint.co.uk>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
The bit spinlocks are replaced with a proper mutex, which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/194 [
Author: Luis Claudio R. Goncalves
Email: lgoncalv@redhat.com
Subject: mm/zswap: Use local lock to protect per-CPU data
Date: Tue, 25 Jun 2019 11:28:04 -0300
zswap uses per-CPU compression. The per-CPU data pointer is acquired with
get_cpu_ptr() which implicitly disables preemption. It allocates
memory inside the preempt disabled region which conflicts with the
PREEMPT_RT semantics.
Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
[bigeasy: Use local_lock(), additional hunks, patch description]
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
110/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
111/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: Allow raw wakeups during boot
Date: Fri, 9 Aug 2019 15:25:21 +0200
There are a few wake-up timers during the early boot which are essential for
the system to make progress. At this stage no softirq threads have been
spawned for the softirq processing, so there is no timer processing in softirq.
The wakeup in question:
smpboot_create_thread()
-> kthread_create_on_cpu()
-> kthread_bind()
-> wait_task_inactive()
-> schedule_hrtimeout()
Let the timer fire in hardirq context during the system boot.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
112/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
113/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so nothing
we want to do in task switch and other atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
114/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes in handy on -RT because we
can't free memory in a preempt-disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
115/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
116/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
117/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
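The resulting pattern at the affected call sites, as a sketch
(preempt_check_resched_rt() is the RT-only helper this relies on; it is a
no-op on !RT):
    local_irq_save(flags);
    raise_softirq_irqoff(NET_RX_SOFTIRQ);  /* may wake the softirq thread */
    local_irq_restore(flags);
    preempt_check_resched_rt();  /* give the woken thread a chance to run */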
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
118/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on a special IRQ stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
119/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided by putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
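The resulting function is essentially:
    int netif_rx_ni(struct sk_buff *skb)
    {
        int err;

        trace_netif_rx_ni_entry(skb);

        local_bh_disable();
        err = netif_rx_internal(skb);  /* raises NET_RX_SOFTIRQ */
        local_bh_enable();             /* invokes do_softirq() if pending */

        return err;
    }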
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
120/194 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
121/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
122/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: add {put|get}_cpu_light()
Date: Sat, 27 May 2017 19:02:06 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
124/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that the
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
125/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption, which
implies an RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disable preemption on !RT) is acquired.
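Sketched onto the RT lock/unlock slowpaths (illustrative only; the real
paths carry more bookkeeping):
    void __lockfunc rt_spin_lock(spinlock_t *lock)
    {
        rcu_read_lock();  /* mirror the implicit RCU section of !RT */
        migrate_disable();
        /* ... acquire the underlying rtmutex ... */
    }

    void __lockfunc rt_spin_unlock(spinlock_t *lock)
    {
        /* ... release the underlying rtmutex ... */
        migrate_enable();
        rcu_read_unlock();
    }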
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
126/194 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcu: Use rcuc threads on PREEMPT_RT as we did
Date: Wed, 11 Sep 2019 17:57:28 +0100
While switching to the reworked RCU-thread code, enabling the thread
processing on -RT has been forgotten.
Besides restoring behavior that used to be default on RT, this avoids
a deadlock on scheduler locks.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
127/194 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/194 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks which are acquired later.
It seems to be enough to replace them with get_cpu_light() and
migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
132/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
134/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
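The resulting helper, as a sketch (hrtimer-based one-tick sleep with
PF_NOFREEZE held for the duration):
    void cpu_chill(void)
    {
        ktime_t chill_time = ns_to_ktime(TICK_NSEC);
        unsigned int freeze_flag = current->flags & PF_NOFREEZE;

        current->flags |= PF_NOFREEZE;
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
        if (!freeze_flag)
            current->flags &= ~PF_NOFREEZE;
    }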
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
136/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as a raw lock so we can keep irq-off regions. It looks low
latency. However we can't kfree() from this context, therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
138/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Dequeue in dev_cpu_dead() without the lock
Date: Wed, 16 Sep 2020 16:15:39 +0200
Upstream uses skb_dequeue() to acquire the lock of `input_pkt_queue'. The
reason is to synchronize against a remote CPU which still thinks that the
CPU is online and enqueues packets to this CPU.
There are no guarantees that the packet is enqueued before the callback is
run; it is just hoped.
RT however complains about an uninitialized lock because it uses another
lock for `input_pkt_queue' due to the IRQ-off nature of the context.
Use the unlocked dequeue version for `input_pkt_queue'.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, then the task with the higher priority
won't be able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
140/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context, so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds IRQ_WORK_HARD_IRQ which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we have the original behavior.
This patch incorporates tglx's original work, reworked a little, bringing back
the arch_irq_work_raise() if possible, and a few fixes from Steven Rostedt and
Mike Galbraith.
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
141/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
142/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counter gets out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.
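A sketch of the corrected ordering (blkcipher-era glue code, details
approximate; the point is dropping the FPU before the walk can sleep):
/* sketch: glue_fpu_end() now precedes blkcipher_walk_done() */
while (nbytes) {
        fpu_enabled = glue_fpu_begin(bsize, fpu_blocks_limit,
                                     &walk, false, nbytes);
        nbytes = __glue_xts_crypt_128bit(gctx, desc, &walk);
        glue_fpu_end(fpu_enabled);      /* leave the atomic section... */
        fpu_enabled = false;
        err = blkcipher_walk_done(desc, &walk, nbytes); /* ...before this
                                                         * may sleep */
}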
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in the kernel they need to enable the "FPU" in kernel mode, which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
  should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
  kmap_atomic().
The whole kernel_fpu_begin()/end() processing probably isn't that cheap;
it most likely makes sense to process as much of the data as possible in
one go. The new *_fpu_sched_rt() schedules only if an RT task is
pending.
We should probably measure the performance of those ciphers in pure
software mode and with these optimisations to see whether it makes sense
to keep them for RT.
The new kernel_fpu_resched() makes the code more preemptible, which
might hurt performance.
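kernel_fpu_resched() itself is small; this sketch is essentially the
helper the patch introduces:
/* drop the FPU briefly if a higher-priority task wants the CPU */
void kernel_fpu_resched(void)
{
        WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));

        if (should_resched(PREEMPT_OFFSET)) {
                kernel_fpu_end();       /* re-enables preemption */
                cond_resched();
                kernel_fpu_begin();
        }
}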
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU queue which is protected with local_bh_disable()
and preempt_disable().
Add an explicit spinlock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no
contention on the actual spinlock.
There is a small race window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.
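A sketch of the enqueue path with the explicit lock (the qlock field is
the lock this patch adds; the workqueue symbol varies by kernel
version):
/* sketch: cryptd_enqueue_request() with a per-CPU spinlock */
cpu_queue = raw_cpu_ptr(queue->cpu_queue);
spin_lock(&cpu_queue->qlock);   /* replaces preempt/bh fiddling */
cpu = smp_processor_id();       /* may have moved by now: harmless,
                                 * the queue is lock-protected */
err = crypto_enqueue_request(&cpu_queue->queue, request);
queue_work_on(cpu, cryptd_wq, &cpu_queue->work);
spin_unlock(&cpu_queue->qlock);
return err;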
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
145/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable this on -RT. If it is invoked from irq context we will have
problems acquiring the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
146/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding
the call on RT is the only sensible solution. This is basically the
same randomness we get during boot, where the random pool has no
entropy and we rely on the TSC randomness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
147/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
148/194 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1) enqueue_to_backlog() (called from netif_rx) should be bound to a
particular CPU. This can be achieved by disabling migration; there is
no need to disable preemption.
2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" on RT.
If preemption is disabled, enqueue_to_backlog() is called in atomic
context, and if the backlog exceeds its limit, kfree_skb() is called.
But on RT, kfree_skb() might get scheduled out, so it expects a
non-atomic context.
3) When CONFIG_PREEMPT_RT is not defined, migrate_enable() and
migrate_disable() map to preempt_enable() and preempt_disable(), so
there is no change in functionality in the non-RT case.
- Replace preempt_enable()/preempt_disable() with
  migrate_enable()/migrate_disable() respectively.
- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light()
  respectively.
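A minimal sketch of the replacement in netif_rx's internal path
(get_cpu_light()/put_cpu_light() are RT-patch helpers that pin the task
to the CPU without disabling preemption):
/* sketch */
unsigned int qtail;
int cpu, ret;

cpu = get_cpu_light();          /* was: get_cpu() */
ret = enqueue_to_backlog(skb, cpu, &qtail);
put_cpu_light();                /* was: put_cpu() */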
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
149/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
Teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
150/194 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlocks are sleepable;
disable the softirq context tests and the rwlock+irq tests.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
151/194 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
152/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had different semantics on RT. Some
still reported false positives. Now the selftest locks up the system
during boot and needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
153/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
154/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Date: Sat, 27 Feb 2016 09:01:42 +0100
Commit
8d7849db3eab7 ("drm/i915: Make sprite updates atomic")
started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempts to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.
According to the comment the interrupts are disabled to avoid random delays and
not required for protection or synchronisation.
Don't disable interrupts on PREEMPT_RT during atomic updates.
[bigeasy: drop local locks, commit message]
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spinlocks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter(), which
might also acquire a sleeping lock.
Based on this I don't see any other way than to disable the trace
points on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h then the NOTRACE define here becomes a
nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded
Date: Tue, 7 Jul 2020 12:25:11 +0200
According to commit
d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
the interrupts are disabled because the code may be called from an
interrupt handler as well as from preemptible context.
With `force_irqthreads' set the timeline lock is never observed in IRQ
context, so it is not needed to disable interrupts there.
Only disable interrupts if not running in `force_irqthreads' mode.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/194 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/194 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use, of course); this should be especially
easy now that we have a kmap_atomic stack.
Something like the below... it wants all the preempt_disable() stuff
replaced with pagefault_disable() && migrate_disable() of course, but
then you can flip kmaps around like below.
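A sketch of the context-switch side; the kmap_idx/kmap_pte fields in
task_struct follow the bracketed note below, the rest is approximate:
/* restore the incoming task's kmap_atomic slots on switch-in */
static inline void switch_kmaps(struct task_struct *prev_p,
                                struct task_struct *next_p)
{
        int i;

        if (!next_p->kmap_idx)
                return;
        for (i = 0; i < next_p->kmap_idx; i++) {
                int idx = i + KM_TYPE_NR * smp_processor_id();

                set_pte(kmap_pte - idx, next_p->kmap_pte[i]);
        }
        arch_flush_lazy_mmu_mode();
}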
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
161/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The TLB should be flushed on unmap, thus making the mapping entry
invalid. Currently this is only done in the non-debug case, which does
not look right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
163/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
Fix up highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
164/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
165/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect, as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
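A rough sketch of the wakeup-side decision (resched_curr_lazy() is the
helper this patchset adds; the surrounding logic is simplified):
/* sketch: check_preempt-style decision with lazy preemption */
if (test_tsk_need_resched(rq->curr))
        return;                 /* a resched is already pending */
if (!IS_ENABLED(CONFIG_PREEMPT_LAZY) || rt_task(p))
        resched_curr(rq);       /* RT class: real NEED_RESCHED */
else
        resched_curr_lazy(rq);  /* fair vs. fair: NEED_RESCHED_LAZY */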
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances non-RT workload performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
166/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/entry: Use should_resched() in idtentry_exit_cond_resched()
Date: Tue, 30 Jun 2020 11:45:14 +0200
The TIF_NEED_RESCHED bit is folded on x86 into the preemption counter.
By using should_resched(0) instead of need_resched() the same check can
be performed with a single read of the preempt_count() variable, which
is accessed anyway.
Use should_resched(0) instead of need_resched().
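A sketch of the changed check (the function body around it is
abbreviated):
/* sketch: idtentry_exit_cond_resched() */
if (!preempt_count()) {
        if (should_resched(0))          /* was: if (need_resched()) */
                preempt_schedule_irq();
}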
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
168/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
169/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
170/194 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
171/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures use stop_machine() while switching the jump-label
opcode, which leads to latency spikes.
The architectures which use stop_machine() at the moment:
- ARM
- s390
The architectures which use other sorcery:
- MIPS
- x86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
174/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
175/194 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program:
int main(void)
{
        *((char *)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. For the above condition, the correct result is
that the test case receives a segmentation fault and exits, instead of
producing the splat above.
The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq-enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead, but does not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode, so if the kernel doesn't re-enable
them, they remain disabled during a translation/section permission
fault.
We see the above splat because do_force_sig_info() is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've not seen any need
for other architectures to have the siglock converted to a raw lock, we
can conclude that we should enable irqs for the ARM translation/section
permission exceptions.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/194 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
177/194 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
178/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a
preempt-disabled section, which does not work on -RT.
Delay freeing of the memory until preemption is enabled again.
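A condensed sketch of the reordering (based on the description; the
actual function has more bookkeeping):
void fpsimd_flush_thread(void)
{
        void *mem = NULL;

        preempt_disable();
        fpsimd_flush_task_state(current);
        memset(&current->thread.uw.fpsimd_state, 0,
               sizeof(current->thread.uw.fpsimd_state));
        if (system_supports_sve()) {
                clear_thread_flag(TIF_SVE);
                mem = current->thread.sve_state; /* was: sve_free() here */
                current->thread.sve_state = NULL;
        }
        set_thread_flag(TIF_FOREIGN_FPSTATE);
        preempt_enable();
        kfree(mem);             /* safe: preemption is enabled again */
}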
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
182/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use a locallock instead of local_irq_save(), so that the allocation can
happen without interrupts being disabled.
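Expressed with the current local_lock API the shape is roughly as
follows (the original patch used the older locallock helpers; names
here are illustrative):
static DEFINE_PER_CPU(local_lock_t, tce_page_lock) =
        INIT_LOCAL_LOCK(tce_page_lock);

unsigned long flags;
__be64 *tcep;

/* was: local_irq_save(flags) -- with RT, irqs stay enabled here */
local_lock_irqsave(&tce_page_lock, flags);
tcep = __this_cpu_read(tce_page);
if (!tcep) {
        tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
        __this_cpu_write(tce_page, tcep);
}
/* fill the TCE page and issue the hcall */
local_unlock_irqrestore(&tce_page_lock, flags);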
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/194 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts
sent in directed delivery mode with a multiple-CPU mask, the emulated
openpic will loop through all of the VCPUs, and for each VCPU it calls
IRQ_check, which will loop through all the pending interrupts for that
VCPU. This is done while holding the raw_lock, meaning that in all this
time interrupts and preemption are disabled on the host Linux. A
malicious user app can max out both these numbers and cause a DoS.
This temporary fix is sent for two reasons. First, so that users who
want to use the in-kernel MPIC emulation are aware of the potential
latencies, and can make sure that either the hardware MPIC is used or
their usage scenario does not involve interrupts sent in directed
delivery mode, with the number of possible pending interrupts kept
small. Secondly, this should incentivize the development of a proper
openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
185/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the
stack address as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
186/194 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow RT to be selected.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
187/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/194 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
189/194 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
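The change is essentially a write-then-flush helper; a sketch close to
the actual patch:
/* force the posted writes out by reading from the same region */
static inline void tpm_tis_flush(void __iomem *iobase)
{
        ioread8(iobase + TPM_ACCESS(0));
}

static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
{
        iowrite8(b, iobase + addr);
        tpm_tis_flush(iobase);
}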
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
190/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocations, allow RT tasks to cache one sigqueue struct in
their task struct.
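A sketch of the allocation side (the sigqueue_cache field in
task_struct is what this patch adds; the helper name is assumed):
/* reuse the task-local entry if present, else fall back to the slab */
static struct sigqueue *sigqueue_get_cached(void)
{
        struct sigqueue *q = current->sigqueue_cache;

        /* only current touches its own cache slot, no locking needed */
        if (q)
                current->sigqueue_cache = NULL;
        else
                q = kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);
        return q;
}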
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
191/194 [
Author: Matt Fleming
Email: matt@codeblueprint.co.uk
Subject: signal: Prevent double-free of user struct
Date: Tue, 7 Apr 2020 10:54:13 +0100
The way user struct reference counting works changed significantly with
commit fda31c50292a ("signal: avoid double atomic counter increments
for user accounting").
Now user structs are only freed once the last pending signal is
dequeued. Make sigqueue_free_current() follow this new convention to
avoid freeing the user struct multiple times and triggering this
warning:
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
Call Trace:
refcount_dec_and_lock_irqsave+0x16/0x60
free_uid+0x31/0xa0
__dequeue_signal+0x17c/0x190
dequeue_signal+0x5a/0x1b0
do_sigtimedwait+0x208/0x250
__x64_sys_rt_sigtimedwait+0x6f/0xd0
do_syscall_64+0x72/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
192/194 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
193/194 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing
uname output is too slow, or so.
Are there better solutions? Should it exist and return 0 on !-rt?
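The entry itself is a one-attribute file; a sketch, essentially what
the patch adds:
static ssize_t realtime_show(struct kobject *kobj,
                             struct kobj_attribute *attr, char *buf)
{
        return sprintf(buf, "%d\n", 1);
}

static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

static int __init rt_sysfs_init(void)
{
        return sysfs_create_file(kernel_kobj, &realtime_attr.attr);
}
late_initcall(rt_sysfs_init);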
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
194/194 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This patch fixes:
Warning: Value of CONFIG_NFT_REJECT_IPV6 is defined multiple times
within fragment ... features/nf_tables/nf_tables.cfg:
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_REJECT_IPV6=m
Signed-off-by: Sam Zeter <szeter@digi.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/1 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: yaffs: include blkdev.h
Date: Sat, 29 Aug 2020 10:25:04 -0400
The definition of BDEVNAME_SIZE has moved, so we add an include of the
new header.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This reverts commit 5ed9a90606e58962adada68e77ab387e8ae43f58.
|
|
And set it to m to avoid conflicts when coretemp.scc is also included.
Also fix the dependencies that rely on this config.
See:
https://github.com/torvalds/linux/commit/25f1ca31e230598eaf3c38d387a355a64bd772a7
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Remove CONFIG_OPTIMIZE_INLINING and associated features entirely. The
config has been removed starting 5.7.
https://github.com/torvalds/linux/commit/889b3c1245de48ed0cacf7aebb25c489d3e4a3e9
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The watchdog driver can't be y when the MFD driver is a loadable
module. Switch the config from y to m to handle the case when the MFD
driver is also enabled as m.
See:
https://github.com/torvalds/linux/commit/b30c1a464c29baf646a5726d90ea3537e775ac85
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
MEDIA_SUPPORT is set to m by default, so DVB_CORE can't be y; this is
the only tristate that we set to y that is controlled by MEDIA_SUPPORT.
Change it to m. This fixes:
[NOTE]: 'CONFIG_DVB_CORE' last val (y) and .config val (m) do not match
[INFO]: CONFIG_DVB_CORE : m ## .config: 3617 :configs/v5.4/standard/features/media/media.cfg (y)
[INFO]: raw config text:
config DVB_CORE
tristate
default y
select CRC32
depends on MEDIA_SUPPORT && MEDIA_DIGITAL_TV_SUPPORT && (I2C || I2C = n) && MEDIA_SUPPORT
Config 'DVB_CORE' has the following Direct dependencies (DVB_CORE=m):
MEDIA_SUPPORT(=m) && MEDIA_DIGITAL_TV_SUPPORT(=y) && I2C(=y) || I2C(=y) = n (=y)
Parent dependencies are:
MEDIA_DIGITAL_TV_SUPPORT [y] MEDIA_SUPPORT [m] I2C [y]
[INFO]: config 'CONFIG_DVB_CORE' was set, but it wasn't assignable, check (parent) dependencies
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs5-kbuild
Date: Fri, 7 Aug 2020 09:50:46 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
2/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs5-base
Date: Fri, 7 Aug 2020 09:53:21 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
3/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs5-mmap
Date: Fri, 7 Aug 2020 09:54:28 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
4/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: aufs5-standalone
Date: Fri, 7 Aug 2020 09:55:26 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
5/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: core
Date: Fri, 7 Aug 2020 09:56:48 -0400
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Add a configuration fragment to features/xilinx to enable the v4l2
drivers.
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
In kernel commit 25f1ca31e230 ("platform/x86: intel_pmc_ipc:
Convert to MFD"), CONFIG_INTEL_SOC_PMIC_BXTWC is changed to depend on
CONFIG_MFD_INTEL_PMC_BXT instead of CONFIG_INTEL_PMC_IPC.
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The oci containers fragment was imported from the configuration of
meta-virtualization. But the structure of fragments in meta-virt is
flat, compared to the kernel-cache.
We drop the duplicate lxc fragment, and sync with the main fragment.
We also split out the vswitch and docker fragments so that they can
be more easily found and re-used.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The group scheduling options in the lxc fragment were initially
used to support performance-guaranteed systems using containers.
These options now cause issues with systemd runtimes, and the
original feature they implemented is no longer relevant.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable CONFIG_ICE=m for the intel-x86 BSP.
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Naveen Saini <naveen.kumar.saini@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Creating an initial feature fragment that can be included when a
reproducible kernel build is desired. This currently contains only one
option, but more will be added in the future.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/1 [
Author: Jiping Ma
Email: jiping.ma2@windriver.com
Subject: perf: perf can not parser the backtrace of app in the 32bit system and 64bit kernel.
Date: Thu, 30 Apr 2020 09:35:06 +0800
The PC value was recorded from regs[15] when it should come from
regs[32], which caused perf to fail to parse the backtrace.
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The uptime feature was specifically requested in the early days of
linux-yocto. It is no longer useful, and is causing build issues, so
we remove it completely.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/1 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: yaffs: fix misplaced variable declaration
Date: Thu, 30 Apr 2020 12:59:46 -0400
A variable declaration landed one function higher than intended,
leading to an unused variable warning for configurations with
YAFFS_USE_DIR_ITERATE=y and a build failure for configurations
with the same being unset.
Fixes: "yaffs: Fix build failure by handling inode i_version with proper atomic API"
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|